Re: Parsing a simple BASIC language

"Dunny" <>
12 Apr 2001 02:48:26 -0400

          From comp.compilers

Related articles
Parsing a simple BASIC language (paul.dunn4) (2001-04-04)
Re: Parsing a simple BASIC language (Barry Kelly) (2001-04-10)
Re: Parsing a simple BASIC language (2001-04-10)
Re: Parsing a simple BASIC language (2001-04-12)
Re: Parsing a simple BASIC language (Dunny) (2001-04-12)
Re: Parsing a simple BASIC language (Barry Kelly) (2001-04-14)
Re: Parsing a simple BASIC language (2001-04-18)
Re: Parsing a simple BASIC language (2001-04-18)
Re: Parsing a simple BASIC language (Dunny) (2001-04-22)
Re: Parsing a simple BASIC language (Barry Kelly) (2001-04-22)
| List of all articles for this month |

From: "Dunny" <>
Newsgroups: comp.compilers
Date: 12 Apr 2001 02:48:26 -0400
Organization: ntlworld News Service
References: 01-04-014 01-04-054
Keywords: Basic, parse
Posted-Date: 12 Apr 2001 02:48:26 EDT

<> wrote in message news:01-04-054@comp.compilers...
> On 4 Apr 2001 00:18:06 -0400, paul.dunn4 <> wrote:
> A stack might not be the most appropriate data type, since it seems
> that you do not approach the data Last In, First Out. Consider using
> some other data type, e.g. an an array.

Done. The stack still stays, but is now an array :) you're right - it's

> Of course, another question is how efficient your tokenizing
> code is.

        Very - or at least as fast as I can process it. It only reads each
character in the input stream once, and can tokenise a 500 char line in less
than a millisecond (according to windows - so I don't trust it)

> You do this before parsing? This is a strange approach, which I do not
> fully understand.

It's so I don't have to bother with checking for a valid type (like a
decimal number, a variable name [which can contain spaces and
numerals], a reserved word (stored in the tokenised array as an
integer value - you'll see why later) or a symbol such as +, -, /, *
etc) during parsing. The parser now only checks if numeric/string
expressions are correct syntax. No keyword parsing at all.

> > 3) runs the resulting (smaller) stack through the BNF parser using the
> >rule for that keyword. It is this part that performs error checking.
> Is this a hand-coded parser? Perhaps you wrote a parser that
> backtracks when confronted with something it cannot handle? Such a
> parser can be very slow, if it has to do lots of backtracking.
> Consider using a parser generator tool, or try to rewrite your parser
> so that it doesn't do any backtracking.

Used a parser generator. Overkill, No good. Tried Yacc/Lexx for TP,
bloated code. generates far too much to do a simple job. (I may be
utilising it badly though) My parser is about 150-200 lines of code,
and works like a dream. No backtracking, once I had figured out
exactly where I was backtracking. The slowest part of the parser is
getting items from the array, so I eliminated backtracking in order to
prevent a token being read more than once. A nice trick I learnt from
a C program was to pass the token in question to the expression
parser, and advance the pointer on one step. Then process the token.

> I think your parser is not efficient. You can ditch the second step
> once you have an efficient parser.

I have. I ditched the second step, and built generic routines to check
different argument types. IE, Sin(A) = B, STR$(A)=S$, Val(D$)=D etc
all have similar arguments and results. one procedure to check them
does the job. The trick was to get it to behave like a real spectrum
:) So now I have about 10 routines to check, and only four of them are
specialised routines. They all work recusively with the expression
parser, and build the syntax string as they go. So when stepping
through the token list (stack as was), if I find a value (everything
is stored as ints) greater than 1000, I use a lookup table to jump to
the appropriate routine.

It was an absolute nightmare, but it works now. Phew!

> Another idea is to only do the syntax checking in the "idle" loop of
> your application. In this way, when the user starts to type a line, it
> will not be syntax checked until the computer has time to catch up
> with the user.

Thanks for you suggestions - you obviously know more about it than I
do, but I hate to use a utility to write code for me when I could do
it myself. I won't be using a parser generator in future, unless I can
write one. Someone once asked me why I wanted to re-invent the wheel -
and I replied "Because It's *MY* wheel."

Your suggestion about the idle checking is a good one though, but the
parser is fast enough now to not be a worry. Hope no-one runs it on
slower hardware though :)

As I said, the hardest part was getting it to behave like a *real* spectrum.
Chr$ 2+2 doesn't work, but chr$(2+2) does. print a$>B$ does not work on a
128 spec, but print (a$)>(b$) does...

Thanks for your interest,

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.