Re: Visual Basic Grammar Definition wanted

Scott Stanchfield <scooter@mccabe.com>
19 Jul 1996 00:00:36 -0400

          From comp.compilers

Related articles
Visual Basic Grammar Definition wanted dgroenve@vitghrsy.telecom.com.au (1996-07-16)
Re: Visual Basic Grammar Definition wanted scooter@mccabe.com (Scott Stanchfield) (1996-07-19)
From: Scott Stanchfield <scooter@mccabe.com>
Newsgroups: comp.compilers
Date: 19 Jul 1996 00:00:36 -0400
Organization: McCabe & Associates
References: 96-07-111
Keywords: Basic, parse

I hate to say this, but I've got a simple one and can't give it to
you...


As you read the following, you may realize that yacc is not a good way to
go for Visual Basic. It needs (at least) 3 tokens of lookahead. (I've
found cases where three tokens are needed, and I hope that's the limit.)
I would suggest something like PCCTS (ANTLR/DLG), as it makes it easy to
watch for multi-lookahead sequences.




What I can do is offer a bit of advice based on my experience writing
it:


-- VB is _not_ an LALR(1) language! You need 3 tokens of lookahead.
For example,


IF expr THEN <end-of-line>
100 END
100 END IF


As you can see, you can't determine whether the "100 END" is a
labelled END statement or the start of a labelled "END IF". You
need to look past the END to see whether it's followed by "IF".
We implemented this by setting up a token buffer between the
scanner and parser. Basically, we renamed the yylex() routine
generated by lex to real_yylex() and wrote a new yylex()
routine that grabs a token using real_yylex(). If that token is
a number (and the first token on a line), it gets another; if
that one is END, it gets another; and if that one is IF, it
returns LABELED_END_IF. I'm sure there are other ways to
implement this, but this seemed to be the most maintainable,
and we needed the yylex wrapper for the "NEXT" problem below
anyway. While I'm here, this also makes it easy to change
END IF to a single END_IF token, END SUB to a single END_SUB
token, etc. This makes the grammar much simpler, and you don't
have to worry about the conflict between the END statement and
the "END x" that ends an "x" structure. Just implement a few
routines to create a token buffer, and have your new yylex
"peek" into that buffer. Keep it simple, though.
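
For illustration, the wrapper described above might be sketched in C
along these lines. This is not the original code, only a guess at its
shape: the token codes, set_input(), peek() and drop() are hypothetical
names, and real_yylex() is a stub that replays a canned token array so
the sketch compiles without lex. A real version would also carry the
label's semantic value (yylval) along with the merged token.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes; a real build would take them from y.tab.h. */
enum { T_EOF = 0, NUMBER = 258, END, IF, NEWLINE, LABELED_END_IF };

/* Stand-in for the lex-generated scanner (renamed real_yylex in the
   post). It replays a canned array and returns T_EOF once it runs out. */
static const int *input;
static size_t input_len, input_pos;

static int real_yylex(void) {
    return input_pos < input_len ? input[input_pos++] : T_EOF;
}

/* Small ring buffer of tokens read from the scanner but not yet
   handed to the parser. */
#define BUF_SIZE 4
static int buf[BUF_SIZE];
static size_t buf_head, buf_count;
static int at_line_start = 1;

/* Point the stub scanner at a new token array and reset all state. */
void set_input(const int *toks, size_t n) {
    input = toks; input_len = n; input_pos = 0;
    buf_head = buf_count = 0;
    at_line_start = 1;
}

/* Return the n-th pending token (0 = next), reading ahead as needed. */
static int peek(size_t n) {
    assert(n < BUF_SIZE);
    while (buf_count <= n) {
        buf[(buf_head + buf_count) % BUF_SIZE] = real_yylex();
        buf_count++;
    }
    return buf[(buf_head + n) % BUF_SIZE];
}

/* Discard the first n pending tokens. */
static void drop(size_t n) {
    assert(n <= buf_count);
    buf_head = (buf_head + n) % BUF_SIZE;
    buf_count -= n;
}

/* The yylex() the parser calls: merges "<label> END IF" at the start
   of a line into one LABELED_END_IF token and passes everything else
   through unchanged. */
int yylex(void) {
    int tok = peek(0);
    if (at_line_start && tok == NUMBER && peek(1) == END && peek(2) == IF) {
        drop(3);
        tok = LABELED_END_IF;
    } else {
        drop(1);
    }
    at_line_start = (tok == NEWLINE);
    return tok;
}
```

With this shape, "100 END IF" reaches the parser as a single token,
while "100 END" still arrives as NUMBER followed by END. Merging an
unlabelled END IF into END_IF would be one more two-token check in the
same routine.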


-- NEXT I, J
This is a fairly evil construct. Nice for the users, but it
can make for an evil grammar. We used the yylex trick above to
watch for NEXT IDENT COMMA, and when we found it, we replaced
the COMMA token with COLON NEXT, so the parser would really see
NEXT I : NEXT J
which is easy to parse...
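
As a rough sketch of that rewrite in C (again an illustration, not the
original code): the token codes, set_input(), fill() and the pending
array are hypothetical names, and real_yylex() is a stub that replays a
canned token array. Semantic values for the identifiers are left out;
in practice they would have to travel with the buffered tokens.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes; a real build would take them from y.tab.h. */
enum { T_EOF = 0, NEXT = 258, IDENT, COMMA, COLON, NEWLINE };

/* Stand-in for the lex-generated scanner; replays a canned array. */
static const int *input;
static size_t input_len, input_pos;

static int real_yylex(void) {
    return input_pos < input_len ? input[input_pos++] : T_EOF;
}

/* Tokens read (or injected) but not yet returned, oldest first. */
#define MAX_PENDING 4
static int pending[MAX_PENDING];
static size_t npending;

/* Point the stub scanner at a new token array and reset the buffer. */
void set_input(const int *toks, size_t n) {
    input = toks; input_len = n; input_pos = 0;
    npending = 0;
}

/* Make sure at least n tokens are pending. */
static void fill(size_t n) {
    assert(n <= MAX_PENDING);
    while (npending < n)
        pending[npending++] = real_yylex();
}

/* The yylex() the parser calls. On NEXT IDENT COMMA it turns the COMMA
   into COLON and injects a NEXT after it. The injected NEXT is checked
   again when it reaches the front, so "NEXT I, J, K" unrolls fully. */
int yylex(void) {
    int tok;
    fill(1);
    if (pending[0] == NEXT) {
        fill(3);
        if (pending[1] == IDENT && pending[2] == COMMA) {
            assert(npending < MAX_PENDING);
            pending[2] = COLON;
            pending[npending++] = NEXT;
        }
    }
    tok = pending[0];
    npending--;
    memmove(pending, pending + 1, npending * sizeof pending[0]);
    return tok;
}
```

The grammar then only ever needs a rule for NEXT with zero or one
identifier; the comma form never reaches the parser.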


-- The VB docs don't cover the language very well. Watch for stuff
carried over from "normal" BASIC, like
IF expr THEN 30
which is shorthand for
IF expr THEN GOTO 30


There are a few other undocumented things like this that slip
my mind right now.


-- The examples distributed with VB3 and VB4 seem to hit some wacky
situations, and are pretty good for early testing and to help uncover
some of the things that aren't documented.


-- When analyzing VB source, you need to look at all files in a project
as a set, because they have global references between files.


Hope this helps. If you have any specific questions about parsing VB
(other than "may I have the grammar"), feel free to email me.


Good luck!
Scott






Damon Groenveld wrote:
> If anyone out there has a lex/yacc (or simmilar) grammer definition for VB I
> would like to know about it. (VB 3.0 preferably).


--
Scott Stanchfield McCabe & Associates -- Columbia, Maryland
--

