Related articles |
---|
parsing in VB mfont@recercai.com (Marc Font) (1999-08-04) |
Re: parsing in VB cgbatema@undergrad.math.uwaterloo.ca (1999-08-07) |
Re: parsing in VB cfc@world.std.com (Chris F Clark) (1999-08-12) |
From: | Chris F Clark <cfc@world.std.com> |
Newsgroups: | comp.compilers |
Date: | 12 Aug 1999 02:54:21 -0400 |
Organization: | The World Public Access UNIX, Brookline, MA |
References: | 99-08-022 99-08-036 |
Keywords: | Basic, parse, comment |
In a reply I mostly agree with, Cameron Bateman writes:
> I'm surprised to see a commmercial parser-generator that uses LR(1):
> the CFSM's for LR(1) parsers are generally considered too large and
> unwieldy; particularly if you are recognizing a language as large as
> VB.
This is essentially a myth these days. While it is true that the
cannonical LR(1) tables are generally too large to be practical, most
commercial "full" LR parser generators use some variation on "minimal
state LR machines"--these are machines which do the LALR compression
on all states that don't need the full LR capacity to reduce conflicts
and selectively "split" the few states which do. For most grammars,
the tables are almost the same size as the LALR tables (in fact for
all LALR grammars, the tables are exactly the same size).
With Visual Parse++ (I suspect that is the tool Marc Font is actually
using) there is another option, you can use its "natural language
parsing" mode, which as far as I can tell is GLR parsing. In GLR
parsing, you use the same(*) LALR (or LR or SLR) parsing tables that
generator produces by default but ignore the conflicts in the grammar
and instead use a run-time engine that does Earley style parsing with
those tables. (* The parser generator cannot resolve the shift/reduce
and reduce/reduce conflicts in the table, but instead somehow enter
both conflicting actions into the table, so they are not completely
the same, but that is a minor implementation issue.)
BTW, the problem that most languages have that makes them non-LR, is
context sensitivity of keywords and/or type names.
The general "hack" that resolves the problem is to check all
identifiers against a keyword list and return the keyword if the
identifier matches. If [user defined] type names can be keywords, add
the type names to your keyword list (as the user defines them and
remove them from the list when they are no longer valid).
Occassionally, some grammars require you to convert keywords back to
identifiers. This can be done with a rule of the form:
identifier: // non-terminal (used in the rest of the parser)
IDENTIFIER // token returned from lexer
| IF // keyword token returned from lexer
| WHILE // another keyword token returned from lexer
. . .
;
In some places, that rule will cause conflicts and you them make
alternate rules that prune out the conflicting keywords
identifier-no-if: // non-terminal (used where the if-keyword is a problem)
IDENTIFIER // token returned from lexer
// | IF // keyword token returned from lexer (not in this identifier)
| WHILE // another keyword token returned from lexer
. . .
;
Finally, sometimes the semantics of the language imply cases where
certain keywords (or type names) don't apply. In those, cases you may
need to either uses a predicate (a grammar extension not in Visual
Parse++) or action code that feeds back info from your parser to your
lexer.
Hope this helps,
-Chris
*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. CompuServe : 74252,1375
3 Proctor Street voice : (508) 435-5016
Hopkinton, MA 01748 USA fax : (508) 435-4847 (24 hours)
------------------------------------------------------------------------------
Web Site in Progress: Web Site : http://world.std.com/~compres
[LR(1) tables were "too large" when the entire memory in your computer
was 64K, but that was quite a while ago. -John]
Return to the
comp.compilers page.
Search the
comp.compilers archives again.