|Code generation from AST firstname.lastname@example.org (Lucas S. Silva) (2007-11-10)|
|DFA Lexer Generation From BNF email@example.com (Paul B Mann) (2007-11-10)|
|Re: DFA Lexer Generation From BNF cfc@shell01.TheWorld.com (Chris F Clark) (2007-11-11)|
|Re: DFA Lexer Generation From BNF gneuner2/@/comcast.net (George Neuner) (2007-11-12)|
|Re: DFA Lexer Generation From BNF firstname.lastname@example.org (Paul B Mann) (2007-11-16)|
|Re: DFA Lexer Generation From BNF bobduff@shell01.TheWorld.com (Robert A Duff) (2007-11-16)|
|Re: DFA Lexer Generation From BNF email@example.com (2007-11-17)|
|Date:||Sat, 17 Nov 2007 10:11:34 -0800 (PST)|
On Nov 10, 11:23 pm, "Paul B Mann" <p...@oct.net> wrote:
> Does anyone know of alexergenerator whose input is aBNFgrammar
> instead of regular expressions ?
> Paul Mann
> [DFA's aren't adequate to recognizeBNF, which is why parser
> generators use a DFA and a stack or other more powerful machines. I
> suppose you could limit it to aBNFsubset equivalent to regexps, but
> what would be the point? -John]
Another perspective is, "why bother using DFAs?" The usual, and good
argument, is "speed". But with modern computers, there seems to be
less emphasis on the latter, and if you aren't processing enormous
amounts of source, then perhaps convenience is the driver.
The ASF+SDF project (also known as "XT" and shipped as part of the
Stratego program transformation tool) uses "scannerless GLR parsers",
a single unified GLR parsing algorithm to lex and parse. See
particular, if I understand it right, they simply let you write the
grammar using BNF rules all the way down to tokens and whitespace.
This is an extremely clean way to specify a langauge.
But the apparant advantage goes beyond that. In many real world
situations, often one embeds one language into another (SQL in COBOL,
grammars for each, composing this is close to trivial; you simply add
a rule to the containing langauge that introduces an escape indication
and allows a nonterminal from the contained langauge. My personal
opinion is that this is a truly pretty solution.
If the escapes are carefully designed, you really can't end up with
any ambiguities. Most escapes used in real langauges with embedded
sublanguages already have this property; otherwise they wouldn't be
useful as escapes.
This gets one back to the efficiency vs convenience question. We
build a tool (DMS Software Reengineering Toolkit) that also uses GLR
parsing. But we use conventional lexical definitions because our
interest is in large scale applications, where parsing speed matters.
So we give up some convenience in specification for speed. [Of
course, we can always simply use the GLR parser the character
level. And we'll probably go back someday and look at unifying the two
Ira Baxter, CTO
Return to the
Search the comp.compilers archives again.