Re: DFA Lexer Generation From BNF

"Paul B Mann" <>
Fri, 16 Nov 2007 16:39:58 -0600

          From comp.compilers

Related articles
Code generation from AST (Lucas S. Silva) (2007-11-10)
DFA Lexer Generation From BNF (Paul B Mann) (2007-11-10)
Re: DFA Lexer Generation From BNF (Chris F Clark) (2007-11-11)
Re: DFA Lexer Generation From BNF gneuner2/@/ (George Neuner) (2007-11-12)
Re: DFA Lexer Generation From BNF (Paul B Mann) (2007-11-16)
Re: DFA Lexer Generation From BNF (Robert A Duff) (2007-11-16)
Re: DFA Lexer Generation From BNF (2007-11-17)
| List of all articles for this month |

From: "Paul B Mann" <>
Newsgroups: comp.compilers
Date: Fri, 16 Nov 2007 16:39:58 -0600
Organization: Compilers Central
References: 07-11-033 07-11-038 07-11-043
Keywords: lex
Posted-Date: 16 Nov 2007 20:37:36 EST

>> Does anyone know of a lexer generator whose input is a BNF grammar
>> instead of regular expressions ?
>> [DFA's aren't adequate to recognize BNF, which is why parser
>> generators use a DFA and a stack or other more powerful machines. I
>> suppose you could limit it to a BNF subset equivalent to regexps, but
>> what would be the point? -John]

> I'm not aware of any advantage to doing so, so yacc++ doesn't do it.
> Are you thinking of using BNF based lexers?

I know of one product, DFA 1.0, that used to come with LALR 4.3 from

It accepted a BNF grammar and decomposed it into a DFA, if it were
possible to remove all nonterminal transitions. If not, it would make
a list of those nonterminals not removed.

If a BNF grammar contains nonterminals, then without some kind of
optimization to remove them from the finite-state machine, it cannot
be processed by a DFA at runtime.

But some grammars can be optimized and have all nonterminals removed.
This concept interests me, because it is a challenge in graph theory,
at least graph theory as it pertains to finite-state machines for
parsing and lexing.

If one can remove nonterminals and also the states they make a
transition to, then one can improve the performance and decrease the
size of the state machine.

My current LRGen 8.2 generates an LALR lexer with it's nonterminal
transitions included. It even removes chain-reductions for unit

Removing the non-unit reductions, requires a change in the classical
parsing algorithm, so I haven't done that yet (not sure if it's
possible or useful, since almost every reduction would have to create
a node in the AST anyway in a real-world compiler front end).

I recently compared it's performance to a DFA lexer created by DFA 1.0
and found the DFA lexer to be over 5 times the speed of the LALR

The LALR lexer was using up 43% of the runtime of the language
processor which included building an AST in memory.

The DFA lexer is using up only 12% of the runtime.

So I decided to rewrite the decomposition algorithm, since I lost it
many years ago. It is definitely an interesting exercise.

I wonder if one could apply this same decomposition to parser tables
as well and invent an improved algorithm for parsing with these
optimized or partially optimized parser tables? It would require a
different algorithm than the classical push-down stacking operation
currently known.

Since I originally did this in 1986, I thought by now someone else
would be doing it. That's why I asked the question.

Now there is a new troublesome issue that has raised it's ugly head.
The LR(0) state-building algorithm merges states that should not be
merged for easy construction of DFA's.

It is now a another challenge that I face, to undo the merging of
some states that are preventing the removal of certain nonterminal

Any comments? Has anybody else delt with this before?

Paul B Mann

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.