
Related articles:
Code generation from AST  lssilva@gmail.com (Lucas S. Silva) (2007-11-10)
DFA Lexer Generation From BNF  pbm@oct.net (Paul B Mann) (2007-11-10)
Re: DFA Lexer Generation From BNF  cfc@shell01.TheWorld.com (Chris F Clark) (2007-11-11)
Re: DFA Lexer Generation From BNF  gneuner2/@/comcast.net (George Neuner) (2007-11-12)
Re: DFA Lexer Generation From BNF  pbm@oct.net (Paul B Mann) (2007-11-16)
Re: DFA Lexer Generation From BNF  bobduff@shell01.TheWorld.com (Robert A Duff) (2007-11-16)
Re: DFA Lexer Generation From BNF  idbaxter@semdesigns.com (2007-11-17)

From: "Paul B Mann" <pbm@oct.net>
Newsgroups: comp.compilers
Date: Fri, 16 Nov 2007 16:39:58 -0600
Organization: Compilers Central
References: 07-11-033 07-11-038 07-11-043
Keywords: lex

>> Does anyone know of a lexer generator whose input is a BNF grammar
>> instead of regular expressions ?
>>
>> [DFA's aren't adequate to recognize BNF, which is why parser
>> generators use a DFA and a stack or other more powerful machines. I
>> suppose you could limit it to a BNF subset equivalent to regexps, but
>> what would be the point? -John]

> I'm not aware of any advantage to doing so, so yacc++ doesn't do it.
>
> Are you thinking of using BNF based lexers?

I know of one product, DFA 1.0, which used to come with LALR 4.3 from
Autom. It accepted a BNF grammar and decomposed it into a DFA, if it
was possible to remove all of the nonterminal transitions. If not, it
would list the nonterminals that could not be removed.

If a BNF grammar contains nonterminals, then without some kind of
optimization to remove them from the finite-state machine, it cannot
be processed by a DFA at runtime. But some grammars can be optimized
so that all of their nonterminals are removed.
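The decomposition can be sketched in a few lines. The following is an
illustrative toy, not DFA 1.0's actual algorithm or input format (the
grammar encoding here is invented): each nonterminal maps to a
regex-like body in which <NAME> refers to another nonterminal;
reference-free bodies are inlined until a fixpoint, and any rule still
containing a reference sits on a recursive cycle, so its nonterminal
cannot be removed.

```python
import re

# Toy BNF-style lexical grammar. Uppercase names in angle brackets are
# nonterminal references; everything else is ordinary regex syntax.
GRAMMAR = {
    "DIGIT":  "[0-9]",
    "LETTER": "[A-Za-z_]",
    "IDENT":  "<LETTER>(<LETTER>|<DIGIT>)*",
    "NUMBER": "<DIGIT>+",
    # Left-recursive, so this nonterminal can never be inlined away:
    "EXPR":   "<EXPR>\\+<NUMBER>|<NUMBER>",
}

REF = re.compile(r"<([A-Z]+)>")

def inline(grammar):
    """Substitute nonterminal references until a fixpoint is reached.

    Returns the rewritten rules plus a sorted list of the nonterminals
    that could not be removed (those on a recursive cycle)."""
    rules = dict(grammar)
    changed = True
    while changed:
        changed = False
        for name in rules:
            def repl(m, name=name):
                ref = m.group(1)
                target = rules[ref]
                # Only inline bodies that are themselves reference-free,
                # parenthesized to preserve alternation precedence.
                if ref != name and not REF.search(target):
                    return "(" + target + ")"
                return m.group(0)
            new = REF.sub(repl, rules[name])
            if new != rules[name]:
                rules[name] = new
                changed = True
    stuck = sorted(n for n, b in rules.items() if REF.search(b))
    return rules, stuck

rules, stuck = inline(GRAMMAR)
print(rules["IDENT"])   # a pure regex, DFA-compilable
print(stuck)            # the nonterminals that survived: ['EXPR']
```

Once a rule is reference-free it is an ordinary regular expression, so
any standard regex-to-DFA construction applies; the `stuck` list plays
the role of DFA 1.0's report of nonterminals it could not remove.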

This concept interests me, because it is a challenge in graph theory,
at least graph theory as it pertains to finite-state machines for
parsing and lexing.

If one can remove the nonterminals, along with the states they make
transitions to, then one can both speed up and shrink the state
machine.

My current LRGen 8.2 generates an LALR lexer with its nonterminal
transitions included. It even eliminates the chain reductions caused
by unit productions.
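For concreteness, here is the grammar-level analogue of chain-reduction
elimination: every unit production A -> B (where B is a single
nonterminal) is replaced by B's non-unit productions. This is a
textbook sketch, not LRGen's table-level implementation, and the
grammar encoding is invented for the example.

```python
def remove_unit_productions(grammar):
    """Eliminate unit (chain) productions A -> B from a toy grammar.

    A grammar maps a nonterminal to a list of productions; each
    production is a list of symbols, and a symbol is a nonterminal
    iff it is a key of the grammar."""
    nts = set(grammar)
    # unit_reachable[A] = every B reachable from A via chains of
    # unit productions A -> ... -> B (including A itself).
    unit_reachable = {a: {a} for a in nts}
    changed = True
    while changed:
        changed = False
        for a in nts:
            for b in list(unit_reachable[a]):
                for prod in grammar[b]:
                    is_unit = len(prod) == 1 and prod[0] in nts
                    if is_unit and prod[0] not in unit_reachable[a]:
                        unit_reachable[a].add(prod[0])
                        changed = True
    # A's new productions: the non-unit productions of everything
    # unit-reachable from A.
    new = {}
    for a in nts:
        new[a] = []
        for b in unit_reachable[a]:
            for prod in grammar[b]:
                if not (len(prod) == 1 and prod[0] in nts):
                    if prod not in new[a]:
                        new[a].append(prod)
    return new

# The classic expression grammar, whose EXPR -> TERM -> FACTOR chain
# is exactly the kind of reduction cascade being discussed.
G = {
    "EXPR":   [["EXPR", "+", "TERM"], ["TERM"]],
    "TERM":   [["TERM", "*", "FACTOR"], ["FACTOR"]],
    "FACTOR": [["(", "EXPR", ")"], ["id"]],
}

print(remove_unit_productions(G)["EXPR"])
```

After the transformation, reducing `id` directly yields an EXPR in one
step instead of the three reductions FACTOR, TERM, EXPR; doing the same
thing inside the LALR tables is what saves the chain reductions at
runtime.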

Removing the non-unit reductions requires a change to the classical
parsing algorithm, so I haven't done that yet. (I'm not sure it's
possible or useful, since in a real-world compiler front end almost
every reduction has to create an AST node anyway.)

I recently compared its performance to a DFA lexer created by DFA 1.0
and found the DFA lexer to be over 5 times as fast as the LALR lexer.
The LALR lexer was using 43% of the runtime of the language processor,
which included building an AST in memory; the DFA lexer uses only 12%.

So I decided to rewrite the decomposition algorithm, since I lost it
many years ago. It is definitely an interesting exercise.

I wonder whether one could apply this same decomposition to parser
tables as well, and invent an improved algorithm for parsing with
these optimized or partially optimized tables. It would require a
different algorithm than the classical push-down stacking operation
currently known.

Since I originally did this in 1986, I thought that by now someone
else would be doing it. That's why I asked the question.

Now a new, troublesome issue has reared its ugly head: the LR(0)
state-building algorithm merges states that should not be merged if
one wants easy construction of DFA's.

It is now another challenge I face: undoing the merging of those
states that are preventing the removal of certain nonterminal
transitions.

Any comments? Has anybody else dealt with this before?

Paul B Mann
