|LRGen 8.0 - LR(1) Lexer & Parser Generator, Free Download. email@example.com (Paul B Mann) (2007-09-13)|
|Re: LRGen 8.0 - LR(1) Lexer & Parser Generator, Free Download. firstname.lastname@example.org (Paul B Mann) (2007-09-15)|
|Re: LRGen 8.0 - LR(1) Lexer & Parser Generator, Free Download. email@example.com (Paul B Mann) (2007-09-16)|
|From:||"Paul B Mann" <firstname.lastname@example.org>|
|Date:||Thu, 13 Sep 2007 10:07:02 -0600|
|Keywords:||lex, parse, available|
|Posted-Date:||13 Sep 2007 13:35:05 EDT|
LRGen 8.0 is now available under the BSD license for free. It is an
LR(1) Lexer and Parser Generator with compiler front-end source code.
Besides C/C++, it can generate parsers and lexers in any programming
language because it uses a skeleton file input for which the generator
inserts the numbers and the text of the actual parser-table data. In
the past, people have generated parsers in Pascal, assembly language
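To illustrate the skeleton-file idea, here is a minimal Python sketch of a generator filling table data into a target-language template. The `@NAME@` placeholder syntax, the `fill_skeleton` helper, and the table values are assumptions made up for this example; they are not LRGen's actual skeleton format.

```python
import re

# Hypothetical skeleton text. The @NAME@ placeholder syntax is an
# assumption for this sketch, not LRGen's real skeleton format.
SKELETON = """\
static const int n_states = @N_STATES@;
static const short action[] = { @ACTION_TABLE@ };
"""

def fill_skeleton(skeleton, values):
    """Replace each @NAME@ placeholder with the matching generated text."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"skeleton uses unknown placeholder {name!r}")
        return values[name]
    return re.sub(r"@([A-Z_]+)@", substitute, skeleton)

# Made-up table data standing in for what a generator would compute.
tables = {
    "N_STATES": "3",
    "ACTION_TABLE": ", ".join(str(n) for n in [0, 2, -1, 1, 0, 3]),
}

generated = fill_skeleton(SKELETON, tables)
print(generated)
```

Because the generator only substitutes text, targeting another language is a matter of writing a different skeleton around the same placeholders.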
It creates Minimal LR(1) parsers and lexers. These finite-state
machines are the same size as LALR(1) but handle the larger class of
LR(1) grammars. So you get the best of both worlds.
Research showed that the number of canonical LR(1) states was over
2,000,000 for a COBOL-85 grammar, so that approach was abandoned.
A state-merging algorithm is used during the canonical LR(1) state
construction process which, I think, is similar to that described by
Pager  here:
The result is that the COBOL parser has 1,660 states and a
parser-table size of 37 K. Generation time is 0.73 seconds.
LRGen uses the Digraph algorithm described in the TOPLAS paper,
"Efficient Computation Of LALR(1) Look-Ahead Sets" by DeRemer and
Pennello  here:
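For readers who have not seen it, the Digraph algorithm can be sketched in a few lines of Python. This is my own rendering of the published algorithm, not code taken from LRGen; the names `digraph`, `relation` and `initial` are chosen for this example. Given a relation R and an initial set for each node, it computes for every node x the union of the initial sets of all nodes reachable from x, and it handles cycles by giving every member of a strongly connected component the same set. In the paper the routine is applied twice: once with the `reads` relation and once with the `includes` relation.

```python
import math

def digraph(nodes, relation, initial):
    """Return F where F[x] is initial[x] united with initial[y]
    for every y reachable from x through `relation`."""
    depth = {x: 0 for x in nodes}   # 0 means "not visited yet"
    result = {}
    stack = []

    def traverse(x):
        stack.append(x)
        d = len(stack)
        depth[x] = d
        result[x] = set(initial.get(x, ()))
        for y in relation.get(x, ()):
            if depth[y] == 0:
                traverse(y)
            depth[x] = min(depth[x], depth[y])
            result[x] |= result[y]
        if depth[x] == d:
            # x is the root of a strongly connected component:
            # every member of the component receives the same set.
            while True:
                top = stack.pop()
                depth[top] = math.inf
                result[top] = set(result[x])
                if top == x:
                    break

    for x in nodes:
        if depth[x] == 0:
            traverse(x)
    return result

# Small made-up graph with a cycle between b and c.
nodes = ["a", "b", "c", "d"]
relation = {"a": ["b"], "b": ["c"], "c": ["b", "d"]}
initial = {"a": {"x"}, "b": {"y"}, "c": {"z"}, "d": {"w"}}
follow = digraph(nodes, relation, initial)
```

The sketch is recursive for clarity, so a very deep graph would need an explicit stack or a raised recursion limit in Python.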
The parser-table compression technique is based on the paper
"Optimization Of Parser Tables For Portable Compilers", by Dencker,
Durre and Heuft in TOPLAS  here:
The source code for LRGen 8.0 is also included in case someone
wants to port it to UNIX or Linux. It currently compiles without any
problems in Microsoft Visual Studio Express 2005.
The generated lexers and parsers are very fast and process input in
time linear with the size of the input. Tests show processing speed
to be about 10 MB per second on a 3 GHz Pentium 4 computer.
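To show why a table-driven scanner runs in linear time, here is a small Python sketch of a generic one. It is not LRGen output and does not reproduce its LR(1) lexer tables; the states, character classes and token names are made up for this example. Each character is looked up in a transition table and the position only moves forward, so in this sketch no character is examined more than twice.

```python
def char_class(ch):
    """Map a character to a coarse class used as a table column."""
    if ch.isalpha() or ch == "_":
        return "letter"
    if ch.isdigit():
        return "digit"
    if ch.isspace():
        return "space"
    return "other"

# Hypothetical states and transition table for this sketch.
START, IDENT, NUMBER, SPACE = range(4)
TRANSITIONS = {
    START: {"letter": IDENT, "digit": NUMBER, "space": SPACE},
    IDENT: {"letter": IDENT, "digit": IDENT},
    NUMBER: {"digit": NUMBER},
    SPACE: {"space": SPACE},
}
TOKEN_NAME = {IDENT: "ident", NUMBER: "number", SPACE: None}  # None = skip

def tokenize(text):
    tokens = []
    pos = 0
    while pos < len(text):
        state = START
        start = pos
        # Follow table transitions for as long as one exists (longest match).
        while pos < len(text):
            nxt = TRANSITIONS[state].get(char_class(text[pos]))
            if nxt is None:
                break
            state = nxt
            pos += 1
        if state == START:
            # No transition from the start state: emit a one-character token.
            tokens.append(("char", text[pos]))
            pos += 1
        elif TOKEN_NAME[state] is not None:
            tokens.append((TOKEN_NAME[state], text[start:pos]))
    return tokens

result = tokenize("x1 = 42+y")
```

Every state in this small table is accepting, so the scanner never has to back up; a scanner with non-accepting intermediate states would need to remember the last accepting position.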
Documentation is minimal, but there are about 20 grammars
and 4 sample projects such as:
1) A Calculator,
2) Solution to the C typedef problem,
3) An HTML subset,
4) A text file processor.
LRGen accepts EBNF grammar notation and also TBNF notation, which
greatly automates the construction of a compiler front end. See the
ACM paper on TBNF notation here:
LRGen can be used as a standalone lexer generator to produce very
fast LR(1) lexers if desired. Also, you can use your favorite lexer
generator if you want.
The download is here:
Support may be available. Send an email to me if you have any
Paul B Mann