LRGen 8.0 - LR(1) Lexer & Parser Generator, Free Download.

"Paul B Mann" <paul@paulbmann.com>
Thu, 13 Sep 2007 10:07:02 -0600

          From comp.compilers


From: "Paul B Mann" <paul@paulbmann.com>
Newsgroups: comp.compilers
Date: Thu, 13 Sep 2007 10:07:02 -0600
Organization: Compilers Central
Keywords: lex, parse, available
Posted-Date: 13 Sep 2007 13:35:05 EDT

LRGen 8.0 is now available free of charge under the BSD license. It is
an LR(1) lexer and parser generator, and it comes with compiler
front-end source code in C/C++.


Besides C/C++, it can generate parsers and lexers in any programming
language, because it reads a skeleton file into which the generator
inserts the numbers and text of the actual parser-table data. In the
past, people have generated parsers in Pascal, assembly language and
other languages.
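
As a rough illustration of the skeleton-file idea, here is a small
Python sketch. The @name@ placeholder syntax, the fill_skeleton helper
and the table values are all assumptions made up for this example;
they are not LRGen's actual skeleton format. The point is only that
the target language lives in the skeleton text, so the same table data
can be dropped into Pascal, C or anything else.

```python
import re

# Hypothetical Pascal-flavoured skeleton. The @name@ markers are an
# assumed placeholder syntax, not LRGen's real one.
SKELETON = """\
const N_STATES = @n_states@;
const ACTION: array[0..@last@] of integer = (@action@);
"""

def fill_skeleton(skeleton, tables):
    """Replace each @name@ placeholder with the matching table as text."""
    def render(match):
        value = tables[match.group(1)]
        if isinstance(value, (list, tuple)):
            return ", ".join(str(v) for v in value)
        return str(value)
    return re.sub(r"@(\w+)@", render, skeleton)

# Made-up table data standing in for generator output.
tables = {"n_states": 3, "last": 5, "action": [0, 2, -1, 0, 3, 1]}
pascal_source = fill_skeleton(SKELETON, tables)
```

Swapping in a different skeleton string would yield source text for a
different target language without touching the table data.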


It creates Minimal LR(1) parsers and lexers. These finite-state
machines are the same size as LALR(1) machines but handle the larger
class of LR(1) grammars. So you get the best of both worlds.


Research showed that the number of canonical LR(1) states was over
2,000,000 for a COBOL-85 grammar, so that approach was abandoned.


A state-merging algorithm is used during the canonical LR(1) state
construction process which, I think, is similar to that described by
Pager [1973] here:


http://portal.acm.org/citation.cfm?id=804048
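
For readers unfamiliar with this kind of state merging, the sketch
below shows the weak-compatibility test as it is commonly stated for
Pager's method. It is not taken from the LRGen source, and since the
algorithm used is only said to be similar, this should be read as an
illustration of the idea. A state is modelled as a dict from core item
to lookahead set; the item strings and the function names are made up
for the example.

```python
from itertools import combinations

def weakly_compatible(s, t):
    """True if states s and t (same core) may be merged safely.

    s and t map each core item to its set of lookahead tokens.
    """
    if s.keys() != t.keys():
        return False  # different cores are never merged
    for i, j in combinations(s, 2):
        cross_disjoint = not (s[i] & t[j]) and not (s[j] & t[i])
        overlap_in_s = bool(s[i] & s[j])
        overlap_in_t = bool(t[i] & t[j])
        if not (cross_disjoint or overlap_in_s or overlap_in_t):
            return False  # merging could create a new conflict
    return True

def merge(s, t):
    """Union the lookahead sets item by item."""
    return {item: s[item] | t[item] for item in s}

# States reached after "a c" and "b c" in the textbook grammar
# S -> a A d | b B d | a B e | b A e ;  A -> c ;  B -> c
after_ac = {"A -> c .": {"d"}, "B -> c .": {"e"}}
after_bc = {"A -> c .": {"e"}, "B -> c .": {"d"}}
# A harmless pair with the same core
other = {"A -> c .": {"f"}, "B -> c .": {"e"}}

can_merge_bad = weakly_compatible(after_ac, after_bc)
can_merge_ok = weakly_compatible(after_ac, other)
merged = merge(after_ac, other)
```

LALR(1) would merge the first pair and report a reduce-reduce
conflict; the test above keeps those two states apart while still
allowing the harmless merge.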


The result is that the COBOL parser has 1,660 states and a
parser-table size of 37 K. Generation time is 0.73 seconds.


LRGen uses the Digraph algorithm described in the TOPLAS paper,
"Efficient Computation Of LALR(1) Look-Ahead Sets" by DeRemer and
Pennello [1982] here:


http://portal.acm.org/citation.cfm?id=357187
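
The core of that paper is a single graph traversal that computes, for
every node x, the set F(x) = F'(x) united with F(y) for every y
related to x, handling cycles by giving all members of a strongly
connected component the same result. The paper applies it twice, once
over the "reads" relation and once over the "includes" relation. Below
is a minimal Python sketch of the traversal itself, written from the
published description rather than from the LRGen source; the node
names and the example relation are invented.

```python
def digraph(nodes, relation, initial):
    """Compute F(x) = initial[x] | union of F(y) for each y in relation[x]."""
    INFINITY = float("inf")
    depth = {x: 0 for x in nodes}  # 0 means "not visited yet"
    result = {}
    stack = []

    def traverse(x):
        stack.append(x)
        d = len(stack)
        depth[x] = d
        result[x] = set(initial[x])
        for y in relation.get(x, ()):
            if depth[y] == 0:
                traverse(y)
            depth[x] = min(depth[x], depth[y])
            result[x] |= result[y]
        if depth[x] == d:
            # x is the root of a strongly connected component:
            # every member gets the same set.
            while True:
                top = stack.pop()
                depth[top] = INFINITY
                result[top] = result[x]
                if top == x:
                    break

    for x in nodes:
        if depth[x] == 0:
            traverse(x)
    return result

# Invented example: b and c form a cycle, a feeds into it, d is a leaf.
nodes = ["a", "b", "c", "d"]
relation = {"a": ["b"], "b": ["c"], "c": ["b", "d"]}
initial = {"a": {1}, "b": {2}, "c": {3}, "d": {4}}
F = digraph(nodes, relation, initial)
```

Each node and edge is visited once, which is why the method is much
cheaper than iterating set unions to a fixed point. The sketch is
recursive for brevity; a real generator handling thousands of nodes
would likely use an explicit stack.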


The parser-table compression technique is based on the paper
"Optimization Of Parser Tables For Portable Compilers", by Dencker,
Durre and Heuft in TOPLAS [1984] here:


http://portal.acm.org/citation.cfm?id=1802
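
To give a feel for what parser-table compression involves, here is a
Python sketch of row displacement, one of the schemes usually
discussed in this literature. Which scheme LRGen actually uses is not
stated here, so treat this purely as an example of the general idea:
sparse rows are slid over one another into a single vector, and a
parallel "check" vector records which row owns each slot. The function
names and the tiny table are made up.

```python
def compress_rows(table, empty=0):
    """Overlay sparse rows so that no two non-empty entries collide."""
    width = len(table[0])
    value, check, base = [], [], []
    for row_index, row in enumerate(table):
        cols = [c for c, v in enumerate(row) if v != empty]
        offset = 0
        # Slide the row right until its non-empty cells hit free slots.
        while any(offset + c < len(check) and check[offset + c] is not None
                  for c in cols):
            offset += 1
        needed = offset + width
        if needed > len(value):
            grow = needed - len(value)
            value.extend([empty] * grow)
            check.extend([None] * grow)
        for c in cols:
            value[offset + c] = row[c]
            check[offset + c] = row_index
        base.append(offset)
    return base, value, check

def lookup(base, value, check, row, col, empty=0):
    """Read back table[row][col] from the compressed vectors."""
    slot = base[row] + col
    return value[slot] if check[slot] == row else empty

# Made-up sparse action table: 4 states by 4 symbols, 0 means "error".
table = [
    [0, 5, 0, 0],
    [7, 0, 0, 0],
    [0, 0, 0, 9],
    [0, 5, 0, 0],
]
base, value, check = compress_rows(table)
```

Here the 16 original cells fit into a vector of 5 entries plus one
base offset per row, and every lookup still costs a constant number of
array accesses.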


The source code for LRGen 8.0 is also included in case someone wants
to port it to UNIX or Linux. It currently compiles without any
problems in Microsoft Visual Studio Express 2005.


The generated lexers and parsers are very fast and process input in
time linear with the size of the input. Tests show processing speed
to be about 10 MB per second on a 3 GHz Pentium 4 computer.
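
The linear-time behaviour comes from the table-driven design: each
input character costs a bounded number of table lookups. The following
Python sketch shows the shape of such a loop for a tiny hand-made
token set. The states, character classes and token names are invented
for the example and have nothing to do with the tables LRGen emits.

```python
def char_class(ch):
    """Map a character to a column of the transition table."""
    if ch.isalpha() or ch == "_":
        return 0  # letter
    if ch.isdigit():
        return 1  # digit
    if ch.isspace():
        return 2  # whitespace
    return 3      # anything else

START, IDENT, NUMBER, SPACE, PUNCT = range(5)

# TRANSITION[state][class] gives the next state; -1 means "no move".
TRANSITION = [
    [IDENT, NUMBER, SPACE, PUNCT],  # START
    [IDENT, IDENT, -1, -1],         # IDENT
    [-1, NUMBER, -1, -1],           # NUMBER
    [-1, -1, SPACE, -1],            # SPACE
    [-1, -1, -1, -1],               # PUNCT (single character)
]
KIND = {IDENT: "ident", NUMBER: "number", PUNCT: "punct"}

def tokenize(text):
    """Longest-match scan; whitespace tokens are dropped."""
    tokens = []
    pos = 0
    while pos < len(text):
        state, start = START, pos
        while pos < len(text):
            nxt = TRANSITION[state][char_class(text[pos])]
            if nxt < 0:
                break
            state = nxt
            pos += 1
        if state != SPACE:
            tokens.append((KIND[state], text[start:pos]))
    return tokens

tokens = tokenize("x1 = 42+y")
```

In this sketch a character is looked up at most twice (once when it
ends a token, once when it starts the next), so the total work grows
in proportion to the input length.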


Documentation is minimal, but there are about 20 grammars
and 4 sample projects such as:


1) A Calculator,
2) Solution to the C typedef problem,
3) An HTML subset,
4) A text file processor.


LRGen accepts EBNF grammar notation and also TBNF notation, which
greatly automates the construction of a compiler front end. See the
ACM paper on TBNF notation here:


http://portal.acm.org/citation.cfm?id=1147218


LRGen can be used as a standalone lexer generator to produce very
fast LR(1) lexers if desired. You can also use your favorite lexer
generator with it instead.


The download is here:


ftp://ftp.iecc.com/pub/file/LRGen%208.0.6.zip


Support may be available. Send an email to me if you have any
questions.


Paul B Mann
paulbmann.com

