From: "Paul B Mann" <paul@paulbmann.com>
Newsgroups: comp.compilers
Date: Thu, 13 Sep 2007 10:07:02 -0600
Organization: Compilers Central
Keywords: lex, parse, available
Posted-Date: 13 Sep 2007 13:35:05 EDT

LRGen 8.0 is now available for free under the BSD license. It is an
LR(1) Lexer and Parser Generator with compiler front-end source code
in C/C++.

Besides C/C++, it can generate parsers and lexers in any programming
language, because it reads a skeleton file into which the generator
inserts the numbers and the text of the actual parser-table data. In
the past, people have generated parsers in Pascal, assembly language,
and other languages.
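
To illustrate the skeleton-file idea in general terms, here is a
minimal sketch in Python. The @name@ placeholder syntax, the skeleton
text, and the function name fill_skeleton are all assumptions made for
this example; LRGen's real skeleton notation is not shown in this post
and may look quite different.

```python
import re

# Hypothetical skeleton: the @name@ placeholder syntax is an assumption made
# for this sketch, not LRGen's actual skeleton-file notation.
SKELETON = """\
static const int n_states = @n_states@;
static const short action_table[] = { @action_table@ };
"""


def fill_skeleton(skeleton, values):
    """Replace each @name@ placeholder with the matching table data."""

    def substitute(match):
        name = match.group(1)
        value = values[name]  # a KeyError flags a placeholder with no data
        if isinstance(value, (list, tuple)):
            # Table data is emitted as a comma-separated list of numbers.
            return ", ".join(str(v) for v in value)
        return str(value)

    return re.sub(r"@(\w+)@", substitute, skeleton)


generated = fill_skeleton(
    SKELETON, {"n_states": 3, "action_table": [2, -1, 0, 5]}
)
print(generated)
```

Because the target-language text lives entirely in the skeleton, the
same table data could be dropped into a Pascal or assembly skeleton
without changing the generator itself.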

It creates Minimal LR(1) parsers and lexers. These finite-state
machines are the same size as LALR(1) machines but handle the larger
class of LR(1) grammars. So you get the best of both worlds.

Research showed that the number of canonical LR(1) states was over
2,000,000 for a COBOL-85 grammar, so that approach was abandoned.

A state-merging algorithm is used during the canonical LR(1) state
construction process which, I think, is similar to that described by
Pager [1973] here:

http://portal.acm.org/citation.cfm?id=804048

The result is that the COBOL parser has 1,660 states and a
parser-table size of 37 K. Generation time is 0.73 seconds.

LRGen uses the Digraph algorithm described in the TOPLAS paper,
"Efficient Computation Of LALR(1) Look-Ahead Sets" by DeRemer and
Pennello [1982] here:

http://portal.acm.org/citation.cfm?id=357187
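
For readers unfamiliar with the Digraph algorithm, the following
Python sketch shows its general shape: given a relation R and an
initial set for each node, it computes F(x) as the initial set of x
united with F(y) for every y such that x R y, handling cycles by
treating each strongly connected component as a unit. The function
name, the dictionary-based representation, and the toy graph are
choices made for this example; this is not LRGen's source code. In
the paper the nodes are nonterminal transitions and the sets are
look-ahead terminals; here they are just letters and integers.

```python
import math


def digraph(nodes, relation, initial):
    """Compute F(x) = initial(x) united with F(y) for every y with x R y."""
    n = {x: 0 for x in nodes}  # 0 = unvisited, inf = finished
    f = {}
    stack = []

    def traverse(x):
        # Recursive for clarity; a very large graph would need an
        # explicit stack to stay under Python's recursion limit.
        stack.append(x)
        depth = len(stack)
        n[x] = depth
        f[x] = set(initial.get(x, ()))
        for y in relation.get(x, ()):
            if n[y] == 0:
                traverse(y)
            n[x] = min(n[x], n[y])
            f[x] |= f[y]
        if n[x] == depth:
            # x is the root of a strongly connected component: every
            # member of the component receives the same final set.
            while True:
                top = stack.pop()
                n[top] = math.inf
                f[top] = set(f[x])
                if top == x:
                    break

    for x in nodes:
        if n[x] == 0:
            traverse(x)
    return f


# Toy input: b and c form a cycle, so they must end up with equal sets.
relation = {"a": ["b"], "b": ["c"], "c": ["b", "d"]}
initial = {"a": {1}, "b": {2}, "c": {3}, "d": {4}}
result = digraph(["a", "b", "c", "d"], relation, initial)
print(result)
```

Each node is pushed and popped once and each edge is examined once,
which is what makes the method efficient on large grammars.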

The parser-table compression technique is based on the paper
"Optimization Of Parser Tables For Portable Compilers", by Dencker,
Durre and Heuft in TOPLAS [1984] here:

http://portal.acm.org/citation.cfm?id=1802
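
As background, one well-known way to compress a sparse parser table
is row displacement: the rows are overlaid in a single array at
offsets chosen so that their non-empty entries do not collide, and a
parallel check array records which row owns each slot. The Python
sketch below shows a first-fit version of that idea. Which scheme
LRGen actually uses is not stated in this post, so treat this purely
as an illustration of the kind of technique involved; all names and
the sample table are made up for the example.

```python
def compress_rows(table, empty=0):
    """First-fit row displacement: overlay sparse rows in one shared array."""
    values = []   # packed table entries
    check = []    # owning row of each packed slot, or None if the slot is free
    offsets = []  # displacement chosen for each row

    for row_index, row in enumerate(table):
        used = [col for col, v in enumerate(row) if v != empty]
        offset = 0
        # Slide the row right until none of its entries hits a taken slot.
        while any(
            offset + col < len(check) and check[offset + col] is not None
            for col in used
        ):
            offset += 1
        # Grow the arrays so every column of this row maps to a valid slot.
        while len(values) < offset + len(row):
            values.append(empty)
            check.append(None)
        for col in used:
            values[offset + col] = row[col]
            check[offset + col] = row_index
        offsets.append(offset)
    return offsets, values, check


def lookup(offsets, values, check, row, col, empty=0):
    """Return the original table[row][col] from the packed arrays."""
    slot = offsets[row] + col
    return values[slot] if check[slot] == row else empty


table = [
    [0, 5, 0, 0],
    [7, 0, 0, 0],
    [0, 0, 0, 9],
    [0, 5, 0, 3],
]
offsets, values, check = compress_rows(table)
print(offsets, values)
```

The check array is what keeps lookups correct: a slot that belongs
to a different row is reported as empty rather than returning that
row's entry.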

The source code for LRGen 8.0 is also included in case someone
wants to port it to UNIX or Linux. It currently compiles without any
problems in Microsoft Visual Studio Express 2005.

The generated lexers and parsers are very fast and process input in
time linear in the size of the input. Tests show processing speed
to be about 10 MB per second on a 3 GHz Pentium 4 computer.

Documentation is minimal, but there are about 20 grammars
and 4 sample projects:

1) A calculator,
2) A solution to the C typedef problem,
3) An HTML subset,
4) A text file processor.
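
For context on item 2: in C, a name such as T in "T * x;" is a type
if T was introduced by typedef and an ordinary identifier otherwise,
so the token stream alone is ambiguous. A common remedy is to let the
lexer consult a table of typedef names that the parser fills in as it
goes. The Python sketch below shows that general idea only. It is not
taken from the LRGen sample project, the token names are invented,
and the "parser" is a stand-in that recognizes just the pattern
"typedef <type> NAME ;".

```python
import re

KEYWORDS = {"typedef": "TYPEDEF", "int": "INT"}
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\S")


def classify(text, typedef_names):
    """Return the token kind, consulting the typedef table for names."""
    if text in KEYWORDS:
        return KEYWORDS[text]
    if text[0].isalpha() or text[0] == "_":
        return "TYPE_NAME" if text in typedef_names else "IDENTIFIER"
    return text  # punctuation is its own kind


def lex_with_feedback(source):
    """Tokenize while a stand-in parser action registers typedef names."""
    typedef_names = set()
    tokens = []
    for text in TOKEN_RE.findall(source):
        tokens.append((classify(text, typedef_names), text))
        # Stand-in for a parser reduction of "typedef <type> NAME ;":
        # once the declaration is complete, NAME becomes a type name.
        if text == ";" and len(tokens) >= 4 and tokens[-4][0] == "TYPEDEF":
            typedef_names.add(tokens[-2][1])
    return tokens


tokens = lex_with_feedback("typedef int T; T * x; y * x;")
for kind, text in tokens:
    print(kind, text)
```

With this feedback in place, "T * x;" begins with a TYPE_NAME token
and can be parsed as a declaration, while "y * x;" begins with an
IDENTIFIER and can be parsed as an expression.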

LRGen accepts EBNF grammar notation and also TBNF notation, which
greatly automates the construction of a compiler front end. See the
ACM paper on TBNF notation here:

http://portal.acm.org/citation.cfm?id=1147218

LRGen can be used as a stand-alone lexer generator to produce very
fast LR(1) lexers if desired. Alternatively, you can use your favorite
lexer generator instead.

The download is here:

ftp://ftp.iecc.com/pub/file/LRGen%208.0.6.zip

Support may be available. Send an email to me if you have any
questions.

Paul B Mann
paulbmann.com
