Re: is lex useful?

Stefan Monnier <stefan.monnier@lia.di.epfl.ch>
26 Jun 1996 11:38:47 -0400

          From comp.compilers


From: Stefan Monnier <stefan.monnier@lia.di.epfl.ch>
Newsgroups: comp.compilers
Date: 26 Jun 1996 11:38:47 -0400
Organization: Ecole Polytechnique Federale de Lausanne
References: 96-06-073 96-06-105
Keywords: lex

Scott Nicol <Scott.Nicol@infoadvan.com> wrote:
] - Lots of fixed limits...
] - Generated scanner is hard-coded to one character set.
] - Lots of globals
] - Parser-scanner interactions can get really hairy


I believe that re2c can help here. It's a preprocessor that expands special
"switch" statements (where each case is a regexp rather than a scalar) into
C code, so it's basically an extension of C's switch statement. All the
special features for state or buffering are left out, so you have complete
control. This makes scanner-parser interactions much easier to deal with,
it uses no globals, the only hard limits are the ones you impose yourself,
...


] - No support for wide (>8 bit) character sets. Even 8-bit support is
] fairly recent. The obvious implementation for wide characters (expand
] tables to 16 bits) isn't practical, because you would increase the tables
] sizes (which are already huge) 256x.


The other obvious option is to treat a 16-bit char as two 8-bit chars.
The regexps may be less readable, but it works great.
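A minimal sketch of the two-bytes approach, assuming big-endian UTF-16 input and a hypothetical token class of Cyrillic letters (U+0400..U+04FF). Written over bytes, that 16-bit class becomes the byte 0x04 followed by any byte, so every transition of the automaton is still indexed by an 8-bit value, and a table-driven version would stay 256 columns wide. The names `step` and `match_cyrillic_run` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* States of a small byte-level DFA. */
enum { S_START, S_HIGH, S_ACCEPT, S_ERROR };

/* Hypothetical rule: one or more characters in U+0400..U+04FF, stored as
   big-endian 16-bit units. Each character is two byte transitions:
   high byte 0x04, then any low byte. */
static int step(int state, unsigned char byte)
{
    switch (state) {
    case S_START:
    case S_ACCEPT:
        return byte == 0x04 ? S_HIGH : S_ERROR;  /* expect a high byte */
    case S_HIGH:
        return S_ACCEPT;                          /* any low byte completes it */
    default:
        return S_ERROR;
    }
}

/* Length in bytes of the longest matching prefix of buf[0..len), or 0. */
size_t match_cyrillic_run(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    int state = S_START;
    size_t last_accept = 0;
    for (size_t i = 0; i < len; i++) {
        state = step(state, buf[i]);
        if (state == S_ERROR)
            break;
        if (state == S_ACCEPT)
            last_accept = i + 1;   /* remember the longest match so far */
    }
    return last_accept;
}
```

A 16-bit range that does not line up with a single high byte has to be split into several byte-level alternatives, which is where the readability cost comes from. For instance, U+04F0..U+0510 becomes 0x04 followed by 0xF0..0xFF, or 0x05 followed by 0x00..0x10.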


] On top of all these things, it is really easy to hand-write a scanner that
] does all of these things (and more), and it won't take you much more time
] than writing a Lex scanner. I have also probably missed a bunch of other
] serious deficiencies.


re2c forces you to write big parts of the scanner by hand (like counting
lines, or reading from a file), but it generates fast scanners (all the
overhead is introduced manually). The programmer has almost complete
control of all the parameters while keeping the readability and
maintainability of a bunch of REs (as opposed to big nested switch
statements, for instance).
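The parts re2c leaves to you can be sketched like this, assuming input comes from a stdio `FILE *`. The names `Scanner`, `fill`, `next_token` and `BUF_SIZE` are hypothetical. `fill` plays roughly the role of re2c's `YYFILL` hook, and `next_token` is written by hand as a stand-in for generated matching code. The buffer, the refill policy and the line counter all belong to the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Deliberately tiny so that refills actually happen (a hypothetical choice). */
#define BUF_SIZE 64

/* Token kinds for this hypothetical example. */
enum { TOK_EOF, TOK_WORD, TOK_OTHER };

/* All scanner state lives in a caller-owned struct: no globals. */
typedef struct {
    FILE *in;
    unsigned char buf[BUF_SIZE];
    unsigned char *cur;  /* next unread byte */
    unsigned char *lim;  /* one past the last valid byte */
    int eof;             /* set once fread returns nothing */
    int line;            /* current line number, maintained by hand */
} Scanner;

void scanner_init(Scanner *s, FILE *in)
{
    assert(s != NULL && in != NULL);
    s->in = in;
    s->cur = s->lim = s->buf;
    s->eof = 0;
    s->line = 1;
}

/* Hand-written refill: slide any unread bytes to the front of the buffer,
   then top it up from the file. Returns the number of bytes now available. */
static size_t fill(Scanner *s)
{
    size_t keep = (size_t)(s->lim - s->cur);
    memmove(s->buf, s->cur, keep);
    s->cur = s->buf;
    s->lim = s->buf + keep;
    if (!s->eof) {
        size_t n = fread(s->lim, 1, BUF_SIZE - keep, s->in);
        if (n == 0)
            s->eof = 1;
        s->lim += n;
    }
    return (size_t)(s->lim - s->cur);
}

/* Stand-in for the generated matcher: [a-z]+ is a word, blanks are skipped,
   newlines are skipped but counted, anything else is one TOK_OTHER. */
int next_token(Scanner *s)
{
    for (;;) {
        if (s->cur == s->lim && fill(s) == 0)
            return TOK_EOF;
        unsigned char c = *s->cur;
        if (c == '\n') {            /* line counting is our job, not the generator's */
            s->line++;
            s->cur++;
        } else if (c == ' ' || c == '\t') {
            s->cur++;
        } else if (c >= 'a' && c <= 'z') {
            /* Consume [a-z]+, refilling whenever the buffer runs dry. */
            do {
                s->cur++;
                if (s->cur == s->lim && fill(s) == 0)
                    break;
            } while (*s->cur >= 'a' && *s->cur <= 'z');
            return TOK_WORD;
        } else {
            s->cur++;
            return TOK_OTHER;
        }
    }
}
```

Because this sketch never needs a token's text, it can discard consumed bytes on every refill. A scanner that returns lexemes would also have to keep the start of the current token inside the buffer across `fill` calls.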


                Stefan
--

