Re: is lex useful? (Quinn Tyler Jackson)
24 Jun 1996 11:03:40 -0400

          From comp.compilers


From: (Quinn Tyler Jackson)
Newsgroups: comp.compilers
Date: 24 Jun 1996 11:03:40 -0400
Organization: Compilers Central
Keywords: lex, performance

On 23 Jun 1996 23:24:53 -0400, Ronald Kanagy wrote:

>Lex is good in situations where a language is still being designed and a
>scanner is to be quickly built. But in production compilers, after the
>language has been designed and is stable, lex scanners tend to be too slow
>compared to hand-coded scanners, which is unacceptable. Therefore, one
>would normally find hand-coded scanners in these situations.

>[Has anyone actually timed a flex scanner vs. a hand-coded one? -John]

Not flex, but I did some timings of an LPM scanner generated at
run-time vs. a hand-coded version, and found the LPM scanner to be 25
times faster. I suspect (but have not verified) that a lex-type
scanner would beat even that.

Eventually, when it comes time to optimize CLpm, I will run benchmarks
against lex, flex, and Spencer's regexp.c, using a suite of about 20
REs. Something must first be proven "correct" before it is made
faster, however. ;-)

One point that hasn't been mentioned in the hand-vs-generated lexical
scanner debate is that hand-coded scanners tend to read like
nightmares. In one CLpm demo that scans a file for legal URLs, it
takes 55+ lines of C++ code to do what is accomplished in two lines of
CLpm semantics. Granted, there are 10,000 lines of interpreter
underneath those 2 lines, but now that CLpm is a class and everything
is effectively under the hood, it is much simpler to express tokens in
terms of patterns than in terms of raw C++. It also tends to be
easier to debug patterns than to go through a hand-crafted scanner
looking for a glitch. Moreover, it is much simpler to optimize one
central interpreter/generator than to surf through three hundred
switch/if/while statements looking for places to trim the fat.
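To give a feel for the contrast (the post doesn't show CLpm syntax, so
std::regex stands in for the pattern side, and the URL grammar here is a
deliberately simplified stand-in): the whole token definition collapses
into one expression, where a hand-coded scanner would spend dozens of
lines of character-by-character state handling.

```cpp
#include <regex>
#include <string>
#include <vector>

// Pattern-style URL scan: the entire token definition is the one
// expression below. (Simplified illustrative grammar, not CLpm's.)
std::vector<std::string> find_urls(const std::string& text) {
    static const std::regex url_re(R"((https?|ftp)://[A-Za-z0-9./_~%+-]+)");
    std::vector<std::string> out;
    for (auto it = std::sregex_iterator(text.begin(), text.end(), url_re);
         it != std::sregex_iterator(); ++it)
        out.push_back(it->str());  // each match is one URL token
    return out;
}
```

A hand-coded equivalent would need explicit scheme matching, a
character-class loop, and end-of-token handling for every case the one
pattern covers; a change to the token definition then means re-auditing
all of that control flow instead of editing a single expression.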

I've implemented both types, and prefer generated/interpreted scanners
over hand-written ones any day.


        Parsepolis Software || Quinn Tyler Jackson
                "ParseCity" ||
>------ ------>
