Re: what scanner scheme is efficient?

jlilley@empathy.com (John Lilley)
30 Oct 1996 13:26:09 -0500

          From comp.compilers

Related articles
[2 earlier articles]
Re: what scanner scheme is efficient? ramb@primenet.com (Ram Bhamidipaty) (1996-10-16)
Re: what scanner scheme is efficient? peter@peter.bj-ig.de (Peter Brueckner) (1996-10-18)
Re: what scanner scheme is efficient? roth@noel.cs.rice.edu (1996-10-20)
Re: what scanner scheme is efficient? jsgray@acm.org (Jan Gray) (1996-10-20)
Re: what scanner scheme is efficient? clark@quarry.zk3.dec.com (1996-10-24)
Re: what scanner scheme is efficient? james@wgold.demon.co.uk (James Mansion) (1996-10-24)
Re: what scanner scheme is efficient? jlilley@empathy.com (1996-10-30)
Re: what scanner scheme is efficient? vern@daffy.ee.lbl.gov (1996-11-12)
Re: what scanner scheme is efficient? jlilley@empathy.com (1996-11-15)
Re: what scanner scheme is efficient? adrian@dcs.rhbnc.ac.uk (1996-11-19)
Re: what scanner scheme is efficient? vern@daffy.ee.lbl.gov (1996-11-21)
Re: what scanner scheme is efficient? adrian@dcs.rhbnc.ac.uk (1996-11-24)
Re: what scanner scheme is efficient? jlilley@empathy.com (1996-12-01)
| List of all articles for this month |
From: jlilley@empathy.com (John Lilley)
Newsgroups: comp.compilers
Date: 30 Oct 1996 13:26:09 -0500
Organization: Empathy Software
References: 96-10-076 96-10-081
Keywords: lex, performance

peter@peter.bj-ig.de says...
>
>1. The scanner is faster (with an 'IDENT' clause) because it doesn't need to
> 'backtrack', example:
> keyword = "LET"
> input = "LETI"
> scanner L->E->T->I->backtrack(4)->L->E->T->I->whitspace->backup(1)->return
> see the flexinfo manpage or flex-infos.


This is irrelevant, because:


1) The sets of typical keywords don't require backtracking. e.g. C++
keywords require no backtracking anywhere. This is not accidental --
the standards people usually think (really!) about this beforehand.
Your example above does not even require backtracking -- a simple
one-character-lookahead lexer like DLG will deal with. An example
that requires backtracking would be like:


        keyword1 = "LET"
        keyword2 = "LETXX"
        input = "LETX "


Just because FLEX supports backtracking does not mean that you have to
use it.


2) Even if backtracking exists, it is usually so rare that the
additional time spent hashing every token outweighs the savings for
rare tokens.


>2. The scanner table is very small. (Every Letter in the Keywords adds 2
>States).


True. For example, the state machine for my DLG-generated C++ lexer
takes about 50-100k, and this might even cause cache misses, which
would slow things down a bit. But when MSVC++ takes 24Mb to compile
one of my files, I think there is considerable slack in the
time/memory tradeoff.


Has anyone actually *tested* the supposed hash-table speedup?


john lilley
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.