Re: what scanner scheme is efficient?

jlilley@empathy.com (John Lilley)
15 Nov 1996 12:03:35 -0500

          From comp.compilers

Related articles
[4 earlier articles]
Re: what scanner scheme is efficient? roth@noel.cs.rice.edu (1996-10-20)
Re: what scanner scheme is efficient? jsgray@acm.org (Jan Gray) (1996-10-20)
Re: what scanner scheme is efficient? clark@quarry.zk3.dec.com (1996-10-24)
Re: what scanner scheme is efficient? james@wgold.demon.co.uk (James Mansion) (1996-10-24)
Re: what scanner scheme is efficient? jlilley@empathy.com (1996-10-30)
Re: what scanner scheme is efficient? vern@daffy.ee.lbl.gov (1996-11-12)
Re: what scanner scheme is efficient? jlilley@empathy.com (1996-11-15)
Re: what scanner scheme is efficient? adrian@dcs.rhbnc.ac.uk (1996-11-19)
Re: what scanner scheme is efficient? vern@daffy.ee.lbl.gov (1996-11-21)
Re: what scanner scheme is efficient? adrian@dcs.rhbnc.ac.uk (1996-11-24)
Re: what scanner scheme is efficient? jlilley@empathy.com (1996-12-01)
| List of all articles for this month |
From: jlilley@empathy.com (John Lilley)
Newsgroups: comp.compilers
Date: 15 Nov 1996 12:03:35 -0500
Organization: Empathy Software
References: 96-10-076 96-10-081 96-10-149 96-11-079
Keywords: lex, performance

John Lilley <jlilley@empathy.com> wrote:
> Just because FLEX supports backtracking does not mean that you have to
> use it.


vern@daffy.ee.lbl.gov says...
>Let me clarify. An example is matching two keywords, "foo"
>and "foobar". If 'f', 'o', 'o' have been scanned and the next
>characters are 'b' and 'a', the scanner must match them in case what
>comes next is 'r'. But if it's not 'r', then it must return to the
>point where it had matched "foo". It will later re-scan 'b' and 'a',
>when matching the next token.
>Most scanners have to do backing-up [...]


What sample have you examined to arrive at the conslusion that *most*
scanners require backing-up? The languages that I know of (C, C++,
Pascal) do not require any backtracking to support their keyword sets.
The example you give of "foo" vs "foobar" is interesting but atypical,
because most languages require keywords to be delimited by non-alpha
characters. In your example of "foo" vs "foobar", if it were fed
"fooba", what a non-backtracking lexer would give you is either (a) an
identifier, if you have that defined as one of your tokens, or (b) an
error, finding "fooba" to be an unrecognized keyword. This is the
case for "most" scanners, if you sample production languages.


An example in "C" illustrates what I mean. If I write:
        unsigned short x;
it's pretty clear what is happening. If I write:
        unsignedshortx;
the lexer simply treats it as an identifier.


I'm not saying that there are no languages where backtracking in the
lexer is necessary. In particular, languages where keywords need not
be delimited may be forced to backtrack, as you pointed out, to
extract a prefix from a non-delimited keyword. There is also the
range operator (e.g, 1..4) in Pascal that is nasty for a
non-backtracking lexer to handle because it looks like a
floating-point number until you see the second '.' (there are,
unfortunately, clever hacks to deal with this anomoly). However, this
is the unusual case. Most of the time, a single character of
lookahead with no backtracking is sufficient.


john
[Vern wrote flex, he oughta know how it works. -John]


--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.