Re: what scanner scheme is efficient?

jlilley@empathy.com (John Lilley)
1 Dec 1996 22:56:00 -0500

From comp.compilers

Related articles
[8 earlier articles]
Re: what scanner scheme is efficient? jlilley@empathy.com (1996-10-30)
Re: what scanner scheme is efficient? vern@daffy.ee.lbl.gov (1996-11-12)
Re: what scanner scheme is efficient? jlilley@empathy.com (1996-11-15)
Re: what scanner scheme is efficient? adrian@dcs.rhbnc.ac.uk (1996-11-19)
Re: what scanner scheme is efficient? vern@daffy.ee.lbl.gov (1996-11-21)
Re: what scanner scheme is efficient? adrian@dcs.rhbnc.ac.uk (1996-11-24)
*Re: what scanner scheme is efficient? jlilley@empathy.com* (1996-12-01)**

| List of all articles for this month |

From:	jlilley@empathy.com (John Lilley)
Newsgroups:	comp.compilers
Date:	1 Dec 1996 22:56:00 -0500
Organization:	Empathy Software
References:	96-10-076 96-10-081 96-10-149 96-11-079 96-11-103 96-11-123
Keywords:	lex

adrian@dcs.rhbnc.ac.uk says...
>
>But it's not just keywords. Consider the two Pascal fragments:
>
> 1.4
> 1..4
>
>The first tokenises as REAL_LITERAL(1.4) whereas the second goes to
>INTEGER_LITERAL(1) KEYWORD(..) INTEGER_LITERAL(4). After lexing 1. if
>you then see another period you have to back up.

As yes, this is true. (And FORTRAN is worse, of course). But even
this nasty case can be treated with a hack. Of course, anything can
be treated with a hack :-)

In this case, a normal, non-backtracking, one-char-lookahead lexer can
deal with it if you define tokens like (and I simplify):

FLOAT = "[0-9]+\.[0-9]+"
INT = "[0-9]+"
RANGE = "\.\."
RANGE_START = "[0-9]+\.\."

Of course, you must (a) write the grammar to accomodate either INT
RANGE INT or RANGE_START INT (b) strip the trailing ".." when you
process the RANGE_START token. And it gets worse, because typically
the range can be an expression, not just an integer.

A "better" approach, if you can stomach it, is to hack the lexer
output, and turn RANGE_START into two separate INT, RANGE tokens. But
now we're approaching the land of the original problem, which is
hacking the lexer output with a perfect hash table.

Other than that, Mrs. Lincoln...

john
--

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: what scanner scheme is efficient?

jlilley@empathy.com (John Lilley)1 Dec 1996 22:56:00 -0500

jlilley@empathy.com (John Lilley)
1 Dec 1996 22:56:00 -0500