Re: what scanner scheme is efficient?

jlilley@empathy.com (John Lilley)
1 Dec 1996 22:56:00 -0500

          From comp.compilers

Related articles
[8 earlier articles]
Re: what scanner scheme is efficient? jlilley@empathy.com (1996-10-30)
Re: what scanner scheme is efficient? vern@daffy.ee.lbl.gov (1996-11-12)
Re: what scanner scheme is efficient? jlilley@empathy.com (1996-11-15)
Re: what scanner scheme is efficient? adrian@dcs.rhbnc.ac.uk (1996-11-19)
Re: what scanner scheme is efficient? vern@daffy.ee.lbl.gov (1996-11-21)
Re: what scanner scheme is efficient? adrian@dcs.rhbnc.ac.uk (1996-11-24)
Re: what scanner scheme is efficient? jlilley@empathy.com (1996-12-01)
| List of all articles for this month |
From: jlilley@empathy.com (John Lilley)
Newsgroups: comp.compilers
Date: 1 Dec 1996 22:56:00 -0500
Organization: Empathy Software
References: 96-10-076 96-10-081 96-10-149 96-11-079 96-11-103 96-11-123
Keywords: lex

adrian@dcs.rhbnc.ac.uk says...
>
>But it's not just keywords. Consider the two Pascal fragments:
>
> 1.4
> 1..4
>
>The first tokenises as REAL_LITERAL(1.4) whereas the second goes to
>INTEGER_LITERAL(1) KEYWORD(..) INTEGER_LITERAL(4). After lexing 1. if
>you then see another period you have to back up.


As yes, this is true. (And FORTRAN is worse, of course). But even
this nasty case can be treated with a hack. Of course, anything can
be treated with a hack :-)


In this case, a normal, non-backtracking, one-char-lookahead lexer can
deal with it if you define tokens like (and I simplify):


FLOAT = "[0-9]+\.[0-9]+"
INT = "[0-9]+"
RANGE = "\.\."
RANGE_START = "[0-9]+\.\."


Of course, you must (a) write the grammar to accomodate either INT
RANGE INT or RANGE_START INT (b) strip the trailing ".." when you
process the RANGE_START token. And it gets worse, because typically
the range can be an expression, not just an integer.


A "better" approach, if you can stomach it, is to hack the lexer
output, and turn RANGE_START into two separate INT, RANGE tokens. But
now we're approaching the land of the original problem, which is
hacking the lexer output with a perfect hash table.


Other than that, Mrs. Lincoln...


john
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.