Newsgroups: comp.compilers
From: utoddl@uncecs.edu (Todd M. Lewis)
References: <1990Jun21.033349.2983@esegue.segue.boston.ma.us>
Date: Thu, 21 Jun 90 18:43:29 GMT
Organization: UNC Educational Computing Service
Keywords: lex, question
In article <1990Jun21.033349.2983@esegue.segue.boston.ma.us> andrea@eric.mpr.ca (Jennitta Andrea) writes:
>STRING ([^ \t\n]+)
>DIGIT ([0-9])
>
>I have two regular expressions:
>
>{D}{D}":"{D}{D}":"{D}{D} { /* recognize "TIMESTAMP" token */ }
>
>{STRING} { /* recognize STRING token */ }
>
>Because my definition of a "STRING" is so general, the following input
>stream:
>
> 12:30:49AC
>
>is tokenized into a single STRING token ("12:30:49AC"), rather than into a
>TIMESTAMP token ("12:30:49") and a STRING token ("AC").

Would you not then have a problem distinguishing the TIMESTAMP
followed by a STRING

    12:30:49AC

from the STRING

    12:30:49AC

or do you want to implicitly disallow strings of that form? I'm
not sure you can have it both ways at the lexical level.
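
To make the conflict concrete, here is a rough Python imitation of lex's two disambiguating rules (longest match wins; on a tie, the rule listed first wins). It is only a sketch: the names `RULES` and `longest_match_tokens` are mine, `{D}` is assumed to mean a single digit, and skipping whitespace between tokens is also my assumption, since the quoted fragment does not show a whitespace rule.

```python
import re

# The two quoted rules, in the order they appear in the spec.
# Assumption: {D} stands for a single digit, [0-9].
RULES = [
    ("TIMESTAMP", re.compile(r"[0-9]{2}:[0-9]{2}:[0-9]{2}")),
    ("STRING", re.compile(r"[^ \t\n]+")),
]

def longest_match_tokens(text):
    """Tokenize lex-style: longest match wins, earliest rule breaks ties."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos] in " \t\n":
            pos += 1  # assumed: whitespace between tokens is discarded
            continue
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Strictly greater, so an earlier rule keeps a tie.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens
```

With these rules, `longest_match_tokens("12:30:49AC")` yields one STRING token, because STRING matches ten characters and TIMESTAMP only eight. A bare `12:30:49` does come out as a TIMESTAMP, since both rules match eight characters and TIMESTAMP is listed first.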

I like the idea (half-baked as it is) of some form of nested lexical
analysis. Rather than having to specify all these quirky little interactions
among every token in a stream, what if you could specify a rule set that
breaks the stream into strings, each belonging to a class, and then treat
each string as a little stream of its own, lexed with the rules for its
class? Formally it wouldn't be any more powerful than the current lex
(that's a guess--I haven't analyzed it), but it might make specifying lex
rules simpler, because you could build "firewalls" that keep the rules
within one class from interfering with rules several pages/screens away.

In the example above, you could break the strings out of the stream, then lex
these strings (short streams of class STRING) to get TIMESTAMPs and STRINGs.
Where possible, of course, the process would be optimized to do as much as
possible in one pass.
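
A minimal sketch of that two-level scheme, in Python rather than lex, might look like the following. Everything here is hypothetical: the names `CHUNK`, `lex_chunk`, and `nested_lex` are invented for illustration, and I have assumed that only a timestamp at the front of a chunk gets split off. Note that this choice quietly disallows STRINGs that begin with something shaped like a timestamp.

```python
import re

# Outer rule set: break the stream into whitespace-delimited chunks,
# all of which belong to the single class STRING in this example.
CHUNK = re.compile(r"[^ \t\n]+")

# Inner rule for class STRING (assumption: {D} is one digit).
TIMESTAMP = re.compile(r"[0-9]{2}:[0-9]{2}:[0-9]{2}")

def lex_chunk(chunk):
    """Inner pass: lex one chunk as a little stream of class STRING."""
    m = TIMESTAMP.match(chunk)
    if m is None:
        return [("STRING", chunk)]
    tokens = [("TIMESTAMP", m.group())]
    rest = chunk[m.end():]
    if rest:
        # Whatever follows the leading timestamp is an ordinary STRING.
        tokens.append(("STRING", rest))
    return tokens

def nested_lex(text):
    """Outer pass: split into chunks, then lex each with its class rules."""
    tokens = []
    for m in CHUNK.finditer(text):
        tokens.extend(lex_chunk(m.group()))
    return tokens
```

Here `nested_lex("12:30:49AC")` gives a TIMESTAMP (`12:30:49`) followed by a STRING (`AC`), which is the split the original poster asked for. The inner rules never see anything outside their own chunk, which is the "firewall" effect.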
Is this a half-baked idea or is it already burnt to a crisp?
_____
Todd M. Lewis
utoddl@ecsvax.uncecs.edu
utoddl@ecsvax.bitnet, @unc.bitnet
utoddl@next1.mscre.unc.edu