Re: Question on lex's disambiguating rules

utoddl@uncecs.edu (Todd M. Lewis)
Thu, 21 Jun 90 18:43:29 GMT

          From comp.compilers

Newsgroups: comp.compilers
From: utoddl@uncecs.edu (Todd M. Lewis)
References: <1990Jun21.033349.2983@esegue.segue.boston.ma.us>
Date: Thu, 21 Jun 90 18:43:29 GMT
Organization: UNC Educational Computing Service
Keywords: lex, question

In article <1990Jun21.033349.2983@esegue.segue.boston.ma.us> andrea@eric.mpr.ca (Jennitta Andrea) writes:
>STRING ([^ \t\n]+)
>DIGIT ([0-9])
>
>I have two regular expressions:
>
>{D}{D}":"{D}{D}":"{D}{D} { /* recognize "TIMESTAMP" token */ }
>
>{STRING} { /* recognize STRING token */ }
>
>Because my definition of a "STRING" is so general, the following input
>stream:
>
> 12:30:49AC
>
>is tokenized into a single STRING token ("12:30:49AC"), rather than into a
>TIMESTAMP token ("12:30:49") and a STRING token ("AC").


Would you not then have a problem distinguishing the TIMESTAMP
followed by a STRING
          12:30:49AC
from the single STRING
          12:30:49AC
Or do you want to implicitly disallow strings of that form? Both
readings cover exactly the same characters, so I'm not sure you can
have it both ways at the lexical level.
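
To make the conflict concrete, here is a minimal Python sketch (not lex
itself) of the two disambiguating rules lex applies: the longest match wins,
and on a tie the rule listed first wins. The names RULES and longest_match
are made up for illustration, and the patterns are my transcription of the
quoted definitions, assuming {D} means a single digit.

```python
import re

# The two quoted rules, in the order they appear in the lex specification.
# Assumption: {D} stands for one decimal digit.
RULES = [
    ("TIMESTAMP", re.compile(r"[0-9]{2}:[0-9]{2}:[0-9]{2}")),
    ("STRING", re.compile(r"[^ \t\n]+")),
]

def longest_match(text, pos=0):
    """Pick a token the way lex does: longest match first, earliest rule on a tie."""
    best = None
    for name, pattern in RULES:
        m = pattern.match(text, pos)
        # Strictly-greater comparison keeps the earlier rule when lengths are equal.
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best

print(longest_match("12:30:49AC"))  # STRING wins: it consumes more input
print(longest_match("12:30:49"))    # tie in length: TIMESTAMP is listed first
```

With these two rules alone, no ordering of them yields TIMESTAMP plus STRING
for 12:30:49AC, because STRING always matches at least as much.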


I like the idea (half-baked as it is) of some form of nested lexical
analysis. Rather than having to specify all these quirky little interactions
among every token in a stream, what if you could specify one rule set that
breaks the stream into strings, each tagged with a token-stream class, and
then treat each string as a little stream of its own, lexed with the rules
for its class? Formally it wouldn't be any more powerful than the current
lex (that's a guess; I haven't analyzed it), but it might make lex rules
simpler to specify, because you could build "firewalls" that keep the rules
within one class from interfering with other rules several pages/screens
away.


In the example above, you could first break the strings out of the stream,
then lex each of these strings (short streams of class STRING) to get
TIMESTAMPs and STRINGs. An implementation would, of course, be optimized to
do as much of this as it can in one pass.
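
A rough Python sketch of what I mean, done naively in two passes. The outer
pass breaks the stream into whitespace-delimited chunks of class STRING; the
inner pass lexes each chunk with its own small rule set. Everything here
(the names CHUNK, lex_chunk and lex, and the choice that a TIMESTAMP is only
recognized at the start of a chunk) is an assumption for illustration, not a
feature of any existing lex.

```python
import re

# Outer rule set: one class only, whitespace-delimited chunks of class STRING.
CHUNK = re.compile(r"[^ \t\n]+")
# Inner rule set for class STRING. Assumption: {D} is a single decimal digit.
TIMESTAMP = re.compile(r"[0-9]{2}:[0-9]{2}:[0-9]{2}")

def lex_chunk(chunk):
    """Inner pass: lex one chunk as a little stream of its own."""
    m = TIMESTAMP.match(chunk)
    if m is None:
        return [("STRING", chunk)]
    tokens = [("TIMESTAMP", m.group())]
    rest = chunk[m.end():]
    if rest:
        # Whatever follows the leading timestamp is an ordinary STRING.
        tokens.append(("STRING", rest))
    return tokens

def lex(stream):
    """Outer pass: split the stream into chunks, then lex each by its class rules."""
    tokens = []
    for chunk in CHUNK.findall(stream):
        tokens.extend(lex_chunk(chunk))
    return tokens

print(lex("12:30:49AC foo"))
```

Note that this does implicitly disallow a single STRING of the form
12:30:49AC: inside the "firewall" the TIMESTAMP rule is tried first, so the
longest-match rule of the outer pass never gets to interfere with it.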


Is this a half-baked idea or is it already burnt to a crisp?
_____
Todd M. Lewis
utoddl@ecsvax.uncecs.edu
utoddl@ecsvax.bitnet, @unc.bitnet
utoddl@next1.mscre.unc.edu
--

