Re: Context sensitive scanner ?

"Scott Stanchfield" <>
30 Nov 1997 22:58:58 -0500

          From comp.compilers


Newsgroups: comp.compilers
Organization: Compilers Central
References: 97-11-117
Keywords: lex

Please, I beg you, purge the idea of "scanner knowing the parser
state" from your mind. It's way too easy for this to turn on you!

Simple reason:

In a bottom-up parser, an action can't tell whether the parser had to
read a token from the follow set before it could reduce. Some rules can
be reduced just by seeing the current token and deciding that it is the
last possible thing in the "current" production, so it's reduced.

In other rules you might need to see the _next_ token to decide whether
you can reduce. (Something like a comma-separated list of elements: the
next token is looked at to see if it's a comma.)
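A minimal sketch of the comma-list case, assuming a hypothetical lazy scanner that logs each token at the moment it is read. It is a hand-written loop rather than a generated LR table, but an LR parser hits the same ordering with a right-recursive rule such as `list : elem | elem COMMA list`, where the state after `elem` must look at the next token to choose between reducing and shifting a comma. Either way, the list's action runs only after the token that follows the list has been scanned.

```python
from typing import Iterator, List


def scan(text: str, log: List[str]) -> Iterator[str]:
    """Hypothetical lazy scanner: records each token at the moment it is read."""
    for tok in text.split():
        log.append(f"scan {tok}")
        yield tok
    log.append("scan EOF")
    yield "EOF"


def parse_list(text: str) -> List[str]:
    """list : elem (',' elem)*   Returns the order of scan/reduce events."""
    log: List[str] = []
    tokens = scan(text, log)
    elems = [next(tokens, "EOF")]
    lookahead = next(tokens, "EOF")  # must read past the element to see if the list goes on
    while lookahead == ",":
        elems.append(next(tokens, "EOF"))
        lookahead = next(tokens, "EOF")
    # Only now is the list known to be complete, so its action runs here,
    # after the token that follows the list has already been scanned.
    log.append("reduce list " + " ".join(elems))
    return log
```

For `parse_list("a , b ; c")` the log is `scan a`, `scan ,`, `scan b`, `scan ;`, `reduce list a b`: the `;` is already in hand before the action runs.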

I got burned badly by this a few times in parsers I was maintaining.
The scanner was performing the symbol table lookup, and the parser was
pushing/popping scopes. A case came up a few times where the lookahead
was scanned _before_ the scope was pushed or popped. (The parser needed
to check the next token to see if it could reduce the rule; then, when
the rule was reduced, its action pushed/popped a scope.)

The problem was that that _next_ token was an ID that needed to be
looked up in the proper scope, which hadn't yet been set...

And just looking at the rule performing the reduction isn't enough: the
context in which that rule is used determines whether or not lookahead
is needed to decide between shift and reduce.

LL(k) parsers have the same problem: a token may already have been
scanned for use as lookahead before the action that changes the state
has run.
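The symbol-table trap described above can be reproduced in a few lines. This is a sketch under stated assumptions: it uses a symbol-table update (declaring type names) in place of the scope push/pop from the story, and the `type` keyword and the `TYPENAME`/`IDENT` token kinds are hypothetical. The parser is recursive descent, so it also illustrates the LL case: the declaration's action runs after the lookahead has already been classified against the old table.

```python
from typing import Iterator, List, Set, Tuple

Token = Tuple[str, str]  # (kind, text)


def scan(words: List[str], type_names: Set[str]) -> Iterator[Token]:
    """Hypothetical scanner that classifies identifiers using the parser's table."""
    for w in words:
        if w in ("type", ","):
            yield (w, w)
        elif w in type_names:  # checked at the moment the word is scanned
            yield ("TYPENAME", w)
        else:
            yield ("IDENT", w)
    yield ("EOF", "")


def parse(text: str) -> List[Token]:
    """decl : 'type' IDENT (',' IDENT)* ; returns the tokens that follow decl."""
    type_names: Set[str] = set()
    tokens = scan(text.split(), type_names)
    if next(tokens)[0] != "type":
        raise SyntaxError("expected 'type'")
    names = [next(tokens)[1]]
    lookahead = next(tokens)  # needed to decide whether the list continues
    while lookahead[0] == ",":
        names.append(next(tokens)[1])
        lookahead = next(tokens)
    # The declaration's action updates the table only now; `lookahead`
    # was already classified against the old (empty) table.
    type_names.update(names)
    return [lookahead] + list(tokens)
```

With `parse("type A , B B A")` the first `B` after the declaration comes back as `IDENT` while the following `A` comes back as `TYPENAME`: the same kind of word is classified two different ways depending only on when it was scanned.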

Another possibility...

Have the scanner _always_ return a REAL when the input looks like a real,
and have the parser re-interpret it.

So the rule to match the index might look like:

          : (INT|REAL|ID) (DOT (INT|REAL|ID))*

and add an action for each of the REALs to re-interpret it as INT DOT INT.
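A minimal sketch of that rule as a recursive-descent function, assuming the scanner hands over `(kind, text)` pairs and keeps each REAL's text as a string. The action for REAL splits `"1.2"` back into two components; the function name and token representation are illustrative, not from any particular tool.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, text); REAL text is kept as a string


def parse_index(tokens: List[Token]) -> List[str]:
    """index : (INT|REAL|ID) (DOT (INT|REAL|ID))*   Returns the components."""
    parts: List[str] = []
    pos = 0

    def component() -> None:
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        kind, text = tokens[pos]
        if kind == "REAL":
            # Action for REAL: re-interpret "1.2" as INT DOT INT.
            left, _, right = text.partition(".")
            parts.extend([left, right])
        elif kind in ("INT", "ID"):
            parts.append(text)
        else:
            raise SyntaxError(f"unexpected {kind} {text!r}")
        pos += 1

    component()
    while pos < len(tokens) and tokens[pos][0] == "DOT":
        pos += 1  # consume DOT
        component()
    return parts
```

For `a.1.2.b`, a scanner using maximal munch yields `ID DOT REAL("1.2") DOT ID`, and `parse_index` turns that into the components `a`, `1`, `2`, `b`.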

One thing here: make sure you're returning a STRING value for REALs, or
you could have precision problems (converting "1.10" to a floating-point
number gives 1.1, and the "10" component is lost). A STRING value also
makes it easier to split back into INT DOT INT.
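A short sketch of why the string matters, assuming Python floats as a stand-in for whatever numeric type a scanner might convert REALs to. Splitting the float's printed form loses trailing zeros, while splitting the original lexeme keeps both components intact.

```python
from typing import Tuple


def split_from_float(lexeme: str) -> Tuple[str, str]:
    """What the parser sees if the scanner converted the REAL to a float first."""
    left, _, right = repr(float(lexeme)).partition(".")
    return left, right


def split_from_string(lexeme: str) -> Tuple[str, str]:
    """What the parser sees if the scanner kept the REAL's text as a string."""
    left, _, right = lexeme.partition(".")
    return left, right
```

`split_from_float("1.10")` gives `("1", "1")`, so index component 10 silently becomes 1; `split_from_string("1.10")` gives `("1", "10")`.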

But back to the real issue: why is it a problem to just have the scanner
keep its own state? Is it ever possible to have

(I think not)

If not, then it should be easy to just return the right token directly
from the scanner. (Or did I miss something again? ;)

Just have a scanner state called INDEXING (or something like that) that is:
    * entered after seeing an ID with right context DOT
    * matches ID or INT with right context DOT, and stays in INDEXING
    * matches ID or INT WITHOUT right context DOT, and leaves INDEXING
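A minimal hand-written sketch of the INDEXING state described above. It stands in for what a lex start condition plus trailing context (`r/s`) would express; the token kinds, regular expressions, and the `CHAR` catch-all are assumptions for illustration. "Right context DOT" is modelled as peeking at the very next character without consuming it, so whitespace before a dot is not treated as indexing.

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, text)

ID = re.compile(r"[A-Za-z_]\w*")
REAL = re.compile(r"\d+\.\d+")
INT = re.compile(r"\d+")


def scan(text: str) -> List[Token]:
    tokens: List[Token] = []
    pos = 0
    indexing = False  # True while in the (hypothetical) INDEXING state
    while pos < len(text):
        ch = text[pos]
        if ch.isspace():
            pos += 1
            continue
        if ch == ".":
            tokens.append(("DOT", ch))
            pos += 1
            continue
        m, kind = ID.match(text, pos), "ID"
        if m is None and not indexing:
            m, kind = REAL.match(text, pos), "REAL"  # REAL only outside INDEXING
        if m is None:
            m, kind = INT.match(text, pos), "INT"
        if m is None:
            tokens.append(("CHAR", ch))  # anything else: one character
            indexing = False
            pos += 1
            continue
        pos = m.end()
        tokens.append((kind, m.group()))
        # Right context: is the next character a DOT? (peek, don't consume)
        followed_by_dot = text.startswith(".", pos)
        if indexing:
            indexing = followed_by_dot  # stay in INDEXING, or leave it
        elif kind == "ID" and followed_by_dot:
            indexing = True  # enter INDEXING
    return tokens
```

With this, `a.1.2.b` scans as `ID DOT INT DOT INT DOT ID`, while `x = 1.5` still yields a REAL, so the parser never has to take a REAL apart.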

I think most lexi (is that the plural of lex?) provide right-context
specification, don't they? And a tool like ANTLR 2.0 provides a
recursive-descent scanner in which you can check lookahead.

  -- Scott

  Scott Stanchfield -
  MageLang Institute -
