Scanner/Parser Whitespace Tokens: Ignore, Don't Ignore

MaggotChild <>
Sun, 31 May 2009 19:26:43 -0700 (PDT)

          From comp.compilers


From: MaggotChild <>
Newsgroups: comp.compilers
Date: Sun, 31 May 2009 19:26:43 -0700 (PDT)
Organization: Compilers Central
Keywords: lex, question
Posted-Date: 01 Jun 2009 10:22:45 EDT


I have a problem with my scanner and, in turn, my parser.

They're written in Rex and Racc (Ruby Lex & Yacc); the syntax is
somewhat similar to Lex and Yacc.

Here are the problematic parts (":STATE" means the rule applies in start
condition :STATE, and [ :SYMBOL, text ] returns an array containing the
matched token and its value).

The scanner:

      QUOTE   "
      WORD    \w+
      PUNCT   [&!,:'`]

      {PUNCT}                         { [ :PUNCT, text ] }
      {QUOTE}                         { state = :STRING; [ :QUOTE, text ] }
      :STRING [^{QUOTE}]+(?={QUOTE})  { [ :WORD, text ] }
      :STRING {QUOTE}                 { state = nil; [ :QUOTE, text ] }
      {WORD}                          { [ :WORD, text ] }

And the relevant part of the parser file:

      name: WORD
          | name PUNCT       { result = "#{val[0]}#{val[1]}" }
          | QUOTE name QUOTE { result = val[1] }

The scanner ignores spaces, though this causes problems for my parser
when presented with tokens for a string such as: John & Co.

The above parser rule will return: John&Co.

Changing the action to { result = "#{val[0]} #{val[1]}" } fixes that
case, but it breaks a string like: Bill's
which then comes back as: Bill ' s
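
The two failure cases can be reproduced outside Rex and Racc. The sketch
below is only an illustration: tokenize and join_tokens are hypothetical
helpers, StringScanner stands in for the generated scanner, and joining
every token with one separator is a simplification of the parser action.

```ruby
require 'strscan'

# Hypothetical stand-in for the Rex scanner: whitespace is skipped,
# so the token stream keeps no record of where the spaces were.
def tokenize(input)
  ss = StringScanner.new(input)
  tokens = []
  until ss.eos?
    if ss.skip(/\s+/)
      next
    elsif (text = ss.scan(/\w+/))
      tokens << [:WORD, text]
    elsif (text = ss.scan(/[&!,:'`]/))
      tokens << [:PUNCT, text]
    else
      ss.getch # skip characters the rules above do not cover
    end
  end
  tokens
end

# Simplified parser action: glue all token values with one separator.
def join_tokens(tokens, sep)
  tokens.map { |_type, text| text }.join(sep)
end

puts join_tokens(tokenize("John & Co"), "")   # John&Co
puts join_tokens(tokenize("Bill's"), " ")     # Bill ' s
```

Neither separator works for both inputs, because the information needed
to choose between them was discarded by the scanner.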

I think I need another start condition, enabled by WORD, in which spaces
are not ignored:

      :WORD \s+ { [ :SPACE, text ] }

But I don't want to explicitly acknowledge a :SPACE token; I'd rather
just include the space in text.
Actually, the above start condition is ambiguous: how do I know whether
a space is separating two tokens or is part of a word?
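
As a rough sketch of what "include it in text" could look like, the
scanner might prepend any skipped whitespace to the next token's text
instead of emitting a separate token. This is written against
StringScanner rather than Rex, tokenize_keeping_space is a hypothetical
name, and the :STRING start condition is left out, so treat it as an
assumption-laden illustration and not as Rex syntax.

```ruby
require 'strscan'

# Hypothetical scanner variant: whitespace is not a token of its own,
# but it is kept as a prefix of the text of the token that follows it.
def tokenize_keeping_space(input)
  ss = StringScanner.new(input)
  tokens = []
  until ss.eos?
    lead = ss.scan(/\s+/) || ""
    lead = "" if tokens.empty? # drop whitespace before the first token
    if (text = ss.scan(/\w+/))
      tokens << [:WORD, lead + text]
    elsif (text = ss.scan(/[&!,:'`]/))
      tokens << [:PUNCT, lead + text]
    else
      ss.getch # skip characters the rules above do not cover
    end
  end
  tokens
end

# With spacing carried in the text, plain concatenation is enough.
puts tokenize_keeping_space("John & Co").map { |_type, text| text }.join
puts tokenize_keeping_space("Bill's").map { |_type, text| text }.join
```

The parser action can then stay as "#{val[0]}#{val[1]}". The cost is
that token text is no longer a bare word, so any rule that compares
WORD values would have to strip the leading space first.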

Any ideas how I can keep punctuation and spaces as is when matching
the parser's name rule?
