Scanner/Parser Whitespace Tokens: Ignore, Don't Ignore

MaggotChild <hsomob1999@yahoo.com>
Sun, 31 May 2009 19:26:43 -0700 (PDT)



Newsgroups: comp.compilers
Organization: Compilers Central
Keywords: lex, question
Posted-Date: 01 Jun 2009 10:22:45 EDT

Hello,


I have a problem with my scanner and, in turn, my parser.


They're written in Rex and Racc (Ruby's Lex and Yacc); the syntax is
somewhat similar to that of Lex and Yacc.


Here are the problematic parts. (":STATE" means the rule is active in
start condition :STATE, and [ :SYMBOL, text ] returns an array containing
the token symbol and its value, i.e. the matched text.)


The scanner:


macro
      QUOTE "
      WORD \w+
      PUNCT [&!,:'`]


rule
      {PUNCT} { [ :PUNCT, text ] }
      {QUOTE} { state = :STRING; [ :QUOTE, text ] }
:STRING [^{QUOTE}]+(?={QUOTE}) { [:WORD, text ] }
:STRING {QUOTE} { state = nil; [:QUOTE, text ] }
      {WORD} { [ :WORD, text ] }
      \s+
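For readers without Rex at hand, the rules above can be approximated in plain Ruby with the standard library's StringScanner. This is a hypothetical sketch, not Rex's generated code: the method name `tokenize` is invented, and it assumes the rules are tried in the order listed (first match wins) and that only the two :STRING rules are active inside a quoted string. It makes the problem visible: the final `\s+` rule consumes whitespace without producing a token.

```ruby
require 'strscan'

# Hypothetical plain-Ruby approximation of the Rex rules above (not Rex itself).
# Assumptions: rules are tried in the order listed and the first match wins;
# inside :STRING only the two :STRING rules are active.
QUOTE       = /"/
WORD        = /\w+/
PUNCT       = /[&!,:'`]/
STRING_BODY = /[^"]+(?=")/   # everything up to, but not including, the closing quote

def tokenize(input)
  ss = StringScanner.new(input)
  state = nil          # nil = default start condition
  tokens = []
  until ss.eos?
    if state == :STRING
      if (text = ss.scan(STRING_BODY))
        tokens << [:WORD, text]
      elsif (text = ss.scan(QUOTE))
        state = nil
        tokens << [:QUOTE, text]
      else
        raise "unterminated string at offset #{ss.pos}"
      end
    elsif (text = ss.scan(PUNCT))
      tokens << [:PUNCT, text]
    elsif (text = ss.scan(QUOTE))
      state = :STRING
      tokens << [:QUOTE, text]
    elsif (text = ss.scan(WORD))
      tokens << [:WORD, text]
    elsif ss.scan(/\s+/)
      # whitespace matches, but no token is produced: it is simply dropped
    else
      raise "unexpected character #{ss.peek(1).inspect} at offset #{ss.pos}"
    end
  end
  tokens
end

p tokenize("John & Co")  # => [[:WORD, "John"], [:PUNCT, "&"], [:WORD, "Co"]]
p tokenize("Bill's")     # => [[:WORD, "Bill"], [:PUNCT, "'"], [:WORD, "s"]]
```

Both inputs produce the same shape of token stream, WORD PUNCT WORD, even though only one of them contained spaces.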




And the relevant part of the parser file:


      name: WORD
          | name PUNCT       { result = "#{val[0]}#{val[1]}" }
          | QUOTE name QUOTE { result = val[1] }






The scanner ignores spaces, which causes problems for my parser when it
is given the tokens for a string such as: John & Co.


With the rule above, the parser returns: John&Co.


Changing the action to { result = "#{val[0]} #{val[1]}" } fixes that
case, but it breaks strings such as Bill's, which then come back as:
Bill ' s
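The dilemma can be reproduced without Racc by combining token values left to right with a fixed separator, assuming the name action appends each token's text to the partial name in turn. In this hypothetical sketch `join_tokens` is an invented helper and the token arrays are written out by hand as the scanner would produce them. Once whitespace has been discarded, neither separator is right for both inputs, because the token values no longer record whether a space was there.

```ruby
# Hypothetical illustration: combine token values pairwise, the way the name
# rule's action combines val[0] and val[1], using one fixed separator.
def join_tokens(tokens, sep)
  tokens.map { |_type, text| text }.reduce { |acc, text| "#{acc}#{sep}#{text}" }
end

john  = [[:WORD, "John"], [:PUNCT, "&"], [:WORD, "Co"]]
bills = [[:WORD, "Bill"], [:PUNCT, "'"], [:WORD, "s"]]

p join_tokens(john, "")    # => "John&Co"   (spaces lost)
p join_tokens(john, " ")   # => "John & Co" (correct)
p join_tokens(bills, "")   # => "Bill's"    (correct)
p join_tokens(bills, " ")  # => "Bill ' s"  (spurious spaces)
```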


I think I need another start condition, entered after a WORD, in which
spaces are not ignored:


:WORD \s+ { [ :SPACE, text ] }
      \s+


But I don't want the parser to deal with an explicit :SPACE token; I'd
rather just include the space in the token's text.
Also, the start condition above is ambiguous: how do I know whether a
space separates two tokens or is part of a word?
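One possible way around the ambiguity, sketched here as an assumption rather than as a Rex feature, is to stop deciding in the scanner at all: keep the whitespace that the `\s+` rule skips and attach it to the next token's value, so each value becomes `[leading_whitespace, text]`. The name rule can then rebuild the name verbatim, while rules that don't care about spacing look only at the text part. `tokenize_with_spacing`, `name_text`, and the pair-shaped value are all hypothetical, and quoted strings are left out to keep the sketch short.

```ruby
require 'strscan'

# Hypothetical scanner variant: instead of discarding whitespace, remember it
# and attach it to the next token's value as [leading_whitespace, text].
def tokenize_with_spacing(input)
  ss = StringScanner.new(input)
  pending_ws = ""   # whitespace seen since the previous token
  tokens = []
  until ss.eos?
    if (ws = ss.scan(/\s+/))
      pending_ws = ws
    elsif (text = ss.scan(/[&!,:'`]/))
      tokens << [:PUNCT, [pending_ws, text]]
      pending_ws = ""
    elsif (text = ss.scan(/\w+/))
      tokens << [:WORD, [pending_ws, text]]
      pending_ws = ""
    else
      raise "unexpected character #{ss.peek(1).inspect} at offset #{ss.pos}"
    end
  end
  tokens
end

# What a name action could do: keep the spacing *inside* the name, but drop
# the whitespace in front of the name's first token.
def name_text(tokens)
  values = tokens.map { |_type, value| value }
  _ws, first_text = values.first
  first_text + values.drop(1).map { |ws, text| ws + text }.join
end

p name_text(tokenize_with_spacing("John & Co"))  # => "John & Co"
p name_text(tokenize_with_spacing("Bill's"))     # => "Bill's"
```

In Racc terms, each `val[i]` the name rule receives would then be such a pair; that is a change to the token value format, not to the token types the grammar sees.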


Any ideas on how I can preserve punctuation and spaces exactly as
written when matching the parser's name rule?
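Another common technique, again sketched under assumptions rather than taken from Rex or Racc, is to record each token's start and end offsets in the input and have the name rule return the slice of the original source spanning its first and last token. Everything between them, spaces and punctuation alike, then survives exactly as written, while the scanner can keep skipping whitespace. The `Tok` struct and the `tokenize_with_offsets` and `source_slice` helpers are invented for illustration.

```ruby
require 'strscan'

# Hypothetical token value carrying its character position in the input.
Tok = Struct.new(:text, :start, :stop)   # stop is exclusive

def tokenize_with_offsets(input)
  ss = StringScanner.new(input)
  tokens = []
  until ss.eos?
    next if ss.scan(/\s+/)               # whitespace is still skipped...
    start = ss.charpos
    if (text = ss.scan(/[&!,:'`]/))
      tokens << [:PUNCT, Tok.new(text, start, ss.charpos)]
    elsif (text = ss.scan(/\w+/))
      tokens << [:WORD, Tok.new(text, start, ss.charpos)]
    else
      raise "unexpected character #{ss.peek(1).inspect} at offset #{start}"
    end
  end
  tokens
end

# ...because a name can be recovered verbatim from its first and last token.
def source_slice(input, first_tok, last_tok)
  input[first_tok.start...last_tok.stop]
end

input = "John  & Co"
toks = tokenize_with_offsets(input).map { |_type, tok| tok }
p source_slice(input, toks.first, toks.last)  # => "John  & Co"
```

With Racc, this would mean letting the name rule carry the first and last `Tok` of the name as its result and turning that span into text only once the whole name has been reduced.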


