Scanner/Parser Whitespace Tokens: Ignore, Don't Ignore

MaggotChild <hsomob1999@yahoo.com>
Sun, 31 May 2009 19:26:43 -0700 (PDT)

From comp.compilers

From:	MaggotChild <hsomob1999@yahoo.com>
Newsgroups:	comp.compilers
Date:	Sun, 31 May 2009 19:26:43 -0700 (PDT)
Organization:	Compilers Central
Keywords:	lex, question
Posted-Date:	01 Jun 2009 10:22:45 EDT

Hello,

I have a problem with my scanner and, in turn, my parser.

They're written in Rex and Racc (Ruby's Lex and Yacc); the syntax is
somewhat similar to Lex and Yacc.

Here are the problematic parts (":STATE" means the rule applies in start
condition :STATE, and [ :SYMBOL, text ] returns an array containing the
matched token and its value).

The scanner:

macro
      QUOTE    "
      WORD     \w+
      PUNCT    [&!,:'`]

rule
               {PUNCT}                  { [ :PUNCT, text ] }
               {QUOTE}                  { state = :STRING; [ :QUOTE, text ] }
      :STRING  [^{QUOTE}]+(?={QUOTE})   { [ :WORD, text ] }
      :STRING  {QUOTE}                  { state = nil; [ :QUOTE, text ] }
               {WORD}                   { [ :WORD, text ] }
               \s+

And the relevant part of the parser file:

      name: WORD
          | name PUNCT        { result = "#{val[0]}#{val[1]}" }
          | QUOTE name QUOTE  { result = val[1] }

The scanner ignores spaces, though this causes problems for my parser
when presented with tokens for a string such as: John & Co.

The above parser rule will return: John&Co.

Changing it to { result = "#{val[0]} #{val[1]}" } fixes that case, yet it
creates a problem when given a string like: Bill's
because it will return: Bill ' s
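
To make the two failure modes concrete, here is a minimal sketch in plain
Ruby using StringScanner instead of Rex and Racc. The helper name
scan_values is made up for this illustration, and the patterns are copied
from the PUNCT and WORD macros above. It only shows that once whitespace
is thrown away, neither joining rule can reproduce both inputs.

```ruby
require 'strscan'

PUNCT = /[&!,:'`]/
WORD  = /\w+/

# Hypothetical stand-in for the scanner above: whitespace is skipped,
# so the returned values carry no record of where the spaces were.
def scan_values(input)
  ss = StringScanner.new(input)
  values = []
  until ss.eos?
    next if ss.skip(/\s+/)                       # drop whitespace, like the bare \s+ rule
    values << (ss.scan(PUNCT) || ss.scan(WORD) || ss.getch)
  end
  values
end

p scan_values("John & Co").join        # "John&Co"    (space lost)
p scan_values("John & Co").join(" ")   # "John & Co"
p scan_values("Bill's").join           # "Bill's"
p scan_values("Bill's").join(" ")      # "Bill ' s"   (space invented)
```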

I think I need another start condition, enabled by WORD, where spaces
will not be ignored:

      :WORD  \s+   { [ :SPACE, text ] }
             \s+

but I don't want to explicitly acknowledge a :SPACE token; I'd rather
just include it in text.
Actually, the above start condition is ambiguous: how do I know whether
a space is separating two tokens or is part of a word?
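
One way to picture "include it in text" is sketched below, again in plain
Ruby with StringScanner rather than Rex. This is only an assumption about
what such a scanner could look like, not something Rex is known to offer
directly: the names scan_with_leading_space and join_name and the :OTHER
symbol are invented for the example. Each token's text keeps whatever
whitespace preceded it, so no :SPACE token is needed and the original
no-space concatenation can stay as it is.

```ruby
require 'strscan'

# Hypothetical scanner: every token's text is prefixed with the
# whitespace that came before it in the input.
def scan_with_leading_space(input)
  ss = StringScanner.new(input)
  tokens = []
  until ss.eos?
    space = ss.scan(/\s+/) || ""
    break if ss.eos?                  # trailing whitespace has no token to attach to
    if (text = ss.scan(/[&!,:'`]/))
      tokens << [:PUNCT, space + text]
    elsif (text = ss.scan(/\w+/))
      tokens << [:WORD, space + text]
    else
      tokens << [:OTHER, space + ss.getch]
    end
  end
  tokens
end

# Mimics concatenating token values with no separator.
def join_name(tokens)
  tokens.map { |_, text| text }.join.lstrip
end

p join_name(scan_with_leading_space("John & Co"))
p join_name(scan_with_leading_space("Bill's"))
```

The trade-off is that the token text is no longer the bare lexeme, so any
other rule that compares a value against a literal would have to strip it
first.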

Any ideas how I can keep punctuation and spaces as is when matching
the parser's name rule?
