Scanner/Parser Whitespace Tokens: Ignore, Don't Ignore
From: | MaggotChild <hsomob1999@yahoo.com> |
Newsgroups: | comp.compilers |
Date: | Sun, 31 May 2009 19:26:43 -0700 (PDT) |
Organization: | Compilers Central |
Keywords: | lex, question |
Posted-Date: | 01 Jun 2009 10:22:45 EDT |
Hello,
I have a problem with my scanner and, in turn, my parser.
They're written in Rex and Racc (Ruby Lex & Yacc); the syntax is
somewhat similar to Lex and Yacc.
Here are the problematic parts (":STATE" means "in start condition :STATE",
and [ :SYMBOL, text ] returns an array containing the matched token
and its value).
The scanner:
    macro
      QUOTE  "
      WORD   \w+
      PUNCT  [&!,:'`]
    rule
      {PUNCT}                         { [ :PUNCT, text ] }
      {QUOTE}                         { state = :STRING; [ :QUOTE, text ] }
      :STRING [^{QUOTE}]+(?={QUOTE})  { [:WORD, text ] }
      :STRING {QUOTE}                 { state = nil; [:QUOTE, text ] }
      {WORD}                          { [ :WORD, text ] }
      \s+
And the relevant part of the parser file:
    name: WORD
        | name PUNCT { result = "#{val[0]}#{val[1]}" }
        | QUOTE name QUOTE { result = val[1] }
The scanner ignores spaces, but this causes problems for my parser
when it is given the tokens for a string such as: John & Co.
The above parser rule will return: John&Co.
Changing the action to: { result = "#{val[0]} #{val[1]}" } fixes that, but it
creates a problem for a string like: Bill's
which comes back as: Bill ' s
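To make the two failure cases easy to reproduce outside Rex and Racc, here is a minimal plain-Ruby sketch. It is only an approximation: `tokenize` and `join_tokens` are hypothetical helper names, the :STRING start condition is left out, and the fixed-separator join merely imitates the string interpolation in the action rather than running a real Racc parser. The trailing period of "John & Co." is dropped from the input because none of the rules shown match it.

```ruby
require 'strscan'

# Simplified stand-in for the Rex rules above (string state omitted):
# whitespace is discarded, everything else becomes [symbol, text].
def tokenize(input)
  ss = StringScanner.new(input)
  tokens = []
  until ss.eos?
    if ss.skip(/\s+/)
      next                                   # spaces are dropped here
    elsif (text = ss.scan(/[&!,:'`]/))
      tokens << [:PUNCT, text]
    elsif (text = ss.scan(/\w+/))
      tokens << [:WORD, text]
    else
      raise ArgumentError, "cannot match: #{ss.rest.inspect}"
    end
  end
  tokens
end

# Imitates the semantic action: glue the token texts with a fixed separator.
def join_tokens(tokens, separator)
  tokens.map { |_, text| text }.join(separator)
end

john = tokenize("John & Co")
bill = tokenize("Bill's")

puts join_tokens(john, "")    # John&Co
puts join_tokens(john, " ")   # John & Co
puts join_tokens(bill, "")    # Bill's
puts join_tokens(bill, " ")   # Bill ' s
```

Neither separator works for both inputs, because by the time the action runs the scanner has already thrown away the information about where the spaces were.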
I think I need another start condition enabled by WORD, where spaces
will not be ignored:
    :WORD \s+ { [:SPACE, text ] }
    \s+
but I don't want to explicitly acknowledge a :SPACE token; I'd rather
just include it in text.
Actually, the above start condition is ambiguous: how do I know whether
the space is separating tokens or is part of a word?
Any ideas how I can keep punctuation and spaces as is when matching
the parser's name rule?
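One way to picture the "include it in text" idea is sketched below in plain Ruby. This is an assumption-laden illustration, not Rex syntax and not a tested Racc grammar: `tokenize_keeping_space` and `build_name` are hypothetical names, and the :STRING start condition is again omitted. Each token's text carries whatever whitespace preceded it, so the action can keep concatenating without a separator.

```ruby
require 'strscan'

# Hypothetical variant: each token keeps the whitespace that preceded it,
# so no separate :SPACE token is needed.
def tokenize_keeping_space(input)
  ss = StringScanner.new(input)
  tokens = []
  until ss.eos?
    if (text = ss.scan(/\s*[&!,:'`]/))
      tokens << [:PUNCT, text]
    elsif (text = ss.scan(/\s*\w+/))
      tokens << [:WORD, text]
    elsif ss.skip(/\s+/)
      next                      # trailing whitespace with nothing after it
    else
      raise ArgumentError, "cannot match: #{ss.rest.inspect}"
    end
  end
  tokens
end

# Stand-in for the name rule's action: plain concatenation, then trim the
# whitespace that sat in front of the first token.
def build_name(tokens)
  tokens.map { |_, text| text }.join.strip
end

puts build_name(tokenize_keeping_space("John & Co"))   # John & Co
puts build_name(tokenize_keeping_space("Bill's"))      # Bill's
```

This keeps punctuation and spacing as typed, but it does not settle the ambiguity raised above: the scanner still cannot tell whether a space joins two parts of one name or separates two different names, so that decision would have to come from the grammar.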