2 word token as one in lex

makotosu@my-deja.com
18 Jul 2000 13:11:22 -0400

          From comp.compilers


From: makotosu@my-deja.com
Newsgroups: comp.compilers
Date: 18 Jul 2000 13:11:22 -0400
Organization: Compilers Central
Keywords: question, comment

Hi,


I'm trying to parse SQL and I'd like to recognize UNION JOIN as one
token in the lexer. So for example,


if the lexer sees UNION and the next token (after any number of spaces,
tabs, and newlines) is JOIN, it should return UNION_JOIN


but if the lexer sees UNION and the next token is anything else, then
it should return UNION (for example, UNION ALL would return the token
UNION and *then* the token ALL).


I want to do this in the lexical analyzer, not the parser. Is this
possible? I was thinking of using exclusive states, but I could not get
it working. I did something like


%x JOIN_CHECK


...


UNION { BEGIN JOIN_CHECK; }
<JOIN_CHECK>JOIN { BEGIN INITIAL; return UNION_JOIN; }


but I'm not sure what else to do. Any help would be greatly
appreciated.


Thank you,


Norman Su
makotosu@deja.com
[You can more or less fake it via "UNION"[ \t\n]+"JOIN" (lex has no \s
escape), but doing it right is a pain in the neck. Consider adding an
intermediate level between the lexer and parser that does token
mutation. -John]
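
The intermediate level suggested in the moderator's note could look
roughly like the C sketch below. It is only an illustration under stated
assumptions: the token codes, set_input(), raw_next_token(), and
next_token() are hypothetical names, and raw_next_token() is a tiny
whitespace-splitting stand-in for the lex-generated yylex(). The point
is the one-token pushback buffer: after UNION the filter reads one more
token, merges it if it is JOIN, and otherwise saves it for the next call.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes; a real parser would take these from yacc. */
enum token { T_EOF = 0, T_UNION, T_JOIN, T_ALL, T_IDENT, T_UNION_JOIN };

static const char *src_pos = "";   /* read position of the stand-in scanner */
static int have_pending = 0;       /* 1 if 'pending' holds a saved token */
static enum token pending;         /* token read ahead but not yet returned */

/* Point the stand-in scanner at new text and clear the pushback buffer. */
void set_input(const char *text)
{
    src_pos = text;
    have_pending = 0;
}

/* Stand-in for yylex(): whitespace-separated, case-sensitive words only. */
enum token raw_next_token(void)
{
    const char *start;
    size_t len;

    while (isspace((unsigned char)*src_pos))
        src_pos++;
    if (*src_pos == '\0')
        return T_EOF;
    start = src_pos;
    while (*src_pos != '\0' && !isspace((unsigned char)*src_pos))
        src_pos++;
    len = (size_t)(src_pos - start);
    if (len == 5 && strncmp(start, "UNION", 5) == 0) return T_UNION;
    if (len == 4 && strncmp(start, "JOIN", 4) == 0)  return T_JOIN;
    if (len == 3 && strncmp(start, "ALL", 3) == 0)   return T_ALL;
    return T_IDENT;
}

/* Token-mutation layer: the parser calls this instead of the raw scanner. */
enum token next_token(void)
{
    enum token t, look;

    if (have_pending) {            /* hand out the saved token first */
        have_pending = 0;
        t = pending;
    } else {
        t = raw_next_token();
    }
    if (t != T_UNION)
        return t;

    look = raw_next_token();       /* whitespace was already skipped */
    if (look == T_JOIN)
        return T_UNION_JOIN;       /* merge UNION JOIN into one token */

    pending = look;                /* not JOIN: save it for the next call */
    have_pending = 1;
    return T_UNION;
}
```

With a yacc parser, the same filter would presumably be installed as the
parser's yylex() with the generated scanner renamed, and any semantic
value (yylval) attached to the lookahead token would have to be saved
alongside it, which this sketch leaves out.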

