Re: 2 word token as one in lex

"Kevin Szabo" <kszabo@nortelnetworks.com>
23 Jul 2000 17:04:15 -0400

          From comp.compilers


From: "Kevin Szabo" <kszabo@nortelnetworks.com>
Newsgroups: comp.compilers
Date: 23 Jul 2000 17:04:15 -0400
Organization: Nortel Networks (Ottawa, Ontario, Canada)
References: 00-07-034
Keywords: parse, comment

|I'm trying to parse SQL and I'd like to recognize UNION JOIN as one
|token in the lexer. So for example,
|
|if the lexer sees UNION and the next token (after any # of whitespaces,
|tabs and newlines) is JOIN it should return UNION_JOIN 


My personal feeling is that you are trying to do too much work in the
lexer. Why not just specify this rule in your Yacc grammar? It will
save you aggravation if 'union join' is going to be part of some other
production.
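
To make that concrete, here is a minimal sketch of the two-token approach. UNION and JOIN are the only names taken from the question; the nonterminal name union_join is hypothetical, and the fragment is an illustration rather than a piece of any real SQL grammar. Since the lexer already discards whitespace, tabs and newlines between tokens, the grammar never sees them.

```yacc
%token UNION JOIN

%%

/* Hypothetical nonterminal: the lexer returns UNION and JOIN as two
   ordinary tokens, and the grammar treats the pair as a unit. */
union_join
    : UNION JOIN
    ;
```

Any production that needs the combined keyword would then refer to union_join. Whether this works without conflicts depends on the rest of the grammar.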


John's suggestion of an intermediate pre-parser between the lexer and
the parser is a good one; I've exploited that for things that are
written by hand (like recursive descent parsers), but I usually like
to express all the rules in the grammar if at all possible.
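
For comparison, a filter of that kind can be sketched in C roughly as follows. This is only an illustration under stated assumptions, not John's actual suggestion spelled out: the token codes, raw_lex() (a stand-in for the lex-generated scanner that replays a fixed array) and filtered_lex() are all hypothetical names.

```c
/* Hypothetical token codes; a real parser would take these from y.tab.h. */
enum { END = 0, UNION = 258, JOIN, UNION_JOIN, OTHER };

/* Stand-in for the lex-generated scanner: replays a token array
   that is terminated by END. */
static const int *input;

static int raw_lex(void)
{
    int tok = *input;
    if (tok != END)
        input++;
    return tok;
}

/* One-token pushback buffer used by the filter. */
static int have_pending = 0;
static int pending;

/* Sits between scanner and parser: UNION immediately followed by
   JOIN is returned as the single token UNION_JOIN. */
static int filtered_lex(void)
{
    int tok;

    if (have_pending) {
        have_pending = 0;
        tok = pending;
    } else {
        tok = raw_lex();
    }

    if (tok == UNION) {
        int next = raw_lex();
        if (next == JOIN)
            return UNION_JOIN;
        /* Not a JOIN: hold the token for the next call. */
        pending = next;
        have_pending = 1;
    }
    return tok;
}
```

The parser would call filtered_lex() in place of the scanner. A real filter would also have to save the semantic value (yylval) of the held-back token, which this sketch leaves out.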


Have you looked at the O'Reilly lex-yacc book? They have an SQL
parser as an example (IIRC).


Kevin
[My SQL grammar in the book is for the old SQL 89. I dimly recall
there's some ambiguity in the grammar that makes it reasonable to
try to handle UNION JOIN as one token. -John]

