Using LALR machine to disambiguate tokens
From: "Monty Hall" <chickenkungpao@hotmail.com>
Newsgroups: comp.compilers
Date: 7 Sep 2004 23:58:33 -0400
Organization: SBC http://yahoo.sbc.com
Keywords: LALR
Posted-Date: 07 Sep 2004 23:58:33 EDT
I have just finished an LALR(k) DFA generator that also generates the
lexer's DFA from regular expressions, in the hope of creating an integrated
parser/lexer rapid-development tool that is relatively 'hands-free'. One
thing I am toying with is token disambiguation. Consider the RE/grammar BNF
snippet below:
string = [a-z]+
<start> ::= 'max' 'lookahead' '=' int
| 'start' 'rule' '=' int
| string '=' int
When each token may carry only one accept symbol, I find it annoying that
max, lookahead, start, and rule are all accepted by string's DFA. One common
solution I've seen is:
<start> ::= string string
{ string[0] = 'max' && string[1] = 'lookahead' .....}
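A minimal Python sketch of that workaround, assuming the keywords are lexed only as `string` and that the elided semantic check simply compares the two string values. The names `tokenize` and `classify` and the result tuples are hypothetical, and `classify` is an ad hoc stand-in for the parser's semantic action, not a generated LALR table.

```python
import re

# Every [a-z]+ word is lexed as STRING; keywords are NOT separate tokens.
TOKEN_RE = re.compile(r"\s*(?:(?P<STRING>[a-z]+)|(?P<INT>[0-9]+)|(?P<EQ>=))")


def tokenize(text):
    """Return a list of (kind, lexeme) pairs."""
    text = text.strip()
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        kind = m.lastgroup  # name of the alternative that matched
        tokens.append((kind, m.group(kind)))
        pos = m.end()
    return tokens


def classify(tokens):
    """Mimic <start> ::= string string '=' int | string '=' int, with the
    keyword test moved into a semantic check on the string values."""
    kinds = [k for k, _ in tokens]
    vals = [v for _, v in tokens]
    if kinds == ["STRING", "STRING", "EQ", "INT"]:
        pair = (vals[0], vals[1])
        if pair == ("max", "lookahead"):
            return ("max_lookahead", int(vals[3]))
        if pair == ("start", "rule"):
            return ("start_rule", int(vals[3]))
        raise SyntaxError(f"unknown setting {vals[0]} {vals[1]}")
    if kinds == ["STRING", "EQ", "INT"]:
        return ("assign", vals[0], int(vals[2]))
    raise SyntaxError("malformed statement")
```

A side effect of this approach is that the keywords are not reserved: `max = 7` is accepted as an ordinary assignment to a variable named `max`, while `max lookahead = 2` is recognized only by the value check.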
I was thinking of using the LALR(k) machine itself to disambiguate tokens.
This could be done by attaching to each LALR state a bitmask of the input
terminals it allows, and falling back on lookahead only if the bitmask test
still leaves a truly ambiguous token. Does anybody have information on token
disambiguation, or on parsing keyword-less programming languages (pitfalls,
concerns, and considerations), ideally as it relates to an LR machine?
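The per-state bitmask idea might look something like the following Python sketch. It assumes each terminal gets one bit, that a state's mask is the set of terminals with a non-error ACTION entry, and that one token of lookahead (k = 1) is enough to settle the remaining ties. The terminal numbering, the hand-derived state data, and `disambiguate` are all hypothetical, not the output of any real generator.

```python
import re

# Hypothetical terminal numbering for the example grammar; one bit each.
STRING, KW_MAX, KW_LOOKAHEAD, KW_START, KW_RULE, EQ, INT = range(7)
N_TERMINALS = 7
KEYWORDS = {"max": KW_MAX, "lookahead": KW_LOOKAHEAD,
            "start": KW_START, "rule": KW_RULE}


def bit(t):
    return 1 << t


def candidates(lexeme):
    """Bitmask of every terminal whose pattern matches the whole lexeme."""
    mask = 0
    if re.fullmatch(r"[a-z]+", lexeme):
        mask |= bit(STRING)
    if lexeme in KEYWORDS:
        mask |= bit(KEYWORDS[lexeme])
    if lexeme == "=":
        mask |= bit(EQ)
    if re.fullmatch(r"[0-9]+", lexeme):
        mask |= bit(INT)
    return mask


def members(mask):
    return [t for t in range(N_TERMINALS) if mask & bit(t)]


# Hand-derived (assumed, not generated) data for two states:
#   "mask":   terminals the state has an action for
#   "follow": for each such terminal, the terminals that may come next
START_STATE = {
    "mask": bit(KW_MAX) | bit(KW_START) | bit(STRING),
    "follow": {KW_MAX: bit(KW_LOOKAHEAD), KW_START: bit(KW_RULE),
               STRING: bit(EQ)},
}
AFTER_MAX = {"mask": bit(KW_LOOKAHEAD), "follow": {KW_LOOKAHEAD: bit(EQ)}}


def disambiguate(lexeme, state, next_lexeme=None):
    """Pick the terminal for `lexeme` in LALR state `state`.

    Step 1: intersect the lexeme's candidates with the state's mask.
    Step 2: if several survive and a next lexeme is given, keep only the
    terminals after which the next lexeme is legal (one token of
    lookahead). Returns a terminal, or a list if still ambiguous.
    """
    live = members(candidates(lexeme) & state["mask"])
    if not live:
        raise SyntaxError(f"{lexeme!r} is not valid in this state")
    if len(live) > 1 and next_lexeme is not None:
        nxt = candidates(next_lexeme)
        live = [t for t in live if nxt & state["follow"][t]]
        if not live:
            raise SyntaxError(f"{next_lexeme!r} cannot follow {lexeme!r}")
    return live[0] if len(live) == 1 else live
```

In the start state, `rule` resolves to `STRING` because no keyword production can begin with it there, while `max` survives the mask as both `STRING` and `KW_MAX`; the next lexeme then settles it (`=` selects `STRING`, `lookahead` selects `KW_MAX`). After `max` has been shifted, `lookahead` is unambiguous from the mask alone.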
Regards,
Monty
chickenkungpao@hotmail.com