Using LALR machine to disambiguate tokens

"Monty Hall" <>
7 Sep 2004 23:58:33 -0400

          From comp.compilers


From: "Monty Hall" <>
Newsgroups: comp.compilers
Date: 7 Sep 2004 23:58:33 -0400
Organization: SBC
Keywords: LALR
Posted-Date: 07 Sep 2004 23:58:33 EDT

        I just finished an LALR(k) DFA generator that also generates the lexer's
regular-expression DFA, in hopes of creating an integrated parse/lex rapid
development tool that's relatively 'hands free'. One thing that I am toying
with is disambiguating tokens. Consider the RE/grammar BNF snippet below:

    string = [a-z]+
    <start> ::= 'max' 'lookahead' '=' int
                            | 'start' 'rule' '=' int
                            | string '=' int

        When a token may assume only one accept symbol, I find it annoying
that max, lookahead, start, and rule are also matched by string's DFA. One
common solution that I've seen is:

    <start> ::= string string
            { string[0] == 'max' && string[1] == 'lookahead' .....}
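
The workaround above can be sketched roughly as follows. This is only an illustration in Python, not output of the generator: the `int` token is assumed to be `[0-9]+` (the snippet does not define it), and `tokenize` and `parse_start` are hypothetical names. Every word is lexed as a plain `string`, and the keyword test moves into the rule's action.

```python
import re

# Assumed token definitions: 'string' is from the post, 'int' and '=' are guesses.
TOKEN_RE = re.compile(r"\s*(?:(?P<string>[a-z]+)|(?P<int>[0-9]+)|(?P<eq>=))")

def tokenize(text):
    """Lex every word as a generic 'string'; the lexer knows no keywords."""
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens

def parse_start(tokens):
    """Hand-coded stand-in for <start>, with the keyword check done in the action."""
    kinds = [kind for kind, _ in tokens]
    vals = [val for _, val in tokens]
    if kinds == ["string", "string", "eq", "int"]:
        # Semantic check replaces the 'max' 'lookahead' / 'start' 'rule' terminals.
        if (vals[0], vals[1]) == ("max", "lookahead"):
            return ("max_lookahead", int(vals[3]))
        if (vals[0], vals[1]) == ("start", "rule"):
            return ("start_rule", int(vals[3]))
        raise SyntaxError(f"unknown option: {vals[0]} {vals[1]}")
    if kinds == ["string", "eq", "int"]:
        return ("assign", vals[0], int(vals[2]))
    raise SyntaxError("input does not match <start>")
```

The cost is that the grammar no longer documents the keywords; a misspelled option is only caught inside the action, not by the parser tables.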

        I was thinking of using the LALR(k) machine to disambiguate
tokens. It could be done by adding a bitmask of allowable input tokens
to each LALR state, and falling back on lookahead only when the bitmask
still leaves a token truly ambiguous. Does anybody have information on
the topic of token disambiguation or parsing keywordless programming
languages (pitfalls, concerns & considerations) and if possible as it
relates to an LR
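
A minimal sketch of the bitmask idea, under stated assumptions: the transition table `GOTO` is written by hand for the toy grammar above (it is not a generated LALR automaton, and it merges the states that expect '='), lexemes are assumed to be whitespace-separated, `int` is assumed to be `[0-9]+`, and all names (`Tok`, `candidates`, `resolve`, `classify`) are hypothetical. The lexer reports every token kind a lexeme could be, the current state's mask filters that set, and one lexeme of lookahead is consulted only if more than one kind survives.

```python
import re
from enum import IntFlag, auto

class Tok(IntFlag):
    MAX = auto()
    LOOKAHEAD = auto()
    START = auto()
    RULE = auto()
    STRING = auto()
    EQ = auto()
    INT = auto()

KEYWORDS = {"max": Tok.MAX, "lookahead": Tok.LOOKAHEAD,
            "start": Tok.START, "rule": Tok.RULE}

def candidates(lexeme):
    """Bitmask of every token kind whose pattern matches the lexeme."""
    if re.fullmatch(r"[a-z]+", lexeme):
        return Tok.STRING | KEYWORDS.get(lexeme, Tok(0))
    if re.fullmatch(r"[0-9]+", lexeme):
        return Tok.INT
    return Tok.EQ if lexeme == "=" else Tok(0)

# Hand-written shift table for the toy grammar: (state, token) -> next state.
GOTO = {
    (0, Tok.MAX): 1, (0, Tok.START): 2, (0, Tok.STRING): 3,
    (1, Tok.LOOKAHEAD): 3, (2, Tok.RULE): 3,
    (3, Tok.EQ): 4, (4, Tok.INT): 5,
}
ACCEPT = 5

# Per-state bitmask of allowable input, derived from the shift table.
ALLOWED = {}
for (state, kind) in GOTO:
    ALLOWED[state] = ALLOWED.get(state, Tok(0)) | kind

def resolve(state, lexemes, i):
    """Pick the single token kind for lexemes[i] that is legal in this state."""
    viable = candidates(lexemes[i]) & ALLOWED.get(state, Tok(0))
    kinds = [k for k in Tok if k & viable]
    if len(kinds) > 1 and i + 1 < len(lexemes):
        # Truly ambiguous: keep kinds whose successor state accepts the next lexeme.
        nxt = candidates(lexemes[i + 1])
        kinds = [k for k in kinds if nxt & ALLOWED.get(GOTO[(state, k)], Tok(0))]
    if len(kinds) != 1:
        raise SyntaxError(f"cannot classify {lexemes[i]!r} in state {state}")
    return kinds[0]

def classify(text):
    """Return the token kind chosen for each lexeme of one <start> sentence."""
    lexemes = text.split()
    state, out = 0, []
    for i in range(len(lexemes)):
        kind = resolve(state, lexemes, i)
        out.append(kind)
        state = GOTO[(state, kind)]
    if state != ACCEPT:
        raise SyntaxError("unexpected end of input")
    return out
```

With this table, `classify("max lookahead = 3")` needs the lookahead step for `max` (both MAX and STRING are allowed in state 0), while `lookahead` is settled by the mask alone; in `classify("rule = 7")` the word `rule` can only be a STRING because RULE is not allowed in state 0.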


