|Lemon, fallback, and vacuous tokens. email@example.com (qarnos) (2011-01-02)|
|Re: Lemon, fallback, and vacuous tokens. firstname.lastname@example.org (glen herrmannsfeldt) (2011-01-03)|
|Re: Lemon, fallback, and vacuous tokens. email@example.com (2011-01-05)|
|Re: Lemon, fallback, and vacuous tokens. cfc@shell01.TheWorld.com (Chris F Clark) (2011-01-14)|
|Date:||Sun, 2 Jan 2011 03:09:31 -0800 (PST)|
|Keywords:||parse, LALR, question|
|Posted-Date:||02 Jan 2011 20:40:13 EST|
Looking for an interesting challenge, I decided to try my hand at
writing an LALR(1) parser generator.
The old DOS version of Microsoft QuickBASIC 4.5, to be precise.
It's heavily based on Lemon, using the same input syntax and
approximately the same table compression algorithm. It's slowly
getting there. The program has been boot-strapped to used it's own
During initial development, I was peeking into the innards of Lemon,
and I noticed an undocumented feature called "fallback tokens", which
I promptly implemented in my program, tentatively named "Quack".
If you are not familiar with fallback tokens, the idea is fairly
simple: The grammar specification can define an input symbol as being
a "fallback" for another input symbol if the parse should fail. If I
declare "IDENTIFIER" to be a fallback for "KEYWORD", and a construct
can not be parsed using "KEYWORD", Lemon will substitute "IDENTIFIER"
and re-try the parse before throwing an error.
This got me thinking about a method for handling common error cases,
such as missing semi-colons in a C program, without sprinkling "error"
tokens throughout the grammar, which I find incredibly tedious. Why
can programmers just not make mistakes!
This has lead to an idea which I am calling "vacuous tokens". I am not
active enough in the compiler community to know if this is an original
idea or, indeed, even a good one, so I thought I'd post it here and
The vacuous token concept works as follows:
1. In the grammar specification file, a token can be declared as
vacuous. The vacuous token might apply to all the potential input
symbols, or perhaps only a subset.
2. If an input symbol can not be parsed, and it has a vacuous token
associated with it, the following steps are taken:
2.1. If the vacuous token can be shifted, and the resulting state
permits an action on the original input symbol, the vacuous token is
accepted and the parse resumes.
2.2. If the vacuous token results in a reduce action, the parse stack
is inspected for the resulting state. We repeat this process until we
can satisfy 2.1, or we find an error.
2.3. If the vacuous token resolves the error (at least for the time
being), we may execute a block of code associated with the vacuous
token so the client can report the error or take other action.
2.4. If the vacuous token does not resolve the error, the normal error
recovery strategy is invoked using the input symbol which caused the
3. If no vacuous token is specified, normal error recovery takes
I would be very interested to know if there are any flaws in this
approach. It seems like a "too good to be true" kind of idea and, in
my experience, when I get that feeling, I am usually right!
Any comments/criticism/advice appreciated.
Return to the
Search the comp.compilers archives again.