Re: DFA's and LR machine (was Re: Trouble understanding...)

Carl Cerecke <cdc@maxnet.co.nz>
8 May 2004 21:11:04 -0400

          From comp.compilers

Related articles
[2 earlier articles]
Re: Trouble understanding LL(k) vs. LALR(k) etc. cdc@maxnet.co.nz (Carl Cerecke) (2004-03-19)
Re: Trouble understanding LL(k) vs. LALR(k) etc. cfc@shell01.TheWorld.com (Chris F Clark) (2004-03-26)
Re: Trouble understanding LL(k) vs. LALR(k) etc. clint@0lsen.net (Clint Olsen) (2004-04-14)
Re: Trouble understanding LL(k) vs. LALR(k) etc. cfc@shell01.TheWorld.com (Chris F Clark) (2004-04-15)
DFA's and LR machine (was Re: Trouble understanding...) cdc@maxnet.co.nz (Carl Cerecke) (2004-04-21)
Re: DFA's and LR machine (was Re: Trouble understanding...) wyrmwif@tsoft.com (SM Ryan) (2004-04-28)
Re: DFA's and LR machine (was Re: Trouble understanding...) cdc@maxnet.co.nz (Carl Cerecke) (2004-05-08)
Re: DFA's and LR machine (was Re: Trouble understanding...) cfc@shell01.TheWorld.com (Chris F Clark) (2004-05-09)
Re: DFA's and LR machine (was Re: Trouble understanding...) wyrmwif@tsoft.com (SM Ryan) (2004-05-16)
| List of all articles for this month |
From: Carl Cerecke <cdc@maxnet.co.nz>
Newsgroups: comp.compilers
Date: 8 May 2004 21:11:04 -0400
Organization: TelstraClear
References: 04-03-042 04-03-057 04-03-072 04-03-091 04-04-029 04-04-049 04-04-058
Keywords: parse
Posted-Date: 08 May 2004 21:11:04 EDT

Carl Cerecke wrote:


> The core idea is to translate the DPDA generated by, say, LALR(1) into
> a DFA-with-counters. How? A stack in the DPDA completely describes the
> state of the parser. If we enumerate all possible stacks, then each
> possible stack becomes a state in the DFA. Unfortunately, of course,
> stacks may be infinitely long. We can get around that by noticing that
> longer stacks will always have repeated parts - where loops appear in
> the DPDA. Instead of repeating these parts in the stack, we can just
> have a counter for each possible repeated part. Now our set of
> possible stacks is finite - it consists of all stacks that contain no
> repeating parts. Each possible non-repeating stack is now a node in
> the DFA. A shift-edge in the DPDA becomes an edge in the DFA that
> might optionally increment a repeat-counter. A reduction-edge becomes
> one or more edges in the DFA, depending on the repeat-counters, and
> the edge may have a decrement-counter operation attached to
> it. Regular expressions would fit neatly into this model, as they are
> simply DFAs with no counters.


OK. Had I engaged my brain before my fingers, I would have remembered
that, for grammars that include constructs like:


S -> '(' S ')' | '[' S ']' | z


a stack is *required*, because an entire sequence of ( and [ must be
remembered in order to see the matching token on the other side of the
z. If we only use counters to count how many we've seen, then we will
know how many of ) and ] we expect, but we won't know in what order
they are required. This could be handled with backtracking, or parsing
multiple possibilities at once. Pretty ugly, and likely to be
inefficient, of course. Lookahead can be used, but a finite lookahead
will end up accepting strings not in the language. i.e. LA of 1 would,
for the prefix ([([(z) accept any combination of two ) and two ].


So, a stack is basically required. But what is minimally required to
go onto the stack? I think a stack is required only for loops arising
from mutually nestable middle-recursive rules: when you encounter one
of these loops, you push an appropriate symbol, and at the appropriate
reduction you pop off the symbol and use it to decide where to go. All
other loops can increment a simple counter, and the appropriate
reduction decrements it (Or, for suitably masochistic grammars, a
multiple decrement may be required: S->aSb|aaSc|a)


And so, it should be possible to convert nearly[1] any LR parser to a
mostly-DFA with some counters and the odd push and pop. Integrating
regular expressions into that should be simple. (See! There is a point)


Cheers,
Carl.


[1] Anybody who creates ugliness like:
S-> acSc | acacSf | abSd | z
deserves a full-blown LR machine, I think.


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.