Re: DFA's and LR machine (was Re: Trouble understanding...)

Chris F Clark <cfc@shell01.TheWorld.com>
9 May 2004 22:27:07 -0400

From comp.compilers

Related articles
[3 earlier articles]
Re: Trouble understanding LL(k) vs. LALR(k) etc. cfc@shell01.TheWorld.com (Chris F Clark) (2004-03-26)
Re: Trouble understanding LL(k) vs. LALR(k) etc. clint@0lsen.net (Clint Olsen) (2004-04-14)
Re: Trouble understanding LL(k) vs. LALR(k) etc. cfc@shell01.TheWorld.com (Chris F Clark) (2004-04-15)
DFA's and LR machine (was Re: Trouble understanding...) cdc@maxnet.co.nz (Carl Cerecke) (2004-04-21)
Re: DFA's and LR machine (was Re: Trouble understanding...) wyrmwif@tsoft.com (SM Ryan) (2004-04-28)
Re: DFA's and LR machine (was Re: Trouble understanding...) cdc@maxnet.co.nz (Carl Cerecke) (2004-05-08)
*Re: DFA's and LR machine (was Re: Trouble understanding...) cfc@shell01.TheWorld.com (Chris F Clark)* (2004-05-09)**
Re: DFA's and LR machine (was Re: Trouble understanding...) wyrmwif@tsoft.com (SM Ryan) (2004-05-16)

| List of all articles for this month |

From:	Chris F Clark <cfc@shell01.TheWorld.com>
Newsgroups:	comp.compilers
Date:	9 May 2004 22:27:07 -0400
Organization:	The World Public Access UNIX, Brookline, MA
References:	04-03-042 04-03-057 04-03-072 04-03-091 04-04-029 04-04-049 04-04-058 04-05-030
Keywords:	parse
Posted-Date:	09 May 2004 22:27:07 EDT

Carl Cereke wrote in a follow up to a question he asked about parsing
with dfa's + counters:

> Ok. Had I engaged my brain before my fingers, I would have remembered
> that, for grammars that include constructs like:
>
> S -> '(' S ')' | '[' S ']' | z

Actually, that is not a problem, you simply at each step multiply your
counter by 3 and add 1 for [ or 2 for (.

0 -> z
1 -> ( z )
2 -> [ z ]
3 unused
4 -> (( z ))
5 -> ([ z ])
6 unused
7 -> [( z ])
8 -> [[ z ]]
9-12 unused
13 -> ((( z )))
...

One could find an even better encoding quite easily. Of course, all
I've done is translate the stack into integer operations, but that's
the idea behind counters anyway. I'm sure there some nice proof by
Church, Turing, Goedel, Post or someone that shows that one can do
this completely generally--my automata theory is quite rusty these
days.

In any case, so the answer is one can implement any LR parser, by a
DFA plus counters (if you allow sufficient integer arithmetic). With
a little more work I think one can even do an Earley style parser that
way. In fact, if a DFA + infinite (unbounded) integer arithmetic were
not Turing Complete I would be extremely surprised.

------------------------------------------------------------------------

Taking your thought another direction, if you shift by 2
(i.e. multiply by 4) rather than multiply by 3, it becomes pretty
clear that the integer represents a stack with 2 bits per entry. This
reminds me of a wonderful old paper in the British Computing Society's
Journal on using parsing to compress. The author worked out a way to
encode the parsing decisions of both LL and LR parsers as integers
(bit vectors), using the minimal space. The idea is quite obvious in
retrospective, at each point in the parse, you have a series of
decisions to make based on the input, and you encode those decisions
according to their probabilities, very much like arithmetic encoding
(actually, I think it is exactly like arithmetic encoding, although
the author may have used a simpler scheme like Huffman encoding).

Note, your original idea of using a simple counter to handle pure
left or right recursion is equivalent to using run-length encoding for
that part of the grammar. However, I think if you do the arithmetic
encoding right, you will be at most a couple bits longer than the
run-length encoding for those cases. (My compression theory is no
better than my automata theory, so you'ld be better off asking someone
else.)

Hope this helps,
-Chris

*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: DFA's and LR machine (was Re: Trouble understanding...)

Chris F Clark <cfc@shell01.TheWorld.com>9 May 2004 22:27:07 -0400

Chris F Clark <cfc@shell01.TheWorld.com>
9 May 2004 22:27:07 -0400