Related articles |
---|
[13 earlier articles] |
Re: Full LR(1) parser generator Hyacc 0.9 release cfc@TheWorld.com (Chris F Clark) (2008-02-27) |
Re: Full LR(1) parser generator Hyacc 0.9 release jdenny@ces.clemson.edu (Joel E. Denny) (2008-02-27) |
Re: Full LR(1) parser generator Hyacc 0.9 release jdenny@ces.clemson.edu (Joel E. Denny) (2008-02-27) |
Re: Full LR(1) parser generator Hyacc 0.9 release cfc@TheWorld.com (Chris F Clark) (2008-02-28) |
Re: Full LR(1) parser generator Hyacc 0.9 release cfc@shell01.TheWorld.com (Chris F Clark) (2008-02-28) |
Re: Full LR(1) parser generator Hyacc 0.9 release jdenny@ces.clemson.edu (Joel E. Denny) (2008-02-28) |
Re: Full LR(1) parser generator Hyacc 0.9 release paul@paulbmann.com (Paul B Mann) (2008-02-28) |
From: | "Paul B Mann" <paul@paulbmann.com> |
Newsgroups: | comp.compilers |
Date: | Thu, 28 Feb 2008 21:06:42 -0600 |
Organization: | Compilers Central |
References: | 08-02-019 08-02-022 08-02-030 08-02-037 08-02-040 08-02-041 08-02-044 <Pine.SOC.4.61.0802230000570.9914@unixlab03.ces.clemson.edu> 08-02-076 08-02-084 08-02-087 08-02-096 |
Keywords: | LR(1) |
Posted-Date: | 29 Feb 2008 01:22:49 EST |
> Paul, since you've joined this conversation and I know you implemented
> state splitting in LRgen. Is the assertion that state splitting will
> never completely resolve a S/R conflict correct (i.e. that there will
> always be some state in the LR(1) machine that will have a S/R
> conflict if the LR(0) machine has an S/R conflict? Even when I'm
> certain of my logic, I prefer when someone I respect confirms it.
I was not sure if anybody was reading my messages until now. I
thought I was intruding.
The short answer is:
YES you are correct. You cannot remove the S/R conflicts by state
splitting. And canonical LR(1) construction with state merging gives
the same results.
Here is the long answer:
I did not implement state splitting, actually. I implemented canonical
LR(1) construction with state merging as the states are being generated
(to keep the memory usage as small as possible).
For many years I was curious to find out just how large the canonical
LR(1) finite-state machine would be for large grammars. So, I
implemented canonical LR(1) first and looked at the results.
Here are the numbers of canonical LR(1) states for some grammars:
Ada ... 15,568 states.
ANSI-C ... 4,984 states.
C++ ... 6,239 states.
COBOL ... 2,000,000 + states, (ran out of memory, 512 MB),
DB2 ... 131,568 states.
Lotus Script ... 12,982 states.
Oracle SQL ... 92,619 states.
PL/I ... 4,933 states.
XPL ... 244 states.
The DB2 parser tables were over 3 MB in size, so I knew that
would never work. I wish somebody else would do this research
to verify my results. I will gladly donate the grammars, if someone
wants to try it, except for 2 that are copyrighted.
So I decided to try state merging and that works very well, giving
a number of states exactly the same as LALR(1) construction for
those grammars that are LALR(1). For those grammars that are
not LALR(1), the number of states is slightly larger, 1% or 2%
maybe.
Now to answer your question. I think Joel Denny has already
answered it, but I will give my 2 cents.
The S/R conflicts will remain in the state machine no matter how
you split the states !!! The only thing that happens is that you
split up the look-ahead symbols for the S/R conflict, putting them
in different states.
As I said in my older message, if you have merged two states and
you get the S/R conflict on both 't' and 'n' look-aheads, keeping
the states separate only separates the look-aheads, putting 't' in one
state and 'n' in another state.
You will NOT gain any recognition power by splitting these states.
However, you may gain a slight advantage, if you are implementing
some kind of non-determinism in the parser algorithm for these
situations where the language is LR(2), LR(3) or greater.
It means that you don't have to try as many possibilities, a time
savings perhaps. I think that is all you get.
I added an option which I call XLR, for extra LR states, for those
people who want to experiment with this. XLR will cause the
parser generator to avoid merging states where extra S/R conflicts
would appear by the merger.
I have not used this XLR option, yet. I don't see much use in it --
just an academic exercise.
If I made an error, please let me know.
Hope this helps.
Paul B Mann
http://lrgen.com
Return to the
comp.compilers page.
Search the
comp.compilers archives again.