Re: Full LR(1) parser generator Hyacc 0.9 release

"Paul B Mann" <paul@paulbmann.com>
Thu, 28 Feb 2008 21:06:42 -0600

          From comp.compilers

Related articles
[13 earlier articles]
Re: Full LR(1) parser generator Hyacc 0.9 release cfc@TheWorld.com (Chris F Clark) (2008-02-27)
Re: Full LR(1) parser generator Hyacc 0.9 release jdenny@ces.clemson.edu (Joel E. Denny) (2008-02-27)
Re: Full LR(1) parser generator Hyacc 0.9 release jdenny@ces.clemson.edu (Joel E. Denny) (2008-02-27)
Re: Full LR(1) parser generator Hyacc 0.9 release cfc@TheWorld.com (Chris F Clark) (2008-02-28)
Re: Full LR(1) parser generator Hyacc 0.9 release cfc@shell01.TheWorld.com (Chris F Clark) (2008-02-28)
Re: Full LR(1) parser generator Hyacc 0.9 release jdenny@ces.clemson.edu (Joel E. Denny) (2008-02-28)
Re: Full LR(1) parser generator Hyacc 0.9 release paul@paulbmann.com (Paul B Mann) (2008-02-28)
| List of all articles for this month |

From: "Paul B Mann" <paul@paulbmann.com>
Newsgroups: comp.compilers
Date: Thu, 28 Feb 2008 21:06:42 -0600
Organization: Compilers Central
References: 08-02-019 08-02-022 08-02-030 08-02-037 08-02-040 08-02-041 08-02-044 <Pine.SOC.4.61.0802230000570.9914@unixlab03.ces.clemson.edu> 08-02-076 08-02-084 08-02-087 08-02-096
Keywords: LR(1)
Posted-Date: 29 Feb 2008 01:22:49 EST

> Paul, since you've joined this conversation and I know you implemented
> state splitting in LRgen. Is the assertion that state splitting will
> never completely resolve a S/R conflict correct (i.e. that there will
> always be some state in the LR(1) machine that will have a S/R
> conflict if the LR(0) machine has an S/R conflict? Even when I'm
> certain of my logic, I prefer when someone I respect confirms it.


I was not sure if anybody was reading my messages until now. I
thought I was intruding.


The short answer is:


YES you are correct. You cannot remove the S/R conflicts by state
splitting. And canonical LR(1) construction with state merging gives
the same results.


Here is the long answer:


I did not implement state splitting, actually. I implemented canonical
LR(1) construction with state merging as the states are being generated
(to keep the memory usage as small as possible).


For many years I was curious to find out just how large the canonical
LR(1) finite-state machine would be for large grammars. So, I
implemented canonical LR(1) first and looked at the results.


Here are the numbers of canonical LR(1) states for some grammars:


Ada ... 15,568 states.
ANSI-C ... 4,984 states.
C++ ... 6,239 states.
COBOL ... 2,000,000 + states, (ran out of memory, 512 MB),
DB2 ... 131,568 states.
Lotus Script ... 12,982 states.
Oracle SQL ... 92,619 states.
PL/I ... 4,933 states.
XPL ... 244 states.


The DB2 parser tables were over 3 MB in size, so I knew that
would never work. I wish somebody else would do this research
to verify my results. I will gladly donate the grammars, if someone
wants to try it, except for 2 that are copyrighted.


So I decided to try state merging and that works very well, giving
a number of states exactly the same as LALR(1) construction for
those grammars that are LALR(1). For those grammars that are
not LALR(1), the number of states is slightly larger, 1% or 2%
maybe.


Now to answer your question. I think Joel Denny has already
answered it, but I will give my 2 cents.


The S/R conflicts will remain in the state machine no matter how
you split the states !!! The only thing that happens is that you
split up the look-ahead symbols for the S/R conflict, putting them
in different states.


As I said in my older message, if you have merged two states and
you get the S/R conflict on both 't' and 'n' look-aheads, keeping
the states separate only separates the look-aheads, putting 't' in one
state and 'n' in another state.


You will NOT gain any recognition power by splitting these states.


However, you may gain a slight advantage, if you are implementing
some kind of non-determinism in the parser algorithm for these
situations where the language is LR(2), LR(3) or greater.


It means that you don't have to try as many possibilities, a time
savings perhaps. I think that is all you get.


I added an option which I call XLR, for extra LR states, for those
people who want to experiment with this. XLR will cause the
parser generator to avoid merging states where extra S/R conflicts
would appear by the merger.


I have not used this XLR option, yet. I don't see much use in it --
just an academic exercise.


If I made an error, please let me know.
Hope this helps.


Paul B Mann
http://lrgen.com


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.