Thu, 28 Feb 2008 21:06:42 -0600

Related articles |
---|

[13 earlier articles] |

Re: Full LR(1) parser generator Hyacc 0.9 release cfc@TheWorld.com (Chris F Clark) (2008-02-27) |

Re: Full LR(1) parser generator Hyacc 0.9 release jdenny@ces.clemson.edu (Joel E. Denny) (2008-02-27) |

Re: Full LR(1) parser generator Hyacc 0.9 release jdenny@ces.clemson.edu (Joel E. Denny) (2008-02-27) |

Re: Full LR(1) parser generator Hyacc 0.9 release cfc@TheWorld.com (Chris F Clark) (2008-02-28) |

Re: Full LR(1) parser generator Hyacc 0.9 release cfc@shell01.TheWorld.com (Chris F Clark) (2008-02-28) |

Re: Full LR(1) parser generator Hyacc 0.9 release jdenny@ces.clemson.edu (Joel E. Denny) (2008-02-28) |

Re: Full LR(1) parser generator Hyacc 0.9 release paul@paulbmann.com (Paul B Mann) (2008-02-28) |

From: | "Paul B Mann" <paul@paulbmann.com> |

Newsgroups: | comp.compilers |

Date: | Thu, 28 Feb 2008 21:06:42 -0600 |

Organization: | Compilers Central |

References: | 08-02-019 08-02-022 08-02-030 08-02-037 08-02-040 08-02-041 08-02-044 <Pine.SOC.4.61.0802230000570.9914@unixlab03.ces.clemson.edu> 08-02-076 08-02-084 08-02-087 08-02-096 |

Keywords: | LR(1) |

Posted-Date: | 29 Feb 2008 01:22:49 EST |

*> Paul, since you've joined this conversation and I know you implemented*

*> state splitting in LRgen. Is the assertion that state splitting will*

*> never completely resolve a S/R conflict correct (i.e. that there will*

*> always be some state in the LR(1) machine that will have a S/R*

*> conflict if the LR(0) machine has an S/R conflict? Even when I'm*

*> certain of my logic, I prefer when someone I respect confirms it.*

I was not sure if anybody was reading my messages until now. I

thought I was intruding.

The short answer is:

YES you are correct. You cannot remove the S/R conflicts by state

splitting. And canonical LR(1) construction with state merging gives

the same results.

Here is the long answer:

I did not implement state splitting, actually. I implemented canonical

LR(1) construction with state merging as the states are being generated

(to keep the memory usage as small as possible).

For many years I was curious to find out just how large the canonical

LR(1) finite-state machine would be for large grammars. So, I

implemented canonical LR(1) first and looked at the results.

Here are the numbers of canonical LR(1) states for some grammars:

Ada ... 15,568 states.

ANSI-C ... 4,984 states.

C++ ... 6,239 states.

COBOL ... 2,000,000 + states, (ran out of memory, 512 MB),

DB2 ... 131,568 states.

Lotus Script ... 12,982 states.

Oracle SQL ... 92,619 states.

PL/I ... 4,933 states.

XPL ... 244 states.

The DB2 parser tables were over 3 MB in size, so I knew that

would never work. I wish somebody else would do this research

to verify my results. I will gladly donate the grammars, if someone

wants to try it, except for 2 that are copyrighted.

So I decided to try state merging and that works very well, giving

a number of states exactly the same as LALR(1) construction for

those grammars that are LALR(1). For those grammars that are

not LALR(1), the number of states is slightly larger, 1% or 2%

maybe.

Now to answer your question. I think Joel Denny has already

answered it, but I will give my 2 cents.

The S/R conflicts will remain in the state machine no matter how

you split the states !!! The only thing that happens is that you

split up the look-ahead symbols for the S/R conflict, putting them

in different states.

As I said in my older message, if you have merged two states and

you get the S/R conflict on both 't' and 'n' look-aheads, keeping

the states separate only separates the look-aheads, putting 't' in one

state and 'n' in another state.

You will NOT gain any recognition power by splitting these states.

However, you may gain a slight advantage, if you are implementing

some kind of non-determinism in the parser algorithm for these

situations where the language is LR(2), LR(3) or greater.

It means that you don't have to try as many possibilities, a time

savings perhaps. I think that is all you get.

I added an option which I call XLR, for extra LR states, for those

people who want to experiment with this. XLR will cause the

parser generator to avoid merging states where extra S/R conflicts

would appear by the merger.

I have not used this XLR option, yet. I don't see much use in it --

just an academic exercise.

If I made an error, please let me know.

Hope this helps.

Paul B Mann

http://lrgen.com

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.