Re: Parsing Expression Grammar

Cleo Saulnier <cleos@nb.sympatico.ca>
17 Sep 2005 13:54:35 -0400

          From comp.compilers

Related articles
[11 earlier articles]
Re: Parsing Expression Grammar DrDiettrich@compuserve.de (Hans-Peter Diettrich) (2005-09-10)
Re: Parsing Expression Grammar paul@parsetec.com (Paul Mann) (2005-09-11)
Re: Parsing Expression Grammar paul@parsetec.com (Paul Mann) (2005-09-11)
Re: Parsing Expression Grammar DrDiettrich@compuserve.de (Hans-Peter Diettrich) (2005-09-14)
Re: Parsing Expression Grammar gneuner2@comcast.net (George Neuner) (2005-09-14)
Re: Parsing Expression Grammar Meyer-Eltz@t-online.de (Detlef Meyer-Eltz) (2005-09-14)
Re: Parsing Expression Grammar cleos@nb.sympatico.ca (Cleo Saulnier) (2005-09-17)
Re: Parsing Expression Grammar DrDiettrich@compuserve.de (Hans-Peter Diettrich) (2005-09-17)
Re: Parsing Expression Grammar Meyer-Eltz@t-online.de (Detlef Meyer-Eltz) (2005-09-18)
Re: Parsing Expression Grammar cleos@nb.sympatico.ca (Cleo Saulnier) (2005-09-22)
Re: Parsing Expression Grammar schmitz@i3s.unice.fr (Sylvain Schmitz) (2005-09-22)
generated code Meyer-Eltz@t-online.de (Detlef Meyer-Eltz) (2005-09-22)
Re: Parsing Expression Grammar schmitz@i3s.unice.fr (Sylvain Schmitz) (2005-09-22)
[16 later articles]
| List of all articles for this month |
From: Cleo Saulnier <cleos@nb.sympatico.ca>
Newsgroups: comp.compilers
Date: 17 Sep 2005 13:54:35 -0400
Organization: Aliant Internet
References: 05-09-023 05-09-045 05-09-058
Keywords: parse
Posted-Date: 17 Sep 2005 13:54:35 EDT

Detlef Meyer-Eltz wrote:
>>"Paul Mann" <paul@highpersoft.com> wrote
>>
>>>"Chris F Clark" <cfc@shell01.TheWorld.com> wrote in message
>>>
>>>>But, as a potential language designer, you need to think long and hard
>>>>about why you want to create a language which has no LL(1) grammar.
>>>
>>>Answer: Readability for human's sake.


>
>>First of all, readability in the grammar. Perhaps that is what I was
>>thinking about, rather than readability in a programming language.
>>But I might be able to find some cases where LALR(1) allows greater
>>readability in the language being parsed also.
>
>
[snip]
> I am interested to know, how the code for the different LR-Parsers
> would look like. I only know yacc/bison code which for my feeling is
> quite awful.


First, I think an LR parser generator should always be preferred over
SLR or LALR. The tables are gonna be close to the same size if you
compress them correctly, but with the added advantage that LR can
handle more grammars without rewriting them which will expand the
table anyhow.


    And I find that GLR is a bit of overkill and you're most likely not
using your lexer like you should, or you're overly relying on the
parser instead of making simple changes to your lexer that would
greatly speed up the parser anyhow.


Just a reminder:
SLR - no lookahead on reduce.
LALR - all lookaheads merged into one state for reduce.
LR - state splitting for each lookahead if necessary, but shouldn't be
much worse than LALR when state merging is possible, although this can
be difficult to detect and accomplish.


LR will create more states than LALR, but there's usually a good reason
for that. It differentiates between different states depending on the
lookahead. This is important for not getting incorrect parses. If the
states can "ideally" be merged, then most likely the grammar isn't
written in the most optimal fashion.


I find LL the easiest to understand and the easiest to follow the
execution of action routines. Unfortunately, I have found that the
generated code is highly dependant on the productions and this actually
leads to a bit of bloat and prone to spaghetti code. Not only that, but
the grammar contortions that are often needed tend to be very ugly.
  From the two LL parser generators that I wrote, this is what annoyed me
the most. Second came the size of the actual parsing routines. From a
superficial point of view (one who writes grammars... and beginners), LL
is great. From an implementor's point of view, it lacks finesse and
seems to carry along a lot of baggage.


For me, SLR and LALR are a waste of time. LALR jumps through way too
many hoops to not compute all the states that LR does. LR can simply
merge states when it's appropriate and get close to the same table sizes
with the benefit of handling more special cases. Yes, yes, LR will
create more states, but not necessarilly that many more entries once
compressed.


  From a superficial (one who writes grammars) point of view, LR takes a
bit of getting used to, especially for action routines as they can
realistically only be done during reduction, but that's not a real
obstacle. One of the main issues I had was that I could write a startup
action routine in the head production to initialise things in LL. With
LR, you can't do that because that's the last production to be reduced.


  From an implementor's point of view (someone who's written a canonical
LR parser generator), LR is nothing short of pure elegance. The way the
states collapse. The way the tables can be compressed. The way the
actual parsing routine can take so few lines of code. The ease at which
the error checking code can be implemented. It's simply a pure pleasure
to be a programmer.


So basically, if you're new to parsers or you don't know the intricacies
of LR parsing, I'm willing to bet you'll much prefer LL. I know I used
to. But once you fully understand LR, I can't imagine why you would use
anything else. I also find that LR code is much more maintainable...
cleaner code basically. The realiance on productions in LL even though
you can use a table, and the code to keep track of all that isn't my cup
of tea.


Just wanted to mention that I have written LL, SLR, LALR and a canonical
LR(1) parser generators. I've just released my LR(1) parser generator
on SF, but the docs are slowly coming out.


Cle


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.