Re: Parsing Expression Grammar

Russ Cox <rsc@swtch.com>
22 Sep 2005 23:49:16 -0400

          From comp.compilers

Related articles
[17 earlier articles]
Re: Parsing Expression Grammar cleos@nb.sympatico.ca (Cleo Saulnier) (2005-09-17)
Re: Parsing Expression Grammar DrDiettrich@compuserve.de (Hans-Peter Diettrich) (2005-09-17)
Re: Parsing Expression Grammar Meyer-Eltz@t-online.de (Detlef Meyer-Eltz) (2005-09-18)
Re: Parsing Expression Grammar cleos@nb.sympatico.ca (Cleo Saulnier) (2005-09-22)
Re: Parsing Expression Grammar schmitz@i3s.unice.fr (Sylvain Schmitz) (2005-09-22)
Re: Parsing Expression Grammar schmitz@i3s.unice.fr (Sylvain Schmitz) (2005-09-22)
Re: Parsing Expression Grammar rsc@swtch.com (Russ Cox) (2005-09-22)
Re: Parsing Expression Grammar cfc@shell01.TheWorld.com (Chris F Clark) (2005-09-22)
Re: Parsing Expression Grammar schmitz@i3s.unice.fr (Sylvain Schmitz) (2005-09-22)
Re: Parsing Expression Grammar paul@parsetec.com (Paul Mann) (2005-09-22)
Re: Parsing Expression Grammar cfc@shell01.TheWorld.com (Chris F Clark) (2005-09-23)
Re: Parsing Expression Grammar gneuner2@comcast.net (George Neuner) (2005-09-25)
Re: Parsing Expression Grammar cfc@world.std.com (Chris F Clark) (2005-09-25)
[4 later articles]
| List of all articles for this month |
From: Russ Cox <rsc@swtch.com>
Newsgroups: comp.compilers
Date: 22 Sep 2005 23:49:16 -0400
Organization: Compilers Central
References: <432FE722.1070801@i3s.unice.fr>
Keywords: parse
Posted-Date: 22 Sep 2005 23:49:15 EDT

Forwarded with permission.


From: Bryan Ford <baford@mit.edu>
Date: Sep 20, 2005 5:43 PM
Subject: Re: Fwd: Parsing Expression Grammar


> All this thread on PEGs has convinced me of giving them a better look.
> But something isn't quite clear; in his ICFP'03 paper, Bryan Ford
> presented an unambiguous CFG that could not be recognized by his packrat
> parser:
>
> P -> x P x | x
>
> (this CFG is especially interesting for it has a quadratic parsing time
> complexity using an Earley parser).
> I have tried "P <- x P x / x" to see what would happen, and indeed,
> the packrat parser is only able to parse sequences of 2^n - 1 "x"
> symbols, n > 0.
>
> This problem does not seem to be addressed in his POPL'04 paper; the
> well-formed definition, if I understand it correctly, is only concerned
> about left recursion and does not reject the above grammar.


The fact that it's possible to transform most CFGs into a syntactically
correct PEG by replacing "->" with "<-" and "|" with "/" does not by any
means imply that the PEG so generated will actually represent the same
language as the CFG did. CFGs and PEGs are different syntactic
meta-languages whose expressions have substantially different
interpretations, and I've never yet even tried to define a "direct" formal
relationship directly between similar-looking PEGs and CFGs. In other words,
the fact that the CFG "P -> x P x | x" looks a lot like the PEG "P <- x P x /
x" means absolutely nothing formally.


It's probably possible to define some such formal correspondence between a
very restrictive subset of CFGs, and a similar very restrictive subset of
PEGs, and prove that the corresponding grammars represent the same language.
The LL(0) CFGs might be a good starting point for building such a
correspondence. I haven't done so yet, however, nor has anyone else that I
know of. Needless to say, the CFG "P -> x P x | x" would probably not be
covered by this correspondence.


> Now, strictly speaking, the language recognized by "P <- x P x / x"
> *is* {x^(2^n - 1) | n > 0} (and that's quite an achievement in itself),


Actually, if you analyze the behavior of the above PEG further I think you'll
find that it represents the language {"x", "xxx"}. In other words, the
PEG-based parser fails to behave in the way the similar-looking CFG does for
input strings of 5 or more x's, because it deterministically makes the
"wrong" parsing choices in the middle of the string and never goes back to
explore different choices as a fully backtracking CFG-based parser might.


It's easy in this case to write a _different_ PEG that recognizes the same
language {x^(2^n - 1) | n > 0}, e.g., "P <- x x P | x", but that fact doesn't
seem to generalize to CFGs in general.


> Looking at this, I also wonder whether there exist a PEG for the CFL
> {ww^R | w \in {a,b}^*}.


My guess is no. The basic problem seems to be the lack of syntactic markers
in the middle of the string from which the PEG-based parser might start
building correct intermediate results.


> All this to get to the conclusion that the immediate translation of a
> CFG into a PEG by replacing "|" with "/" will not necesseraly yield the
> expected result. Even the simple grammar "S -> Ac, A -> a| ab" should
> not be translated "S <- Ac, A <- a / ab" but "S <- Ac, A <- ab / a",
> otherwise input "abc" would not be accepted.


Correct.


Cheers,
Bryan


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.