22 Sep 2005 23:49:16 -0400

Related articles |
---|

[17 earlier articles] |

Re: Parsing Expression Grammar cleos@nb.sympatico.ca (Cleo Saulnier) (2005-09-17) |

Re: Parsing Expression Grammar DrDiettrich@compuserve.de (Hans-Peter Diettrich) (2005-09-17) |

Re: Parsing Expression Grammar Meyer-Eltz@t-online.de (Detlef Meyer-Eltz) (2005-09-18) |

Re: Parsing Expression Grammar cleos@nb.sympatico.ca (Cleo Saulnier) (2005-09-22) |

Re: Parsing Expression Grammar schmitz@i3s.unice.fr (Sylvain Schmitz) (2005-09-22) |

Re: Parsing Expression Grammar schmitz@i3s.unice.fr (Sylvain Schmitz) (2005-09-22) |

Re: Parsing Expression Grammar rsc@swtch.com (Russ Cox) (2005-09-22) |

Re: Parsing Expression Grammar cfc@shell01.TheWorld.com (Chris F Clark) (2005-09-22) |

Re: Parsing Expression Grammar schmitz@i3s.unice.fr (Sylvain Schmitz) (2005-09-22) |

Re: Parsing Expression Grammar paul@parsetec.com (Paul Mann) (2005-09-22) |

Re: Parsing Expression Grammar cfc@shell01.TheWorld.com (Chris F Clark) (2005-09-23) |

Re: Parsing Expression Grammar gneuner2@comcast.net (George Neuner) (2005-09-25) |

Re: Parsing Expression Grammar cfc@world.std.com (Chris F Clark) (2005-09-25) |

[4 later articles] |

From: | Russ Cox <rsc@swtch.com> |

Newsgroups: | comp.compilers |

Date: | 22 Sep 2005 23:49:16 -0400 |

Organization: | Compilers Central |

References: | <432FE722.1070801@i3s.unice.fr> |

Keywords: | parse |

Posted-Date: | 22 Sep 2005 23:49:15 EDT |

Forwarded with permission.

From: Bryan Ford <baford@mit.edu>

Date: Sep 20, 2005 5:43 PM

Subject: Re: Fwd: Parsing Expression Grammar

*> All this thread on PEGs has convinced me of giving them a better look.*

*> But something isn't quite clear; in his ICFP'03 paper, Bryan Ford*

*> presented an unambiguous CFG that could not be recognized by his packrat*

*> parser:*

*>*

*> P -> x P x | x*

*>*

*> (this CFG is especially interesting for it has a quadratic parsing time*

*> complexity using an Earley parser).*

*> I have tried "P <- x P x / x" to see what would happen, and indeed,*

*> the packrat parser is only able to parse sequences of 2^n - 1 "x"*

*> symbols, n > 0.*

*>*

*> This problem does not seem to be addressed in his POPL'04 paper; the*

*> well-formed definition, if I understand it correctly, is only concerned*

*> about left recursion and does not reject the above grammar.*

The fact that it's possible to transform most CFGs into a syntactically

correct PEG by replacing "->" with "<-" and "|" with "/" does not by any

means imply that the PEG so generated will actually represent the same

language as the CFG did. CFGs and PEGs are different syntactic

meta-languages whose expressions have substantially different

interpretations, and I've never yet even tried to define a "direct" formal

relationship directly between similar-looking PEGs and CFGs. In other words,

the fact that the CFG "P -> x P x | x" looks a lot like the PEG "P <- x P x /

x" means absolutely nothing formally.

It's probably possible to define some such formal correspondence between a

very restrictive subset of CFGs, and a similar very restrictive subset of

PEGs, and prove that the corresponding grammars represent the same language.

The LL(0) CFGs might be a good starting point for building such a

correspondence. I haven't done so yet, however, nor has anyone else that I

know of. Needless to say, the CFG "P -> x P x | x" would probably not be

covered by this correspondence.

*> Now, strictly speaking, the language recognized by "P <- x P x / x"*

*> *is* {x^(2^n - 1) | n > 0} (and that's quite an achievement in itself),*

Actually, if you analyze the behavior of the above PEG further I think you'll

find that it represents the language {"x", "xxx"}. In other words, the

PEG-based parser fails to behave in the way the similar-looking CFG does for

input strings of 5 or more x's, because it deterministically makes the

"wrong" parsing choices in the middle of the string and never goes back to

explore different choices as a fully backtracking CFG-based parser might.

It's easy in this case to write a _different_ PEG that recognizes the same

language {x^(2^n - 1) | n > 0}, e.g., "P <- x x P | x", but that fact doesn't

seem to generalize to CFGs in general.

*> Looking at this, I also wonder whether there exist a PEG for the CFL*

*> {ww^R | w \in {a,b}^*}.*

My guess is no. The basic problem seems to be the lack of syntactic markers

in the middle of the string from which the PEG-based parser might start

building correct intermediate results.

*> All this to get to the conclusion that the immediate translation of a*

*> CFG into a PEG by replacing "|" with "/" will not necesseraly yield the*

*> expected result. Even the simple grammar "S -> Ac, A -> a| ab" should*

*> not be translated "S <- Ac, A <- a / ab" but "S <- Ac, A <- ab / a",*

*> otherwise input "abc" would not be accepted.*

Correct.

Cheers,

Bryan

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.