22 Sep 2005 23:52:20 -0400

Related articles |
---|

[19 earlier articles] |

Re: Parsing Expression Grammar Meyer-Eltz@t-online.de (Detlef Meyer-Eltz) (2005-09-18) |

Re: Parsing Expression Grammar cleos@nb.sympatico.ca (Cleo Saulnier) (2005-09-22) |

Re: Parsing Expression Grammar schmitz@i3s.unice.fr (Sylvain Schmitz) (2005-09-22) |

Re: Parsing Expression Grammar schmitz@i3s.unice.fr (Sylvain Schmitz) (2005-09-22) |

Re: Parsing Expression Grammar rsc@swtch.com (Russ Cox) (2005-09-22) |

Re: Parsing Expression Grammar cfc@shell01.TheWorld.com (Chris F Clark) (2005-09-22) |

Re: Parsing Expression Grammar schmitz@i3s.unice.fr (Sylvain Schmitz) (2005-09-22) |

Re: Parsing Expression Grammar paul@parsetec.com (Paul Mann) (2005-09-22) |

Re: Parsing Expression Grammar cfc@shell01.TheWorld.com (Chris F Clark) (2005-09-23) |

Re: Parsing Expression Grammar gneuner2@comcast.net (George Neuner) (2005-09-25) |

Re: Parsing Expression Grammar cfc@world.std.com (Chris F Clark) (2005-09-25) |

Re: Parsing Expression Grammar cfc@TheWorld.com (Chris F Clark) (2005-09-27) |

Re: Parsing Expression Grammar cfc@shell01.TheWorld.com (Chris F Clark) (2005-09-27) |

[2 later articles] |

From: | Sylvain Schmitz <schmitz@i3s.unice.fr> |

Newsgroups: | comp.compilers |

Date: | 22 Sep 2005 23:52:20 -0400 |

Organization: | Compilers Central |

References: | 05-08-115 <432FE722.1070801@i3s.unice.fr> <ee9e417a05092006516e0d9d95@mail.gmail.com> <200509201743.39415.baford@mit.edu> |

Keywords: | parse |

Posted-Date: | 22 Sep 2005 23:52:20 EDT |

Bryan Ford wrote:

*> The fact that it's possible to transform most CFGs into a syntactically*

*> correct PEG by replacing "->" with "<-" and "|" with "/" does not by any*

*> means imply that the PEG so generated will actually represent the same*

*> language as the CFG did. CFGs and PEGs are different syntactic*

*> meta-languages whose expressions have substantially different*

*> interpretations, and I've never yet even tried to define a "direct" formal*

*> relationship directly between similar-looking PEGs and CFGs. In other words,*

*> the fact that the CFG "P -> x P x | x" looks a lot like the PEG "P <- x P x /*

*> x" means absolutely nothing formally.*

*>*

*> It's probably possible to define some such formal correspondence between a*

*> very restrictive subset of CFGs, and a similar very restrictive subset of*

*> PEGs, and prove that the corresponding grammars represent the same language.*

*> The LL(0) CFGs might be a good starting point for building such a*

*> correspondence. I haven't done so yet, however, nor has anyone else that I*

*> know of. Needless to say, the CFG "P -> x P x | x" would probably not be*

*> covered by this correspondence.*

It seems like a good idea; I would have liked a generative

interpretation of all PEGs, but the undecidability of the emptiness

problem is a sign that it's unlikely to happen. The bottom line here is

that it can be difficult to tell what language a PEG will accept.

Am I wrong if I write that the language L(alpha / beta) recognized by

the PEG "alpha / beta" is

L(alpha / beta) = L(alpha) \cup { xy \in L(beta) | x \not\in L(alpha) }

It looks like a good starting point for understanding PEGs as generative

devices. (At least I find it helpful.)

*>> Now, strictly speaking, the language recognized by "P <- x P x / x"*

*>>*is* {x^(2^n - 1) | n > 0} (and that's quite an achievement in itself),*

*>*

*> Actually, if you analyze the behavior of the above PEG further I think you'll*

*> find that it represents the language {"x", "xxx"}. In other words, the*

*> PEG-based parser fails to behave in the way the similar-looking CFG does for*

*> input strings of 5 or more x's, because it deterministically makes the*

*> "wrong" parsing choices in the middle of the string and never goes back to*

*> explore different choices as a fully backtracking CFG-based parser might.*

*>*

*> It's easy in this case to write a _different_ PEG that recognizes the same*

*> language {x^(2^n - 1) | n > 0}, e.g., "P <- x x P | x", but that fact doesn't*

*> seem to generalize to CFGs in general.*

It seems like you read the language expression a bit fast; I concur

P -> x P x | x,

P -> x x P | x and

P <- x x P / x

all recognize the same language { x^(2n - 1) | n > 0 }, but

P <- x P x / x

recognizes { x^(2^n - 1) | n > 0 }, a non CF language, not {x, xxx}. I

joined the source of a packrat parser for it (and I apologize for my

poor Haskell skills).

*>>Looking at this, I also wonder whether there exists a PEG for the CFL*

*>>{ww^R | w \in {a,b}^*}.*

*>*

*> My guess is no. The basic problem seems to be the lack of syntactic markers*

*> in the middle of the string from which the PEG-based parser might start*

*> building correct intermediate results.*

Isn't this overall problem handled in Birman's Information and Control

paper? I haven't really read it yet, but he introduces multiple-failure

schemes, (l,m)-TS, which look like a way of expressing the need for more

backtracking power in a "bounded" way; the problem with palindromes

being that they need an "unbounded" number of backtracks, hence their

quadratic time complexity in Earley parsing. (This is just an intuition.)

--

Regards,

Sylvain

-- PalPackrat.hs

--

-- Packrat parser for grammar P <- x P x / x.

-- See "4.3 Recognition Limitations" in Bryan Ford, Packrat Parsing: Simple,

-- Powerful, Lazy, Linear Time. ICFP'02.

-- The motivation for the study of this grammar is its possibly inherent

-- quadratic time parsing complexity; see Jay Earley, An Efficient Context-Free

-- Parsing Algorithm. Communications of the ACM, 13(2):94-102, 1970.

--

-- Usage in Hugs98: let's see why "xxxxx" is not accepted:

-- Hugs.Base> :load PalPackrat.hs

-- PalPackrat> dvPalindrom (parse "345")

-- "(P <- '3'(P <- '4')'5')"

-- Remains ""

-- PalPackrat> dvPalindrom (parse "2345")

-- "(P <- '2')"

-- Remains "345"

-- PalPackrat> dvPalindrom (parse "12345")

-- "(P <- '1'(P <- '2')'3')"

-- Remains "45"

-- PalPackrat> map eval [1,3..66]

-- [1,3,0,7,0,0,0,15,0,0,0,0,0,0,0,31,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,63,0]

module PalPackrat where

data Result v = Parsed v Derivs | NoParse

instance Show v => Show (Result v) where

show (Parsed v (Derivs r p c)) = show v ++ "\nRemains \"" ++ r ++ "\""

show (NoParse ) = "No Parse"

data Derivs = Derivs {

remain :: String,

dvPalindrom :: Result String,

dvChar :: Result Char

}

-- Returns `n' if the parser accepts an input of length `n', `0' otherwise.

eval :: Int -> Int

eval n = case dvPalindrom (parse (map (\i -> 'x') [1..n])) of

Parsed v d -> case dvChar d of -- check for EOF

NoParse -> n

_ -> 0

_ -> 0

-- Construct a parse result structure for an input string.

parse :: String -> Derivs

parse s = d where

d = Derivs r pal char

r = s

pal = pPalindrom d

char = case s of

(c:s') -> Parsed c (parse s')

[] -> NoParse

-- P

pPalindrom :: Derivs -> Result String

pPalindrom d = alt1 where

-- P <- x P x

alt1 = case dvChar d of

Parsed c1 d' -> case dvPalindrom d' of

Parsed p d'' -> case dvChar d'' of

Parsed c2 d''' -> Parsed ("(P <- " ++ (show c1) ++ p ++ (show c2) ++ ")") d'''

_ -> alt2

_ -> alt2

_ -> alt2

-- P <- x

alt2 = case dvChar d of

Parsed c d' -> Parsed ("(P <- " ++ (show c) ++ ")") d'

_ -> NoParse

-- An equivalent result can be obtained using pappy

-- [http://www.pdos.lcs.mit.edu/~baford/packrat/thesis/]

-- to generate a packrat parser from the following 'Pal.pappy' file:

-- parser Pal:

--

-- top Axiom

--

-- Axiom :: Int =

-- p:Palindrom !Char -> { p }

--

-- Palindrom :: Int =

-- 'x' p:Palindrom 'x' -> { p + 2 }

-- / 'x' -> { 1 }

--

-- {

-- -- test:

-- -- Pal> map eval [1,3..66]

-- -- [1,3,0,7,0,0,0,15,0,0,0,0,0,0,0,31,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,63,0]

-- --

-- eval :: Int -> Int

-- eval n = case palAxiom (palParse "Palindrom" (map (\i -> 'x') [1..n])) of

-- Parsed v _ _ -> v

-- NoParse e -> 0

-- }

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.