Related articles |
---|
[19 earlier articles] |
Re: Parsing Expression Grammar Meyer-Eltz@t-online.de (Detlef Meyer-Eltz) (2005-09-18) |
Re: Parsing Expression Grammar cleos@nb.sympatico.ca (Cleo Saulnier) (2005-09-22) |
Re: Parsing Expression Grammar schmitz@i3s.unice.fr (Sylvain Schmitz) (2005-09-22) |
Re: Parsing Expression Grammar schmitz@i3s.unice.fr (Sylvain Schmitz) (2005-09-22) |
Re: Parsing Expression Grammar rsc@swtch.com (Russ Cox) (2005-09-22) |
Re: Parsing Expression Grammar cfc@shell01.TheWorld.com (Chris F Clark) (2005-09-22) |
Re: Parsing Expression Grammar schmitz@i3s.unice.fr (Sylvain Schmitz) (2005-09-22) |
Re: Parsing Expression Grammar paul@parsetec.com (Paul Mann) (2005-09-22) |
Re: Parsing Expression Grammar cfc@shell01.TheWorld.com (Chris F Clark) (2005-09-23) |
Re: Parsing Expression Grammar gneuner2@comcast.net (George Neuner) (2005-09-25) |
Re: Parsing Expression Grammar cfc@world.std.com (Chris F Clark) (2005-09-25) |
Re: Parsing Expression Grammar cfc@TheWorld.com (Chris F Clark) (2005-09-27) |
Re: Parsing Expression Grammar cfc@shell01.TheWorld.com (Chris F Clark) (2005-09-27) |
[2 later articles] |
From: | Sylvain Schmitz <schmitz@i3s.unice.fr> |
Newsgroups: | comp.compilers |
Date: | 22 Sep 2005 23:52:20 -0400 |
Organization: | Compilers Central |
References: | 05-08-115 <432FE722.1070801@i3s.unice.fr> <ee9e417a05092006516e0d9d95@mail.gmail.com> <200509201743.39415.baford@mit.edu> |
Keywords: | parse |
Posted-Date: | 22 Sep 2005 23:52:20 EDT |
Bryan Ford wrote:
> The fact that it's possible to transform most CFGs into a syntactically
> correct PEG by replacing "->" with "<-" and "|" with "/" does not by any
> means imply that the PEG so generated will actually represent the same
> language as the CFG did. CFGs and PEGs are different syntactic
> meta-languages whose expressions have substantially different
> interpretations, and I've never yet even tried to define a "direct" formal
> relationship directly between similar-looking PEGs and CFGs. In other words,
> the fact that the CFG "P -> x P x | x" looks a lot like the PEG "P <- x P x /
> x" means absolutely nothing formally.
>
> It's probably possible to define some such formal correspondence between a
> very restrictive subset of CFGs, and a similar very restrictive subset of
> PEGs, and prove that the corresponding grammars represent the same language.
> The LL(0) CFGs might be a good starting point for building such a
> correspondence. I haven't done so yet, however, nor has anyone else that I
> know of. Needless to say, the CFG "P -> x P x | x" would probably not be
> covered by this correspondence.
It seems like a good idea; I would have liked a generative
interpretation of all PEGs, but the undecidability of the emptiness
problem is a sign that it's unlikely to happen. The bottom line here is
that it can be difficult to tell what language a PEG will accept.
Am I wrong if I write that the language L(alpha / beta) recognized by
the PEG "alpha / beta" is
L(alpha / beta) = L(alpha) \cup { xy \in L(beta) | x \not\in L(alpha) }
It looks like a good starting point for understanding PEGs as generative
devices. (At least I find it helpful.)
>> Now, strictly speaking, the language recognized by "P <- x P x / x"
>>*is* {x^(2^n - 1) | n > 0} (and that's quite an achievement in itself),
>
> Actually, if you analyze the behavior of the above PEG further I think you'll
> find that it represents the language {"x", "xxx"}. In other words, the
> PEG-based parser fails to behave in the way the similar-looking CFG does for
> input strings of 5 or more x's, because it deterministically makes the
> "wrong" parsing choices in the middle of the string and never goes back to
> explore different choices as a fully backtracking CFG-based parser might.
>
> It's easy in this case to write a _different_ PEG that recognizes the same
> language {x^(2^n - 1) | n > 0}, e.g., "P <- x x P | x", but that fact doesn't
> seem to generalize to CFGs in general.
It seems like you read the language expression a bit fast; I concur
P -> x P x | x,
P -> x x P | x and
P <- x x P / x
all recognize the same language { x^(2n - 1) | n > 0 }, but
P <- x P x / x
recognizes { x^(2^n - 1) | n > 0 }, a non CF language, not {x, xxx}. I
joined the source of a packrat parser for it (and I apologize for my
poor Haskell skills).
>>Looking at this, I also wonder whether there exists a PEG for the CFL
>>{ww^R | w \in {a,b}^*}.
>
> My guess is no. The basic problem seems to be the lack of syntactic markers
> in the middle of the string from which the PEG-based parser might start
> building correct intermediate results.
Isn't this overall problem handled in Birman's Information and Control
paper? I haven't really read it yet, but he introduces multiple-failure
schemes, (l,m)-TS, which look like a way of expressing the need for more
backtracking power in a "bounded" way; the problem with palindromes
being that they need an "unbounded" number of backtracks, hence their
quadratic time complexity in Earley parsing. (This is just an intuition.)
--
Regards,
Sylvain
-- PalPackrat.hs
--
-- Packrat parser for grammar P <- x P x / x.
-- See "4.3 Recognition Limitations" in Bryan Ford, Packrat Parsing: Simple,
-- Powerful, Lazy, Linear Time. ICFP'02.
-- The motivation for the study of this grammar is its possibly inherent
-- quadratic time parsing complexity; see Jay Earley, An Efficient Context-Free
-- Parsing Algorithm. Communications of the ACM, 13(2):94-102, 1970.
--
-- Usage in Hugs98: let's see why "xxxxx" is not accepted:
-- Hugs.Base> :load PalPackrat.hs
-- PalPackrat> dvPalindrom (parse "345")
-- "(P <- '3'(P <- '4')'5')"
-- Remains ""
-- PalPackrat> dvPalindrom (parse "2345")
-- "(P <- '2')"
-- Remains "345"
-- PalPackrat> dvPalindrom (parse "12345")
-- "(P <- '1'(P <- '2')'3')"
-- Remains "45"
-- PalPackrat> map eval [1,3..66]
-- [1,3,0,7,0,0,0,15,0,0,0,0,0,0,0,31,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,63,0]
module PalPackrat where
data Result v = Parsed v Derivs | NoParse
instance Show v => Show (Result v) where
show (Parsed v (Derivs r p c)) = show v ++ "\nRemains \"" ++ r ++ "\""
show (NoParse ) = "No Parse"
data Derivs = Derivs {
remain :: String,
dvPalindrom :: Result String,
dvChar :: Result Char
}
-- Returns `n' if the parser accepts an input of length `n', `0' otherwise.
eval :: Int -> Int
eval n = case dvPalindrom (parse (map (\i -> 'x') [1..n])) of
Parsed v d -> case dvChar d of -- check for EOF
NoParse -> n
_ -> 0
_ -> 0
-- Construct a parse result structure for an input string.
parse :: String -> Derivs
parse s = d where
d = Derivs r pal char
r = s
pal = pPalindrom d
char = case s of
(c:s') -> Parsed c (parse s')
[] -> NoParse
-- P
pPalindrom :: Derivs -> Result String
pPalindrom d = alt1 where
-- P <- x P x
alt1 = case dvChar d of
Parsed c1 d' -> case dvPalindrom d' of
Parsed p d'' -> case dvChar d'' of
Parsed c2 d''' -> Parsed ("(P <- " ++ (show c1) ++ p ++ (show c2) ++ ")") d'''
_ -> alt2
_ -> alt2
_ -> alt2
-- P <- x
alt2 = case dvChar d of
Parsed c d' -> Parsed ("(P <- " ++ (show c) ++ ")") d'
_ -> NoParse
-- An equivalent result can be obtained using pappy
-- [http://www.pdos.lcs.mit.edu/~baford/packrat/thesis/]
-- to generate a packrat parser from the following 'Pal.pappy' file:
-- parser Pal:
--
-- top Axiom
--
-- Axiom :: Int =
-- p:Palindrom !Char -> { p }
--
-- Palindrom :: Int =
-- 'x' p:Palindrom 'x' -> { p + 2 }
-- / 'x' -> { 1 }
--
-- {
-- -- test:
-- -- Pal> map eval [1,3..66]
-- -- [1,3,0,7,0,0,0,15,0,0,0,0,0,0,0,31,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,63,0]
-- --
-- eval :: Int -> Int
-- eval n = case palAxiom (palParse "Palindrom" (map (\i -> 'x') [1..n])) of
-- Parsed v _ _ -> v
-- NoParse e -> 0
-- }
Return to the
comp.compilers page.
Search the
comp.compilers archives again.