Re: Parsing Expression Grammar

George Neuner <>
30 Sep 2005 02:02:24 -0400

          From comp.compilers

Related articles
[26 earlier articles]
Re: Parsing Expression Grammar (Paul Mann) (2005-09-22)
Re: Parsing Expression Grammar (Chris F Clark) (2005-09-23)
Re: Parsing Expression Grammar (George Neuner) (2005-09-25)
Re: Parsing Expression Grammar (Chris F Clark) (2005-09-25)
Re: Parsing Expression Grammar (Chris F Clark) (2005-09-27)
Re: Parsing Expression Grammar (Chris F Clark) (2005-09-27)
Re: Parsing Expression Grammar (George Neuner) (2005-09-30)
Re: Parsing Expression Grammar (Chris F Clark) (2005-10-02)
| List of all articles for this month |

From: George Neuner <>
Newsgroups: comp.compilers
Date: 30 Sep 2005 02:02:24 -0400
Organization: Compilers Central
References: 05-09-023 05-09-045 05-09-053 05-09-107 05-09-121 <> <> 05-09-129
Keywords: parse
Posted-Date: 30 Sep 2005 02:02:24 EDT

On 27 Sep 2005 09:44:34 -0400, Chris F Clark <> wrote:

>George Neuner <> writes:
>> Having [mid-rule actions] anywhere can introduce conflicts. Consider
>> rule1 : token1 {action1} token2
>> rule2 : token1 token3
>> The rules are not ambiguous, but because the action is not subject
>> to look ahead, the parser can't tell which way to go after recognizing
>> the common prefix. Logically action1 has to be executed before
>> token2 can be recognized ... not parsed, recognized. That means
>> the action must be taken *before* look ahead.
>Actually the rules are ambiguous, because you cannot tell whether to
>execute the action until you have seen the token after the actions.
>In particular, this grammar is not LL(1). The first token of the rule
>does not disambiguate whether to apply rule1 or rule2.

I agree that the grammar is not LL(1), but I question the conclusion
of ambiguity (see below). Mid-rule actions are technically meta
information and not part of the grammar. Depending on what they
actually do, there may be no noticable effect on the parsing.

>Unfortunately, many parser generators (including I must ruefully admit
>Yacc++) do not flag this ambiguity, they simply read one token ahead
>and implement the grammar wrong. Of course, there is no correct
>implementation of this grammar, as it specifies something which cannot
>be done. One cannot know whether to execute the action until the
>lookahead has been read to see if the token read is a token2.
>However, if the action somehow affects whether a token2 or token3 is
>returned from the lexer, one has a circularity.
>Effectively, by using the lookahead one is moving the action to a
>later point in the grammar, changing the rules to:
> rule1 : token1 token2 {action1}
> rule2 : token1 token3
>Note, because of this issue, I have considered introducing a new
>syntax into Yacc++ to indicate actions that should not be "moved"
>(i.e. have the tokens after them used as lookahead for
>disambiguation). I just haven't figured out what that syntax should
>be. That way if the rule was important for getting the token lexed
>right and it appear in a context where the lookahead was important for
>knowing whether to apply the action, one would get an error message
>indicating the circularity.

Your point is well taken. However, given the original grammar above,
I would argue that the likelihood that the action creates circularity
is very small. It seems to me that a reasonable grammar would
position such an action in a prefix common to both rules rather than
in either one of them. As the grammar is written it is far more
likely that the action affects only tokens following it.

Of course the tool must be conservative and issue a warning because it
cannot know the meaning of the action. However, I think it is quite
reasonable to try the rules without embedded actions first and
backtrack on failure, or alternatively, to fork and try the rules in

>Now if fixed lookahead cannot solve the problem, then one needs
>something more powerful that LL or LR parsing. Here is an example of
>that case:
> rule1 : token1 {action1} rule3 token3
> rule2 : token1 {action2} rule3 token5
> rule3 : token2 | token4 rule3
>In this case, one needs either backtracking or GLR parsing or the
>ability to move the action across non-terminals, as in:
> rule1 : token1 rule3 token3 {action1}
> rule2 : token1 rule3 token5 {action2}
> rule3 : token2 | token4 rule3
>Now, the above is the actual "correct" grammar for the case, with no
>actions that require lookahead.

It's only correct if one assumes the actions have no effect on lexing
- which is probably wrong. We've already established that moving
actions is logically wrong no matter what they do.

>The backtracking solution works like one has a predicated grammar. In
>fact, I think of predicates as indicating what parts of the grammar
>one wishes to backtrack over. Here is a predicated version of the
> rule1 : token1 (rule3 token3) >> {action1} rule3 token3
> rule2 : token1 (rule3 token5) >> {action2} rule3 token5
> rule3 : token2 | token4 rule3
>The notation used above for predicates, says the text in parenthesis
>preceding the >> operator is a predicate. That is the parser parses
>the text in the parenthesis as "lookahead" before continuing the
>parse. If the text is matched, the parse continues with the predicate
>text rewound (placed back in the input, backtracked over) before the
>rest of the rule is matched. If the text isn't matched, the parse
>does not apply the rule and rewinds the text before attempting some
>other rule.

This is where I was going in my previous post where I said rules with
embedded actions should be handled by predication. The problem lies
in your assumption of circularity being introduced by actions which is
an issue I had never considered (see above). If one must always
assume circularity, then I am not aware of any existing method that
will work. I suppose a recursing combinative fork method could be
used but I can't imagine such a thing being very efficient or easy to
automatically construct.


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.