Re: shift/reduce problem

Kaz Kylheku <kaz@kylheku.com>
Thu, 20 Oct 2011 05:47:26 +0000 (UTC)

          From comp.compilers

Related articles
shift/reduce problem csnews77@gmail.com (JohnSmith) (2010-12-02)
Re: shift/reduce problem haberg-news@telia.com (Hans Aberg) (2010-12-03)
Re: shift/reduce problem csnews77@gmail.com (JohnSmith) (2010-12-03)
Re: shift/reduce problem haberg-news@telia.com (Hans Aberg) (2010-12-03)
Re: shift/reduce problem gene.ressler@gmail.com (Gene) (2010-12-03)
Re: shift/reduce problem chakaram@auth.gr (Chariton Karamitas) (2010-12-05)
Re: shift/reduce problem seimarao@gmail.com (Seima) (2010-12-30)
Re: shift/reduce problem kaz@kylheku.com (Kaz Kylheku) (2011-10-20)
| List of all articles for this month |

From: Kaz Kylheku <kaz@kylheku.com>
Newsgroups: comp.compilers
Date: Thu, 20 Oct 2011 05:47:26 +0000 (UTC)
Organization: A noiseless patient Spider
References: 10-12-004
Keywords: parse
Posted-Date: 20 Oct 2011 06:15:44 EDT

On 2010-12-02, JohnSmith <csnews77@gmail.com> wrote:
> I have to parse the following string like this
>
> "(input i, j, k, input u, v, w)"


(In case anyone still cares about this nearly one year old thread. )


The crux of the problem here is caused by the comma.


The language in this parenthesized construct is actually a simple
regular language, as witnessed when we strip away the annoying commas,
and Lispify it a little bit:




    (:input i j k :input u v w)


Yacc chokes on the original because the comma is overloaded with two
meanings: variable separator and segment separator. Furthermore, the
comma occupies a precious lookahead position. You don't know which
comma you are dealing with, and you can't look beyond it at the other
tokens which would tell you.


Instead of the next token being either INPUT or a variable symbol,
there is a comma there. The state machine doesn't know what to do. Is
it the comma which should reduce "input i j k" to an input
nonterminal? Or is it a variable separating comma after which we can
expect another variable?


Here is a Yacc file for a similar grammar with a shift/reduce conflict:


    %{


    %}


    %token '(' ',' ')'
    %token INPUT VAR


    %%


    file: '(' list ')'


    list: input | list ',' input


    input: INPUT varlist


    varlist: VAR | varlist ',' VAR


The following, however, has no conflicts. To produce it, all I
did was simply remove all occurences of ',':


    %{


    %}


    %token '(' ')'
    %token INPUT VAR


    %%


    file: '(' list ')'


    list: input | list input


    input: INPUT varlist


    varlist: VAR | varlist VAR




So a possible solution here is to simply have the lexical analyzer eat
these darn commas.


Side note: a bigger issue is the social one: stop designing languages
with superfluous punctuation in them that exists just for the sake of
punctuation. Furthermore, stop caving in to colleagues who invent
these things.


The lexical analyzer can have a state flag which tells it whether to
silently consume commas as if they were comments, or return them as
tokens (since there may be areas of the grammar where we need to rely
on the commas). This can be hooked into the parser:




  file: '(' { lex_eat_commas(1); }
                      list ')' { lex_eat_commas(0); }


Our LALR(1) parser gets to "look ahead" past the comma simply because
the lexer ate it.


Simple. No backtracking, no GLR to blow up the state space.


Cut out the tumor, heal the patient, and prescribe a diet free of syntactic
sugar.


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.