Related articles
shift/reduce problem csnews77@gmail.com (JohnSmith) (2010-12-02)
Re: shift/reduce problem haberg-news@telia.com (Hans Aberg) (2010-12-03)
Re: shift/reduce problem csnews77@gmail.com (JohnSmith) (2010-12-03)
Re: shift/reduce problem haberg-news@telia.com (Hans Aberg) (2010-12-03)
Re: shift/reduce problem gene.ressler@gmail.com (Gene) (2010-12-03)
Re: shift/reduce problem chakaram@auth.gr (Chariton Karamitas) (2010-12-05)
Re: shift/reduce problem seimarao@gmail.com (Seima) (2010-12-30)
Re: shift/reduce problem kaz@kylheku.com (Kaz Kylheku) (2011-10-20)
From: Kaz Kylheku <kaz@kylheku.com>
Newsgroups: comp.compilers
Date: Thu, 20 Oct 2011 05:47:26 +0000 (UTC)
Organization: A noiseless patient Spider
References: 10-12-004
Keywords: parse
Posted-Date: 20 Oct 2011 06:15:44 EDT
On 2010-12-02, JohnSmith <csnews77@gmail.com> wrote:
> I have to parse the following string like this
>
> "(input i, j, k, input u, v, w)"
(In case anyone still cares about this nearly one-year-old thread.)
The crux of the problem is the comma.
The language in this parenthesized construct is actually a simple
regular language, as becomes clear once we strip away the annoying
commas and Lispify it a little bit:
(:input i j k :input u v w)
Yacc chokes on the original because the comma is overloaded with two
meanings: variable separator and segment separator. Furthermore, the
comma occupies a precious lookahead position. You don't know which
comma you are dealing with, and you can't look beyond it at the other
tokens which would tell you.
Where the next token would need to be either INPUT or a variable
symbol, there is a comma instead. The state machine doesn't know what
to do. Is this the comma that should trigger reducing "input i j k" to
an input nonterminal? Or is it a variable-separating comma, after
which we can expect another variable?
Here is a Yacc file for a similar grammar with a shift/reduce conflict:
%{
%}
%token '(' ',' ')'
%token INPUT VAR
%%
file: '(' list ')' ;
list: input | list ',' input ;
input: INPUT varlist ;
/* The conflict: with a varlist on the stack and ',' as the lookahead,
   the parser cannot tell whether to shift (another VAR follows) or to
   reduce INPUT varlist to input (another segment follows). */
varlist: VAR | varlist ',' VAR ;
The following, however, has no conflicts. To produce it, all I
did was remove every occurrence of ',':
%{
%}
%token '(' ')'
%token INPUT VAR
%%
file: '(' list ')' ;
list: input | list input ;
input: INPUT varlist ;
varlist: VAR | varlist VAR ;
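You can check both claims by running the files through bison -v (or
yacc -v); the file names below are just placeholders:

  bison -v commas.y      # should report one shift/reduce conflict; see commas.output
  bison -v nocommas.y    # should report none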
So a possible solution here is to simply have the lexical analyzer eat
these darn commas.
Side note: a bigger issue is the social one: stop designing languages
with superfluous punctuation in them that exists just for the sake of
punctuation. Furthermore, stop caving in to colleagues who invent
these things.
The lexical analyzer can have a state flag which tells it whether to
silently consume commas as if they were comments, or return them as
tokens (since there may be areas of the grammar where we need to rely
on the commas). This can be hooked into the parser:
file: '(' { lex_eat_commas(1); }
      list ')' { lex_eat_commas(0); }
    ;
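For completeness, here is roughly what the lexer side could look like.
This is only a sketch in flex; the token names, the patterns, and the
y.tab.h file name are my assumptions, not something from the original
grammar:

%option noyywrap
%{
#include "y.tab.h"        /* token numbers from yacc; file name is an assumption */

static int eat_commas;     /* nonzero: treat ',' like whitespace */

void lex_eat_commas(int on) { eat_commas = on; }
%}

%%

","                      { if (!eat_commas) return ','; /* otherwise silently discard */ }
"("                      { return '('; }
")"                      { return ')'; }
"input"                  { return INPUT; }
[A-Za-z_][A-Za-z0-9_]*   { return VAR; }
[ \t\n]+                 { /* skip whitespace */ }

%%

Within the flagged region the ',' rule behaves exactly like the
whitespace rule, which is the "consume them as if they were comments"
treatment described above.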
Our LALR(1) parser gets to "look ahead" past the comma simply because
the lexer ate it.
Simple. No backtracking, no GLR to blow up the state space.
Cut out the tumor, heal the patient, and prescribe a diet free of syntactic
sugar.