Re: LL Parser problem

Richard Pennington <rich@pennware.com>
25 Aug 2004 14:52:12 -0400

          From comp.compilers

Related articles
LL Parser problem jdlessl@yahoo.com (2004-08-09)
Re: LL Parser problem jdlessl@yahoo.com (2004-08-10)
Re: LL Parser problem wyrmwif@tsoft.org (SM Ryan) (2004-08-11)
Re: LL Parser problem nick.roberts@acm.org (Nick Roberts) (2004-08-13)
Re: LL Parser problem cfc@shell01.TheWorld.com (Chris F Clark) (2004-08-15)
Re: LL Parser problem nick.roberts@acm.org (Nick Roberts) (2004-08-23)
Re: LL Parser problem rich@pennware.com (Richard Pennington) (2004-08-25)
Re: LL Parser problem vbdis@aol.com (2004-09-03)
Re: LL Parser problem cfc@shell01.TheWorld.com (Chris F Clark) (2004-09-07)
| List of all articles for this month |

From: Richard Pennington <rich@pennware.com>
Newsgroups: comp.compilers
Date: 25 Aug 2004 14:52:12 -0400
Organization: SBC http://yahoo.sbc.com
References: 04-08-060 04-08-063 04-08-082 04-08-101 04-08-121
Keywords: parse, LL(1)
Posted-Date: 25 Aug 2004 14:52:12 EDT

Nick Roberts wrote:
[snip]
> My phraseology was poor (sorry).
>
> I didn't mean to suggest that parser generators are losing their
> relevance, but rather that /limited grammar/ parser generators are
> losing their relevance (compared to backtracking parser generators,
> whose grammars are not limited, as such).
>
> I agree with the idea of using a language which supports backtracking
> (or can be programmed to support it neatly). In a way, that's what a
> parser generator is.
>
> It's not actually that difficult to backtrack in an imperative
> language. You only need to place 'mark', 'rewind', and 'unmark'
> operations in the appropriate places. But it is much neater and
> easier if these operations are done intrinsically.


My current (personal) project looks like this:


Given a grammar fragment:


line:
            expr ';' -> $$ = $1
        | expr '\n' -> $$ = $1
        | if expr then line -> $$ = (if $2 $4)
        | if expr then line else line -> $$ = (if $2 $4 $6)
        | '\n' // Ignore blank lines.
        ;


expr: add -> $$ = $1
        | expr '=' expr -> $$ = (assign $1 $3)
        ;


add: mul -> $$ = $1
        | add '+' add -> $$ = (add $1 $3)
        | add '-' add -> $$ = (subtract $1 $3)
        ;


mul: primary -> $$ = $1
        | mul '*' mul -> $$ = (multiply $1 $3)
        | mul '/' mul -> $$ = (divide $1 $3)
        ;


primary: IDENTIFIER
        | INTEGER
        | '-' primary -> $$ = (negate $2)
        | '(' expr ')' -> $$ = $2
        | primary '(' list ')' -> $$ = (call $1 $3)
        ;


list: expr -> $$ = $1
        | expr ',' list -> $$ = ($1 $3)
        ;


And the source:


a = 1;
b = a + 2
c = b - 5 * a;
c = (b - 5) * a;
d = -b * 4;
e = f(10, a, b, c);
e = (g + 1)(10, a, b, c);


if a + 2 then if c then g(1); else b = 5;


Produces:


(assign a 1)
    (assign b (add a 2))
    (assign c (subtract b (multiply 5 a)))
    (assign c (multiply (subtract b 5) a))
    (assign d (multiply (negate b) 4))
    (assign e (call f 10 a b c))
    (assign e (call (add g 1) 10 a b c))


    (ambiguous (if (add a 2) (if c (call g 1)) (assign b 5)) (if (add a 2)
(if c (call g 1) (assign b 5))))


i.e. arbitrary syntax trees with ambiguities marked. Notice both
possibilities of "if" have been parsed. (It's easier to see if you
can match the parens. Sorry for the LISP-like look.)


I use a "parse every possiblility until you hit a dead end" approach.
(I assume this has a better, real name.) There is no backtracking,
at least explicitly.


Another example:


program:
            ( line )+ // One or more lines.
        ;


line:
            declaration ';' -> (do! $1)
        | expr ';' -> (do! $1)
        | string_expr ';' -> (do! $1)
        | print expr_list ';' -> (print! $2)
        ;


expr_list:
            expr -> $$ = $1
        | string_expr -> $$ = $1
        | expr_list ',' expr_list -> $$ = ($1 $3)
        ;


declaration:
            number id_list -> (define number $2)
        | string id_list -> (define string $2)
        ;


id_list:
            IDENTIFIER -> $$ = $1
        | id_list ',' IDENTIFIER -> $$ = ($1 $3)
        ;


expr:
            add -> $$ = $1
        | IDENTIFIER '=' expr -> $$ = (assign $1 $3)
        ;


add:
            mul -> $$ = $1
        | add '+' add -> $$ = (add $1 $3)
        | add '-' add -> $$ = (subtract $1 $3)
        ;


mul:
            primary -> $$ = $1
        | mul '*' mul -> $$ = (multiply $1 $3)
        | mul '/' mul -> $$ = (divide $1 $3)
        ;


primary:
            IDENTIFIER -> $$ = (isNumber $1)
        | NUMBER -> $$ = $1
        | '-' primary -> $$ = (negate $2)
        | '+' primary -> $$ = $2
        | '(' expr ')' -> $$ = $2
        ;


string_expr:
            string_add -> $$ = $1
        | IDENTIFIER '=' string_expr -> $$ = (assign $1 $3)
        ;


string_add:
            string_primary -> $$ = $1
        | string_add '+' string_add -> $$ = (add $1 $3)
        ;


string_primary:
            IDENTIFIER -> $$ = (isString $1)
        | STRING -> $$ = $1
        | '(' string_expr ')' -> $$ = $2
        ;


(Obvious ambiguities, especially in the "print" command.)


With the source:


number a, b;
number c, d, e;
string s1, s2;


a = 1;
s1 = "The first string";
print a;
print a, b;
b = a + 2;


produces:


(do (define number a b))
(do (define number c d e))
(do (define string s1 s2))
(do (assign a 1))
(do (assign s1 "The first string"))
(print (ambiguous (isNumber a) (isString a)))
(print (ambiguous ((isNumber a) (isNumber b)) ((isNumber a) (isString
b)) ((isString a) (isNumber b)) ((isString a) (isString b))))
(do (assign b (add (isNumber a) 2)))


I like the fact that I can use just about any grammar.
Now I'm working on weeding out the ambiguities during later
semantic analysis.


-Rich
--
Richard Pennington
Email: rich@pennware.com
http://www.pennware.com ftp://ftp.pennware.com


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.