Related articles |
---|
LL Parser problem jdlessl@yahoo.com (2004-08-09) |
Re: LL Parser problem jdlessl@yahoo.com (2004-08-10) |
Re: LL Parser problem wyrmwif@tsoft.org (SM Ryan) (2004-08-11) |
Re: LL Parser problem nick.roberts@acm.org (Nick Roberts) (2004-08-13) |
Re: LL Parser problem cfc@shell01.TheWorld.com (Chris F Clark) (2004-08-15) |
Re: LL Parser problem nick.roberts@acm.org (Nick Roberts) (2004-08-23) |
Re: LL Parser problem rich@pennware.com (Richard Pennington) (2004-08-25) |
Re: LL Parser problem vbdis@aol.com (2004-09-03) |
Re: LL Parser problem cfc@shell01.TheWorld.com (Chris F Clark) (2004-09-07) |
From: | Richard Pennington <rich@pennware.com> |
Newsgroups: | comp.compilers |
Date: | 25 Aug 2004 14:52:12 -0400 |
Organization: | SBC http://yahoo.sbc.com |
References: | 04-08-060 04-08-063 04-08-082 04-08-101 04-08-121 |
Keywords: | parse, LL(1) |
Posted-Date: | 25 Aug 2004 14:52:12 EDT |
Nick Roberts wrote:
[snip]
> My phraseology was poor (sorry).
>
> I didn't mean to suggest that parser generators are losing their
> relevance, but rather that /limited grammar/ parser generators are
> losing their relevance (compared to backtracking parser generators,
> whose grammars are not limited, as such).
>
> I agree with the idea of using a language which supports backtracking
> (or can be programmed to support it neatly). In a way, that's what a
> parser generator is.
>
> It's not actually that difficult to backtrack in an imperative
> language. You only need to place 'mark', 'rewind', and 'unmark'
> operations in the appropriate places. But it is much neater and
> easier if these operations are done intrinsically.
My current (personal) project looks like this:
Given a grammar fragment:
line:
expr ';' -> $$ = $1
| expr '\n' -> $$ = $1
| if expr then line -> $$ = (if $2 $4)
| if expr then line else line -> $$ = (if $2 $4 $6)
| '\n' // Ignore blank lines.
;
expr: add -> $$ = $1
| expr '=' expr -> $$ = (assign $1 $3)
;
add: mul -> $$ = $1
| add '+' add -> $$ = (add $1 $3)
| add '-' add -> $$ = (subtract $1 $3)
;
mul: primary -> $$ = $1
| mul '*' mul -> $$ = (multiply $1 $3)
| mul '/' mul -> $$ = (divide $1 $3)
;
primary: IDENTIFIER
| INTEGER
| '-' primary -> $$ = (negate $2)
| '(' expr ')' -> $$ = $2
| primary '(' list ')' -> $$ = (call $1 $3)
;
list: expr -> $$ = $1
| expr ',' list -> $$ = ($1 $3)
;
And the source:
a = 1;
b = a + 2
c = b - 5 * a;
c = (b - 5) * a;
d = -b * 4;
e = f(10, a, b, c);
e = (g + 1)(10, a, b, c);
if a + 2 then if c then g(1); else b = 5;
Produces:
(assign a 1)
(assign b (add a 2))
(assign c (subtract b (multiply 5 a)))
(assign c (multiply (subtract b 5) a))
(assign d (multiply (negate b) 4))
(assign e (call f 10 a b c))
(assign e (call (add g 1) 10 a b c))
(ambiguous (if (add a 2) (if c (call g 1)) (assign b 5)) (if (add a 2)
(if c (call g 1) (assign b 5))))
i.e. arbitrary syntax trees with ambiguities marked. Notice both
possibilities of "if" have been parsed. (It's easier to see if you
can match the parens. Sorry for the LISP-like look.)
I use a "parse every possiblility until you hit a dead end" approach.
(I assume this has a better, real name.) There is no backtracking,
at least explicitly.
Another example:
program:
( line )+ // One or more lines.
;
line:
declaration ';' -> (do! $1)
| expr ';' -> (do! $1)
| string_expr ';' -> (do! $1)
| print expr_list ';' -> (print! $2)
;
expr_list:
expr -> $$ = $1
| string_expr -> $$ = $1
| expr_list ',' expr_list -> $$ = ($1 $3)
;
declaration:
number id_list -> (define number $2)
| string id_list -> (define string $2)
;
id_list:
IDENTIFIER -> $$ = $1
| id_list ',' IDENTIFIER -> $$ = ($1 $3)
;
expr:
add -> $$ = $1
| IDENTIFIER '=' expr -> $$ = (assign $1 $3)
;
add:
mul -> $$ = $1
| add '+' add -> $$ = (add $1 $3)
| add '-' add -> $$ = (subtract $1 $3)
;
mul:
primary -> $$ = $1
| mul '*' mul -> $$ = (multiply $1 $3)
| mul '/' mul -> $$ = (divide $1 $3)
;
primary:
IDENTIFIER -> $$ = (isNumber $1)
| NUMBER -> $$ = $1
| '-' primary -> $$ = (negate $2)
| '+' primary -> $$ = $2
| '(' expr ')' -> $$ = $2
;
string_expr:
string_add -> $$ = $1
| IDENTIFIER '=' string_expr -> $$ = (assign $1 $3)
;
string_add:
string_primary -> $$ = $1
| string_add '+' string_add -> $$ = (add $1 $3)
;
string_primary:
IDENTIFIER -> $$ = (isString $1)
| STRING -> $$ = $1
| '(' string_expr ')' -> $$ = $2
;
(Obvious ambiguities, especially in the "print" command.)
With the source:
number a, b;
number c, d, e;
string s1, s2;
a = 1;
s1 = "The first string";
print a;
print a, b;
b = a + 2;
produces:
(do (define number a b))
(do (define number c d e))
(do (define string s1 s2))
(do (assign a 1))
(do (assign s1 "The first string"))
(print (ambiguous (isNumber a) (isString a)))
(print (ambiguous ((isNumber a) (isNumber b)) ((isNumber a) (isString
b)) ((isString a) (isNumber b)) ((isString a) (isString b))))
(do (assign b (add (isNumber a) 2)))
I like the fact that I can use just about any grammar.
Now I'm working on weeding out the ambiguities during later
semantic analysis.
-Rich
--
Richard Pennington
Email: rich@pennware.com
http://www.pennware.com ftp://ftp.pennware.com
Return to the
comp.compilers page.
Search the
comp.compilers archives again.