Related articles |
---|
Advice for recursive descent parser pk@pk.invalid (pk) (2009-03-31) |
Re: Advice for recursive descent parser jamin.hanson@googlemail.com (2009-04-01) |
Re: Advice for recursive descent parser cfc@shell01.TheWorld.com (Chris F Clark) (2009-04-01) |
From: | pk <pk@pk.invalid> |
Newsgroups: | comp.compilers |
Date: | Tue, 31 Mar 2009 17:32:20 +0200 |
Organization: | Not organized |
Keywords: | parse, question, comment |
Posted-Date: | 31 Mar 2009 14:32:51 EDT |
Hi,
Foreword: I'm a beginner in parsing, so there may be errors in what I'm
doing or writing - any correction welcome, thanks.
I'm trying to write a recursive descent parser for this grammar (simplified
regexes - sorry for the non formal grammar):
RE -> BR ('|' BR)* # alternation
BR -> EX (EX)* # concatenation
EX -> ITEM (Q)? # single letters plus optional quantifier
ITEM -> <any character except *>
Q -> '*'
Using this pseudocode it sort of works (cursym() returns the current
symbol - just characters here-, accept() consumes the current symbol if it
matches its argument - pretty standard functions as used in most theory RD
explanations):
RE() {
if !BR() error()
while accept("|") {
if !BR() error()
}
return 1
}
BR() {
if !EX() error()
while cursym()!="|" && !end_of_RE {
if !EX() error()
}
return 1
}
EX() {
if !ITEM() error()
if(cursym()=="*" {
if !Q() error()
}
return 1
}
ITEM() {
if cursym()=="*" error() # stray "*"
accept(cursym())
return 1
}
Q() {
accept("*")
return 1 # to be expanded in the future
}
Now I want to add parenthesized groups as ITEMs, so they can be quantified
too. To that end, I did these changes to the grammar:
RE -> BR ('|' BR)* # alternation
BR -> EX (EX)* # concatenation
EX -> ITEM (Q)?
ITEM -> '(' RE ')' | <any character except *>
Q -> '*'
And this new code (only the changed part):
ITEM() {
if accept('(') { # parenthesized group
if !RE error()
if !accept(")") error()
return 1
}
if (cursym()=="*" ||
cursym()==")" ) error() # stray "*" or ")"
# single character
accept(cursym())
return 1
}
This doesn't work because an input like "a(bc)d", which is correct, is not
parsed correctly because the ) is detected as stray ), since BR() does not
know where to stop and goes on trying to parse EXs (which in turn parse
ITEMs) even when the closing ) is reached. I could do this:
BR() {
if !EX() error()
while cursym()!="|" && !cursym()!=")" && !end_of_RE {
if !EX() error()
}
return 1
}
But it looks kind of hackish, and I'm not really sure that is the way to go.
I could change the grammar and introduce an explicit concatenation operator
for EXs (which I don't really like too much), but before doing that, can
anybody please advise?
Thank you!
[I'd implement one-token lookahead so each rule can look at a token and
if it's not something it recognizes, put the token back and return so
the upper level can do something different. -John]
Return to the
comp.compilers page.
Search the
comp.compilers archives again.