Re: Recursive descent parsing / backtracking

Terence Parr <parrt@everest.ee.umn.edu>
Mon, 31 Oct 1994 22:32:43 GMT

          From comp.compilers

Related articles
Q: Recursive Descent w/Backtracking SCHMIDTG@iccgcc.cs.hh.ab.com (1994-10-21)
Re: Recursive descent parsing / backtracking parrt@everest.ee.umn.edu (Terence Parr) (1994-10-31)
| List of all articles for this month |

Newsgroups: comp.compilers
From: Terence Parr <parrt@everest.ee.umn.edu>
Keywords: parse, LL(1)
Organization: Compilers Central
References: 94-10-151
Date: Mon, 31 Oct 1994 22:32:43 GMT

Greg Schmid writes about backtracking:


> S -> cAd
> A -> a | ab
>
> It would be able to correctly parse the sentences of this language.
>
> One pitfall with RDB is that if not implemented properly, the order of
> productions can affect whether or not a solution is found. For example,
> consider the input string w = cabd with the above grammar. A naive
> implementation might fail as follows:
>
> S -> cAd -> cad (REJECT)
> [...]
> This type of backtracking over a tree is difficult to implement using
> recursive precedures to represent each production, since if we had a
> procedure called 'A' which attempts to match either 'a' or 'ab', we
> would have to remember that we called 'A' and then call 'A' again
> telling it this time to ignore the first alternative.


Greg, don't worry about this contrived case. Furthermore, backtracking is
needed in only a few cases and mainly for nasty languages like C++. Unless
you have a totally wacko language to parse, don't use a purely backtracking
parser as it has potentially exponential time complexity. If you do want a
purely backtracking parser, there are boatloads out there and you could
write a simple one in a day or two. You could also try a Tomita parser that
makes forests of bottom-up parse trees and then picks the "best" one.


Alternatively, let me make a shameless plug for ANTLR (the parser generator
in PCCTS). ANTLR would handle this one without backtracking because it's
simply LL(2). That is, in rule A, the parser would see input "ab" or "ad"
rather than just input "a"--which would allow it to correctly
(deterministically) predict A->a or A->ab. ANTLR will do backtracking also
if you want, however, it still won't solve your specific example without a
grammar rewrite. Even if we restrict ANTLR to LL(1), it could still solve
your example with a "syntactic predicate" and a rule instantiation:


s : C
( (A D)?
| A B D
)
;


The (...)? is a predicate that indicates you are not sure if the enclosed
construct will match. If it does not, simply try the next viable
alternative. The (A D|A B D) is a simple EBNF subrule (rule without a
label). This selective backtracking is extremely useful in practice; just
ask the NeXT folks doing the combined C/Objective-C/C++ parser with ANTLR.
The PCCTS ftp site is everest.ee.umn.edu in pub/pccts. Our newsgroup
is comp.compilers.tools.pccts.


Best regards,
Terence Parr
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.