Related articles |
---|
Lazy/tolerant parsers mritun@gmail.com (2004-07-13) |
RE: Lazy/tolerant parsers quinn-j@shaw.ca (Quinn Tyler Jackson) (2004-07-14) |
From: | Quinn Tyler Jackson <quinn-j@shaw.ca> |
Newsgroups: | comp.compilers |
Date: | 14 Jul 2004 12:07:03 -0400 |
Organization: | Compilers Central |
References: | 04-07-029 |
Keywords: | parse |
Posted-Date: | 14 Jul 2004 12:07:03 EDT |
Akhilesh wrote:
> I tried searching in archives, but could not find helpful results.
>
> I have a (Ada like) grammar for which I need to create a parser which
> should be usable for code as user types (syntax highlighting, code
> assist etc). So the parser should correctly deal with -
>
> - partial sentences
> - Incomplete/missing closures
>
> Which method would you recommend to write such a parser ?
A-BNF (Adaptive BNF, not ABNF = Augmented BNF) has several features that
were added to allow for what I've called "shallow parsing" (although strictly
speaking, shallow parsing is probably not the correct term).
First, it has the "usual" #anchor and #synch constructs. (@(expr) = anchor,
$$ = synch)
It also now has a #scan construct that causes a production to skip ahead
until it finds a match. With anchor/synch:
grammar X
{
    S ::= @("(") b $$ ")";
    b ::= /* something */;
};
If b fails to match, a "recovery" node will be placed on the parse tree that
has all the skipped junk in it.
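For readers without A-BNF at hand, here is a rough Python sketch of the same anchor/synch idea. It is not how A-BNF implements it; the names Node, parse_b and parse_S are made up for illustration, and b is assumed (arbitrarily) to be a run of identifier tokens. When b fails, or the ")" does not follow it, everything up to the synch token goes into a "recovery" node.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                      # "S", "b", or "recovery"
    items: list = field(default_factory=list)


def parse_b(tokens, pos):
    """Hypothetical stand-in for production b: one or more identifier tokens."""
    start = pos
    while pos < len(tokens) and tokens[pos].isidentifier():
        pos += 1
    if pos == start:
        return None, start         # b failed to match
    return Node("b", tokens[start:pos]), pos


def parse_S(tokens, pos=0, anchor="(", synch=")"):
    """S ::= anchor b synch, skipping ahead to the synch token on failure."""
    if pos >= len(tokens) or tokens[pos] != anchor:
        return None, pos
    pos += 1
    body, after_b = parse_b(tokens, pos)
    if body is not None and after_b < len(tokens) and tokens[after_b] == synch:
        return Node("S", [body]), after_b + 1
    # Recovery: collect everything up to the synch token into one node.
    end = pos
    while end < len(tokens) and tokens[end] != synch:
        end += 1
    recovery = Node("recovery", tokens[pos:end])
    # Consume the synch token if present; at end of input it is simply missing.
    next_pos = end + 1 if end < len(tokens) else end
    return Node("S", [recovery]), next_pos
```

Note that the sketch also copes with a missing ")" at end of input (the recovery node then holds the rest of the tokens), which is the "incomplete closure" case from the original question.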
With #scan:
grammar X
{
    S ::= "(" b;
    b ::= #scan ")"; // skip over everything until ")" is hit
};
#scan may seem like the regular expression '.*' but in fact behaves slightly
differently internally.
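As an approximation only, the visible behaviour of #scan can be sketched in Python as "skip to the first occurrence of the terminator". The function names below are hypothetical, and nothing here reflects A-BNF's internals. The greedy-regex comparison in the usage note shows just one way a naive '.*' can differ from a skip-to-first-match; it is not a claim about which difference applies to #scan.

```python
import re


def scan_until(text, pos, terminator):
    """Skip ahead from pos to the first occurrence of terminator.

    Returns (skipped_text, position_after_terminator), or None if the
    terminator never appears.
    """
    hit = text.find(terminator, pos)
    if hit == -1:
        return None
    return text[pos:hit], hit + len(terminator)


def parse_S(text):
    """Approximates: S ::= "(" b ;  b ::= #scan ")" ;"""
    if not text.startswith("("):
        return None
    return scan_until(text, 1, ")")


def greedy_regex_body(text):
    """For contrast: a greedy '.*' runs to the LAST ')' in the input."""
    m = re.match(r"\((.*)\)", text)
    return m.group(1) if m else None
```

On the input "(a b) c (d)", parse_S stops at the first ")" and skips only "a b", while greedy_regex_body captures "a b) c (d".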
I don't know how useful either of those two constructs is for overall error
recovery. The main problem with anchor/synch is that it requires the
synchronizing portion to be in the same production as its anchor in order to
work. (At least it does the way I implemented it.)
Another "trick" in A-BNF is to use the ... operator (also
grammar X
{
S ::= "(" b|
};
If by "lazy" you mean "don't bother unless you're sure" parsing, delayed
predicates can be used in A-BNF for this:
grammar X
{
S ::= "(" $x(
};
This is a whole 'nother ball of wax. Whatever falls between the "(" and ")"
will be glossed over until the closing ")" is definitely encountered. If the
production b were terribly expensive, it wouldn't be applied until the ")"
had actually been encountered. In this way, one can do a "light" (forgiving)
parse that doesn't check for certain things; then, iff the "forgiving" parse
determines that the stricter parse of the predicate is unlikely to fail, the
more expensive parse can be applied. (For instance, the "forgiving" parse
might parse the "form" of C++ templates, whereas the stricter parse might
enforce expensive dynamic table lookups on the template parameters.)
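The two-pass idea can be sketched in Python along these lines. This is an illustration of "cheap forgiving pass first, expensive pass only once the closing ")" is found", not A-BNF's delayed-predicate mechanism. All names are hypothetical, and the "expensive" check is a stand-in: a lookup of comma-separated names in a set.

```python
def light_parse(text, pos=0):
    """Forgiving pass: find "(" ... ")" without inspecting what is inside."""
    if pos >= len(text) or text[pos] != "(":
        return None
    close = text.find(")", pos + 1)
    if close == -1:
        return None                # no closing ")", so never run the strict pass
    return pos + 1, close          # span of the glossed-over body


def strict_b(body, known_names):
    """Hypothetical expensive check: every comma-separated name must be known."""
    names = [n.strip() for n in body.split(",")]
    return all(n in known_names for n in names)


def parse_S(text, known_names, calls=None):
    """Run the strict check only after the light pass has found the ")"."""
    span = light_parse(text)
    if span is None:
        return False
    if calls is not None:
        calls.append(span)         # record that the strict pass was reached
    start, end = span
    return strict_b(text[start:end], known_names)
```

Passing a list as calls makes it easy to confirm that the strict pass is skipped entirely for input such as "(int, T" where the ")" never arrives.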
Hope that isn't completely useless information.
--
Quinn Tyler Jackson
http://members.shaw.ca/qjackson/