RE: Lazy/tolerant parsers

Quinn Tyler Jackson <quinn-j@shaw.ca>
14 Jul 2004 12:07:03 -0400

          From comp.compilers

Related articles
Lazy/tolerant parsers mritun@gmail.com (2004-07-13)
RE: Lazy/tolerant parsers quinn-j@shaw.ca (Quinn Tyler Jackson) (2004-07-14)
| List of all articles for this month |
From: Quinn Tyler Jackson <quinn-j@shaw.ca>
Newsgroups: comp.compilers
Date: 14 Jul 2004 12:07:03 -0400
Organization: Compilers Central
References: 04-07-029
Keywords: parse
Posted-Date: 14 Jul 2004 12:07:03 EDT

Akhilesh


> I tried searching in archives, but could not find helpful results.
>
> I have a (Ada like) grammar for which I need to create a parser which
> should be usable for code as user types (syntax highlighting, code
> assist etc). So the parser should correctly deal with -
>
> - partial sententences
> - Incomplete/missing closures
>
> Which method would you recommend to write such a parser ?


A-BNF (Adaptive BNF, not ABNF = Augmented BNF) has several features were
added to allow for what I've called "shallow parsing" (although strictly
speaking, shallow parsing is probably not the correct term).


First, it has the "usual" #anchor and #synch constructs. (@(expr) = anchor,
$$ = synch)


It also now as a #scan construct that causes a production to skip ahead
until it finds a match. So:


grammar X
{
S ::= @("(") b $$ ")";
b ::= /* something */;
};


If b fails to match, a "recovery" node will be placed on the parse tree that
has all the skipped junk in it.


grammar X
{
S ::= "(" b;
b ::= #scan ")"; // skip over everything until ")" is hit
};


#scan may seem like the regular expression '.*' but in fact behaves slightly
differently internally.


I don't know how useful either of those two constructs are for overall error
recovery. The main problem with anchor/synch is that it requires the
synchronizing portion to be in the same production as its anchor in order to
work. (At least they do the way I implemented them.)


Another "trick" in A-BNF is to use the ... operator (also thing):


grammar X
{
S ::= "(" b| b ::= /* something */;
};


If when talking about "lazy" you mean "don't bother unless you're sure"
parsing, delayed predicates can be used in A-BNF for this:


grammar X
{
S ::= "(" $x( b ::= /* something */;
};


This is a whole 'nother ball of wax. Whatever falls between the "(" and ")"
will be glossed-over until the closing ")" is definitely encountered. If the
production b were terribly expensive, it wouldn't be applied until the ")"
was absolutely encountered. In this way, one can do a "light" (forgiving)
parse that doesn't check for certain things, then, iff the "forgiving" parse
determines that the likelihood is that the stricter parse of the predicate
won't fail -- the more expensive parse can be applied. (For instance, the
"forgiving" parse might parse the "form" of C++ templates, whereas the
stricter parse might enforce expensive dynamic table look ups on the
templates parameters.)


Hope that isn't completely useless information.


--
Quinn Tyler Jackson
http://members.shaw.ca/qjackson/


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.