Re: Parsing techniques

jlilley@empathy.com (John Lilley)
1 Dec 1996 23:10:52 -0500

          From comp.compilers

Related articles
Parsing techniques lex@prl.philips.nl (1993-05-04)
Re: Parsing techniques ipser@solomon.technet.sg (1993-05-07)
Re: Parsing techniques simonh@swidev.demon.co.uk (1993-05-08)
Parsing techniques kentr@rollinssoft.com (Kent Rollins) (1996-11-26)
Re: Parsing techniques scotts@metaware.com (Scott Stanchfield) (1996-12-01)
Re: Parsing techniques jon@mauney.com (1996-12-01)
Re: Parsing techniques miano@worldnet.att.net (1996-12-01)
Re: Parsing techniques jlilley@empathy.com (1996-12-01)
Re: Parsing techniques icedancer@ibm.net (1996-12-03)
Re: Parsing techniques house@usq.edu.au (Ron House) (1996-12-07)
Re: Parsing techniques grosch@cocolab.sub.com (1996-12-09)
Re: Parsing techniques parrt@MageLang.com (Terence Parr) (1996-12-09)
Re: Parsing techniques parrt@MageLang.com (Terence Parr) (1996-12-09)
Re: Parsing techniques sjmeyer@crl.com (1996-12-15)

From: jlilley@empathy.com (John Lilley)
Newsgroups: comp.compilers
Date: 1 Dec 1996 23:10:52 -0500
Organization: Empathy Software
References: 96-11-157
Keywords: parse

kentr@rollinssoft.com says...
>I am just getting into parsing and have some general questions about
>the subject. Most of the literature I have read seems to imply that
>LL(k) grammars are only for small/simple languages and that LR
>grammars must be used for thicker languages like C++. Am I misreading
>this info? Can C++ successfully be parsed with an LL(k) grammar?


Yes. I'm working with a public-domain C++ grammar that uses PCCTS, an
LL(k) parser generator. Although LL(k) is weaker than LALR(k), PCCTS
augments it with predicates and backtracking. PCCTS also performs
intelligent analysis when k>1 to avoid exponential explosion of the
lookahead sets. I have heard, but cannot confirm, that LALR(k>1) is
almost impossible for complex grammars. Does anyone else have experience
with LALR(k>1)? I'd like to know for sure...
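The backtracking idea can be sketched with a hand-written
recursive-descent fragment (my own illustration, not PCCTS-generated
code and not its API): when finite lookahead cannot pick an
alternative, the parser speculatively tries one and rewinds the input
position if it fails.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hand-written sketch of LL backtracking (a "syntactic predicate"):
// when one token of lookahead cannot decide between alternatives,
// speculatively parse one and rewind the position on failure.
struct Parser {
    std::vector<std::string> toks;
    size_t pos = 0;

    bool match(const std::string& s) {
        if (pos < toks.size() && toks[pos] == s) { ++pos; return true; }
        return false;
    }
    bool matchIdent() {
        if (pos < toks.size() && toks[pos] != "int" &&
            std::isalpha((unsigned char)toks[pos][0])) { ++pos; return true; }
        return false;
    }
    bool matchNum() {
        if (pos < toks.size() &&
            std::isdigit((unsigned char)toks[pos][0])) { ++pos; return true; }
        return false;
    }

    // declaration : "int" IDENT ";"
    bool declaration() { return match("int") && matchIdent() && match(";"); }

    // expressionStmt : IDENT "=" NUM ";"   (grossly simplified)
    bool expressionStmt() {
        return matchIdent() && match("=") && matchNum() && match(";");
    }

    // statement : (declaration)? => declaration | expressionStmt
    // Try the declaration alternative first; on failure, backtrack
    // to the saved position and try the expression alternative.
    bool statement() {
        size_t save = pos;
        if (declaration()) return true;
        pos = save;  // backtrack
        return expressionStmt();
    }
};
```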


This still requires some hacks and tricks to pull it all together,
though. Check out "http://www.empathy.com/pccts". This is a "work in
progress" and I'm trying to get templates going at the moment. Reply via
email if you're serious about this. Currently (but not on the web site),
it collects full scope, type, and declaration information. It recognizes
function bodies but doesn't process them. Its template support is
pretty weak, and it doesn't handle namespaces or exceptions. But it's
getting there. It has a full ANSI preprocessor and is getting rather
large (around 40k lines of code total).


>If so, would the resulting parser be easier to debug?


Call me stupid, but I can debug LL in my sleep and still can't figure
out what LALR is doing :-) Exactly one recursive-descent function is
generated for each grammar rule, so LL is pretty easy to trace, and the
call stack contains all of the state information of the parse. With
LALR there is always the extra step of mapping a parser state and the
contents of the stack back to the grammar rule being matched.
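A toy hand-written recursive-descent parser (illustrative only, not
generated code) makes the one-function-per-rule structure concrete: a
breakpoint inside factor() shows expr() -> term() -> factor() on the
call stack, which is the entire state of the parse.

```cpp
#include <cctype>
#include <string>

// One function per grammar rule, exactly mirroring the grammar:
//   expr   : term (('+'|'-') term)*
//   term   : factor ('*' factor)*
//   factor : NUMBER
struct Calc {
    std::string src;
    size_t pos = 0;

    int expr() {
        int v = term();
        while (pos < src.size() && (src[pos] == '+' || src[pos] == '-')) {
            char op = src[pos++];
            v = (op == '+') ? v + term() : v - term();
        }
        return v;
    }
    int term() {
        int v = factor();
        while (pos < src.size() && src[pos] == '*') { ++pos; v *= factor(); }
        return v;
    }
    int factor() {
        int v = 0;  // read a decimal number
        while (pos < src.size() && std::isdigit((unsigned char)src[pos]))
            v = v * 10 + (src[pos++] - '0');
        return v;
    }
};
```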


>Would it be easier
>to explain and recover from the errors it detects during parsing?


Not sure about error recovery. I think this is more a matter of the tool
you are using and the error handling it supplies than of the parser
class itself. For me, the real advantage of PCCTS is the ability to
pass multiple values between productions, which makes generating symbol
tables without an intermediate AST much easier. Theoretically, a
recursive-descent parser written in C++ could use C++ exceptions for
error recovery, but PCCTS doesn't support that. It does, however, have
its own brand of error handling that is the functional equivalent of
"exceptions" -- but I haven't tried it yet.
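A hand-written sketch of what exception-based recovery could look like
(my own illustration; again, this is not what PCCTS generates): a rule
throws on a syntax error, and an enclosing rule catches, skips ahead to
a synchronizing token, and resumes parsing.

```cpp
#include <stdexcept>
#include <string>

// Thrown by a rule when it cannot match its input.
struct ParseError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

struct Recoverer {
    std::string src;
    size_t pos = 0;
    int errors = 0;

    void expect(char c) {
        if (pos < src.size() && src[pos] == c) { ++pos; return; }
        throw ParseError(std::string("expected '") + c + "'");
    }

    // stmt : 'a' ';'        (toy grammar)
    void stmt() { expect('a'); expect(';'); }

    // stmtList : stmt* -- each statement is its own recovery region:
    // on error, count it, skip to the next ';', and carry on.
    void stmtList() {
        while (pos < src.size()) {
            try {
                stmt();
            } catch (const ParseError&) {
                ++errors;
                while (pos < src.size() && src[pos] != ';') ++pos;  // sync
                if (pos < src.size()) ++pos;  // consume the ';'
            }
        }
    }
};
```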


>I have noticed numerous examples of ambiguous C/C++ statements posted
>in this newsgroup that cause problems for such parsers. Assuming the
>use of a lexer/parser combo, can all ambiguity be overcome or will we
>always be able to create a contrived code example that will break even
>the best C++ parser?


<soap box>
I think the latter. But the well-known ambiguities aren't the real
problem anymore, because at least they are becoming well-defined. The
ambiguities one sees, IMHO, pale in comparison to the problems of
template instantiation, especially when you consider that a
fully-qualified name may include arbitrarily complex template
qualifiers, which require template instantiation (perhaps recursively)
in the middle of deciding whether a name is a type or an identifier.
OK, so maybe it doesn't really *require* template instantiation, but
it's a real mess to parse unspecialized template classes to the extent
needed. For example:


      template <class T> class C : public T { };
      class A { public : typedef int I; };
      C<A>::I i;




Is C<A>::I an identifier or a type? I don't think you can know until you
instantiate the template. This sort of thing (inheriting from a
template parameter) might be disallowed by the standard, but some
compilers accept it anyway, because they don't bother to parse the
unspecialized template.


With regard to the "traditional" ambiguities: since PCCTS uses
predicates to disambiguate identifiers vs. types and declarations vs.
statements, the hacks are a bit easier to stomach than the usual
"lexical hack" performed in YACC/LEX C/C++ grammars. IMHO, the lexical
hacks become ugly as sin when applied to fully-qualified identifiers.
Predicates are at least mildly better, since they reside in the parser,
which is presumably closer to the symbol tables than the lexer is.
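As a concrete illustration of what such a predicate decides (the names
below are my own hypothetical sketch, not PCCTS API): the classic
statement "A * b;" declares b as pointer-to-A if A names a type, and
multiplies A by b otherwise, so the parser must consult the symbol
table in the middle of the parse.

```cpp
#include <set>
#include <string>
#include <vector>

// Sketch of a semantic predicate for the "A * b;" ambiguity:
// the parse is decided by asking the symbol table whether the
// leading identifier names a type.
enum class StmtKind { Declaration, Expression };

StmtKind classify(const std::vector<std::string>& toks,
                  const std::set<std::string>& typeNames) {
    // pattern: IDENT '*' IDENT ';'
    if (toks.size() == 4 && toks[1] == "*" && toks[3] == ";")
        return typeNames.count(toks[0]) ? StmtKind::Declaration
                                        : StmtKind::Expression;
    return StmtKind::Expression;
}
```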
<end soap box>


john


Disclaimer: The opinions expressed here are opinions.
--

