RE: simple vs. complex parsers


From: Quinn Tyler Jackson <qjackson@shaw.ca>
Newsgroups: comp.compilers
Date: 23 Sep 2003 13:01:27 -0400
Organization: Compilers Central
Keywords: parse, design
Posted-Date: 23 Sep 2003 13:01:26 EDT
In-reply-to: 03-05-103

Some time ago (back in May), Robert A Duff said to Chris Clark:


> I agree with most of your points. I do admit that parser
> technology has helped us understand what constitutes
> a good syntax for a programming language.
>
> I actually have mixed feelings. On the one hand, I'm in favor of
> automating as much as we reasonably can. On the other
> hand, when I see parser generators that have complex features
> in order to support dealing with the typedef problem in C, I feel
> like something's wrong with the world.


A grammar formalism needn't be all that complex to (theoretically)
handle any decidable language.


For instance, it need only be a BNF-style grammar with the addition of
substring predicate filtering and a form of variable I have called
name-indexed tries.


With the addition of these two features, the grammar formalism becomes
Turing powerful, and thus can parse any parseable language, and once
someone becomes proficient in the use of these additions, it doesn't
require book-sized grammars to do so. (My Perl $-grammar has 109
productions, my C# $-grammar has 292 productions, and my C++ $-grammar
has 303 productions. Perl and C++ are notoriously nasty to parse, but
I wouldn't say that 303 productions is all that much. Granted, these
are experimental grammars, but anyway....)


What can happen is that someone (like me -- this describes how I used
to think) can come along and spot a difficult-to-parse construct and
say, "If I just add this augmentation to system X, I can neatly and
cleanly parse nasty construct Y."


Without adequate formal investigation of each and every augmentation,
one risks breaking the machine. It's quite possible, using the
augmentations I mention above, to write a legal Meta-S grammar that,
given some input Z, never halts. I can write such a non-halting but
legal grammar in four productions:


S ::= A ":" (B | C);
A ::= $x<-(["a"]);
B ::= A.x B;
C ::= "c";


Given the input "a:az" -- the grammar is legal. -- it rejects that
language.
Given the input "a:c" -- the grammar is legal -- it accepts that
language.


Given the input ":b" -- the grammar never halts (kaboom), since the
adaptive(k) parsing algorithm that is applied to $-grammars cannot
handle left recursion, and if A.x expands to lambda, B becomes a left
recursive rule. B ::= B;


The point I'm trying to make is that just adding features to deal with
specific hard-to-parse constructs of some such language is not, IMO,
the way to do it. Every addition must be investigated, and overly
dangerous features must either be rejected or flagged by some method
of warning the user. For instance, the above grammar, fed into the
Meta-S IDE, will show a small lambda symbol beside production A,
indicating that caution should be used with any production
referencing A.x.
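

As a sketch of the kind of check such a warning could rest on
(hypothetical code of my own, not the actual Meta-S implementation,
and the grammar encoding and names are invented for the example), one
can compute which symbols derive lambda with the usual fixpoint and
flag any capture whose captured expression is nullable:


def nullable_symbols(grammar, optional):
    """Fixpoint: which symbols can derive lambda (the empty string)."""
    nullable = set(optional)
    changed = True
    while changed:
        changed = False
        for lhs, alts in grammar.items():
            if lhs in nullable:
                continue
            for alt in alts:
                if all(sym in nullable for sym in alt):
                    nullable.add(lhs)
                    changed = True
                    break
    return nullable

# The four productions above, in a simplified encoding: "[a]" stands
# for the optional group ["a"], and "A.x" is the capture reference.
grammar = {
    "S": [("A", ":", "B"), ("A", ":", "C")],  # S ::= A ":" (B | C);
    "A": [("[a]",)],                          # A ::= $x<-(["a"]);
    "B": [("A.x", "B")],                      # B ::= A.x B;
    "C": [("c",)],                            # C ::= "c";
}
captures = {"A.x": "[a]"}  # capture reference -> captured expression

nullable = nullable_symbols(grammar, optional={"[a]"})
for ref, body in captures.items():
    if body in nullable:
        print("warning:", ref, "may bind lambda;",
              "a rule such as B ::= A.x B; can become left recursive")


That is roughly the spirit of the lambda symbol: a cheap static
analysis that points at the dangerous capture before the parser is
ever run on real input.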


So, perhaps the answer to the featuritis problem is simply to
encourage grammar-augmenters to fall back on formal methods of
investigation when considering augmentations.


--
Quinn

