Re: Bottom-up versus Top-down


From: dwight@pentasoft.com (Dwight VandenBerghe)
Newsgroups: comp.compilers
Date: 2 Dec 1997 12:11:36 -0500
Organization: All USENET -- http://www.Supernews.com
References: 97-11-123 97-11-155 97-11-178
Keywords: parse

mkgardne@cs.uiuc.edu (Mark K. Gardner) wrote:
> What I am asking for is a head-to-head comparison between the two
> techniques on the same (hopefully non-trivial) grammar conducted in a
> reasonably scientific manner.


The test you need isn't a test of LL/LR parsing techniques, but a test
of implementations. There is no inherent reason that a table-driven
technique is faster than a hand-coded one ... think about it. You
have to access the table, don't you? If you could just write the code
itself directly, you'd save the table accessing. In fact, there are
efficiency reasons to choose top-down, if that's really the deciding
criterion. You can code a recursive-descent parser directly from an
LL BNF grammar in just a few minutes (after some practice). And there
have been tools (LILA from the Europeans, years ago, and many master's
theses) that will automatically produce directly-compilable
recursive-descent source code from BNF. No big deal.
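
To make that concrete: here's the whole trick for a toy expression
grammar, each nonterminal turned into one C function with the BNF
transcribed almost line for line. (This little evaluator is just a
sketch of mine, not code from any real compiler.)

    /* expr   -> term {('+'|'-') term}
       term   -> factor {('*'|'/') factor}
       factor -> NUMBER | '(' expr ')'        */
    #include <ctype.h>
    #include <stdio.h>
    #include <stdlib.h>

    static const char *src;    /* input cursor */
    static int tok;            /* one token (here, one char) of lookahead */

    static void advance(void)
    {
        while (*src == ' ') src++;
        tok = *src ? *src++ : 0;
    }

    static void expect(int t)
    {
        if (tok != t) { fprintf(stderr, "expected '%c'\n", t); exit(1); }
        advance();
    }

    static int expr(void);     /* forward: the grammar is recursive */

    static int factor(void)    /* factor -> NUMBER | '(' expr ')' */
    {
        if (isdigit(tok)) { int v = tok - '0'; advance(); return v; }
        expect('(');
        { int v = expr(); expect(')'); return v; }
    }

    static int term(void)      /* term -> factor {('*'|'/') factor} */
    {
        int v = factor();
        while (tok == '*' || tok == '/') {
            int op = tok;
            advance();
            v = (op == '*') ? v * factor() : v / factor();
        }
        return v;
    }

    static int expr(void)      /* expr -> term {('+'|'-') term} */
    {
        int v = term();
        while (tok == '+' || tok == '-') {
            int op = tok;
            advance();
            v = (op == '+') ? v + term() : v - term();
        }
        return v;
    }

    int main(void)
    {
        src = "2*(3+4)-5";
        advance();
        printf("%d\n", expr());    /* prints 9 */
        return 0;
    }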


In fact, in the olden days (early to mid seventies) we were
suspicious of table-driven schemes because of their overhead. The original
Ritchie C compiler, and all of the excellent Whitesmiths C compilers,
used recursive-descent hand-coded front ends. It wasn't until Steve
Johnson came out with yacc that there was a decent table-driven parser
generator that was widely available and that worked well enough to
make it into the mainstream.


Remember, though, that it was not yacc's LALR-ness that made it
useful. It was a combination of three wonderful features: the
ability to embed C actions and get complete source out; the ability
to pass synthesized attributes around ($$, $1, and so forth); and
finally the incredible (for the time) disambiguation of expressions
using explicit precedence and associativity (%left, %right, %prec).
If he had made an LL(1) parser generator with the same features, I
think it would have been nearly as successful.
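
For what it's worth, the effect of %left, %right and %prec is not
hard to get by hand either. The usual trick is a precedence-climbing
loop, where a little table of binding powers plays the part of the
%left declarations. A sketch (single-digit operands, details made up
for illustration):

    #include <stdio.h>

    static const char *src;
    static int tok;

    static void advance(void)
    {
        while (*src == ' ') src++;
        tok = *src ? *src++ : 0;
    }

    /* binding power per operator: each line here plays the
       part of one %left level in a yacc grammar */
    static int prec(int t)
    {
        if (t == '+' || t == '-') return 1;   /* %left '+' '-' */
        if (t == '*' || t == '/') return 2;   /* %left '*' '/' */
        return 0;                             /* not a binary operator */
    }

    /* parse operators that bind at least as tightly as minp */
    static int climb(int minp)
    {
        int lhs = tok - '0';                  /* single-digit "primary" */
        advance();
        while (prec(tok) >= minp) {
            int op = tok, rhs;
            advance();
            rhs = climb(prec(op) + 1);        /* +1 gives left associativity */
            switch (op) {
            case '+': lhs += rhs; break;
            case '-': lhs -= rhs; break;
            case '*': lhs *= rhs; break;
            case '/': lhs /= rhs; break;
            }
        }
        return lhs;
    }

    int main(void)
    {
        src = "8-2-1*3";
        advance();
        /* prints 3: '-' is left-assoc, '*' binds tighter */
        printf("%d\n", climb(1));
        return 0;
    }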


The argument that you need LALR(1) for C-like languages is,
unfortunately, not correct. It's the other way around. You have to
convolute the LALR grammar quite a bit to handle C. Ritchie wrote the
recursive-descent parser and developed the language at the same time,
so it is actually easier to follow in his footsteps when you get down
to the nitty-gritty. It's easier to form the symbol table, get the
types right, and so on, if you use recursive descent. You have to use
some look-ahead and so forth, but the timing of the semantic routines
works out better than with a yacc-generated parser. That's why
Plauger used the same approach in his Whitesmiths compilers.
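
The classic case is the statement "T * x;" - a declaration of x if T
names a type, a multiplication otherwise. In recursive descent you
just ask the symbol table at the moment you need to know. A sketch,
with a stubbed-out lookup standing in for a real symbol table:

    #include <stdio.h>
    #include <string.h>

    /* stand-in for a real symbol-table lookup;
       pretend "T" was typedef'd earlier in the file */
    static int is_typedef_name(const char *id)
    {
        return strcmp(id, "T") == 0;
    }

    static void parse_declaration(const char *id)
    {
        printf("declaration starting with %s\n", id);
    }

    static void parse_expression(const char *id)
    {
        printf("expression starting with %s\n", id);
    }

    /* at statement level, one identifier already in hand:
       the parser consults the symbol table and commits */
    static void statement(const char *first_id)
    {
        if (is_typedef_name(first_id))
            parse_declaration(first_id);   /* "T * x;" declares x */
        else
            parse_expression(first_id);    /* "a * x;" multiplies  */
    }

    int main(void)
    {
        statement("T");
        statement("a");
        return 0;
    }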


Another little-known fact is that it has been shown that, with
special techniques, recursive descent can handle LR(k) grammars.
That's right - not just LL(1), but LR(k). Tom Pennello told me this
in the mid eighties at a compiler conference, and he was right - the
reference is in an old TOPLAS, I think. Pennello founded MetaWare
with DeRemer, and as these guys wrote the book on LR parsing, I
listen when they talk. Tom said that
he thought that there were just two reasons to use a parser generator
instead of recursive descent: to make sure that there were no hidden
ambiguities in the grammar, and to separate out the semantic routines
from the parsing code, which otherwise can get sort of tangled up.
But parsing power was not one of his reasons; nor was speed.


Speed really doesn't matter. Parsing takes up such a small amount of
the total time of the compilation that it's almost down in the noise.
Last time I measured, it took about 6% in a custom language
compiler. If you were to double it, probably nobody would notice.
If you were to spend your life finding techniques to reduce it to
nearly zero, still nobody would notice. A huge amount of time is
spent in
- surprise - the lexer; up to 40% for some compilers. You are better
off making big efforts here than in finding the fastest parser. For
example, one of the best cheap tricks of the trade is to hack the
lexer to read the entire source file into memory in one big gulp at
the beginning of the run; this can save 10-20% of the runtime, in some
cases.
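
The gulp itself is only a dozen lines of C; something like this
sketch (error handling pared down to the minimum):

    #include <stdio.h>
    #include <stdlib.h>

    /* read a whole source file into one malloc'd, NUL-terminated
       buffer, so the lexer scans straight through memory with no
       per-character I/O */
    static char *slurp(const char *path, long *len)
    {
        FILE *f = fopen(path, "rb");
        char *buf;

        if (f == NULL) return NULL;
        fseek(f, 0L, SEEK_END);
        *len = ftell(f);
        rewind(f);
        buf = malloc(*len + 1);
        if (buf != NULL && fread(buf, 1, (size_t)*len, f) == (size_t)*len)
            buf[*len] = '\0';      /* sentinel: the lexer stops at NUL */
        else {
            free(buf);
            buf = NULL;
        }
        fclose(f);
        return buf;
    }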


As far as LL(1) goes, it's a little weak. You have to futz around
with the grammar to get it to handle some common cases, and even then
you sometimes hit the wall. I don't use it because of that problem.
I do, however, use recursive descent all the time. I have control, so
I can look ahead all I want into the input stream, making it
essentially LL(k), which is a whole lot better. I can also use my
semantic data structures - the symbol table, the AST, the current
context - to help guide my parsing decisions, which is the way to get
as powerful a parser as you need. If I were starting over I might
become a PCCTS user - it is LL, and it uses semantic predicates to
disambiguate rules when needed. But there is no need to do this, once
you have the recursive descent techniques mastered. You get the same
power - albeit at the cost of not being able to formally verify your
grammar.
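
The look-ahead part is easy to arrange, by the way: keep a small
buffer of tokens between the lexer and the parser, and let the parser
peek as far ahead as it likes. A sketch, with a canned token stream
standing in for the real lexer:

    #include <stdio.h>

    #define MAXLOOK 64

    /* stub lexer for the sketch: hands out a canned token stream */
    static int stream[] = { 'a', '(', 'b', ')', ';', 0 };
    static int spos = 0;
    static int nexttok(void) { return stream[spos] ? stream[spos++] : 0; }

    static int lookbuf[MAXLOOK];   /* tokens read but not yet consumed */
    static int nbuf = 0;

    static int peek(int k)         /* peek(0) is the current token */
    {
        while (nbuf <= k)
            lookbuf[nbuf++] = nexttok();
        return lookbuf[k];
    }

    static void consume(void)      /* accept the current token */
    {
        int i;
        peek(0);                   /* make sure one is buffered */
        for (i = 1; i < nbuf; i++)
            lookbuf[i - 1] = lookbuf[i];
        nbuf--;
    }

    int main(void)
    {
        if (peek(0) == 'a' && peek(1) == '(')     /* two tokens of context */
            printf("identifier then '(' - parse it as a call\n");
        consume();
        printf("current token: '%c'\n", peek(0)); /* prints '(' */
        return 0;
    }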


Dwight
--

