Re: parsing tools, was Dragon Book - update necessary?

Randall Hyde <rhyde@cs.ucr.edu>
31 Oct 2000 14:33:16 -0500

          From comp.compilers


From: Randall Hyde <rhyde@cs.ucr.edu>
Newsgroups: comp.compilers
Date: 31 Oct 2000 14:33:16 -0500
Organization: Posted via Supernews, http://www.supernews.com
References: 00-10-061 00-10-067 00-10-093 00-10-109 00-10-130 00-10-193 00-10-209
Keywords: parse, tools
Posted-Date: 31 Oct 2000 14:33:16 EST

> I think that writing a compiler of that size in recursive descent would
> be a maintenance nightmare.


So was writing a compiler of that size with Flex/Bison :-)
I predict the C++ R.D. version would be half the size.


> ANTLR generates the recursive descent code
> for you from a grammar specification. However, as you painfully
> discovered, you will be married to the tool, for better or for worse.


ANTLR is at the top of my list of tools to consider, though I have a
couple of concerns I'd like answered:


(1) I would prefer to write my lexer in assembly language (gasp!).
        HLA spends the vast majority of its time in lexical analysis, and
        due to the interpreter I've integrated into the compiler (producing
        what I call the "compile-time language") there is a very incestuous
        relationship between the parser and lexer (the parser calls the lexer
        which recursively calls the parser, which recursively calls the lexer,
        etc.). I'm using this bit of craziness to support on-the-fly expansion
        of HLA's macros and other features (a preprocessor will *not* work
        here). So how hard is it to ignore ANTLR's lexer generator capabilities
        and call one or more different lexers written in another language (e.g.,
        assembly)?


(2) Is it easy to create and merge multiple parsers into the same compiler?
        Remember, I need a compiler and an interpreter in the same package.
        Merging the two grammars, as I had to do with Flex/Bison,
        created a lot of problems for me (yes, I realize I could have created
        two parsers with Bison, but I was never able to get some simple test
        systems working to my satisfaction).


(3) ANTLR being written in Java concerns me. What is the performance like?
        Granted, it's got to be better than running 100 KLOC through Bison
        (which generally takes about a minute on my machine), but I would
        really like better turnaround on the compiles with the next version.


(4) How large a grammar can ANTLR handle? Although not using
        Bison will allow me to reduce my grammar considerably, it's still a
        large language. HLA has several hundred statements, each with its
        own syntactic variations. The original versions of Flex and Bison
        that I started with choked early on (first I ran out of table space,
        then the unsigned shorts used as indexes into the tables started
        overflowing, etc.). I had to modify Flex and Bison to handle the
        large grammar I'd produced. Alas, newer versions of Flex and Bison
        left me behind, because I didn't want to re-apply all those
        changes to each new release. This was a real problem with Flex
        (you should see the ugly unget-based algorithm I use to expand
        macros on the fly; later versions of Flex, which I could not
        use, eliminated the problem).
>
> I would also point out that a recursive program tends to be much slower
> than an iterative one. A few years back, I compared my table-driven C
> recognizer to the one provided with ANTLR. Mine was about three times
> faster when executed on the ANTLR source code itself.


Parsing consumes a tiny fraction of the time compared with scanning (at
least, in HLA). Even with the truly disgusting lexer I currently have,
compile times are usually quite reasonable. I'm not too worried about
the minor differences in the parse times. Besides, the only place recursion
really kicks in is within complex arithmetic expressions. By using a
bottom-up parser (say, operator precedence) here, you can speed this up
if it turns out to be a problem. Once I fix the problems with HLA's
lexer, performance will be very good.


>
> If you send me a non-flattened, actionless version of your grammar, I
> will run it through my parser-generator to see if the language is strong
> LL(k), for some reasonable value of k.


At various times I've had offers to fix the problems with my grammar.
However, HLA v1.x is a prototype system, it already works (well, sort
of...), and has served its purpose. The grammar issue is just one of
about 1,000 different problems I need to address in the next version,
so I wouldn't want to waste anyone's time trying to improve the prototype.


My main concern at this point is "should I write it manually or should I
use a tool?" The obvious answer to someone not familiar with HLA is to
learn and use a better tool. As I've said, I'm considering ANTLR and
my browser has links to lots of other compiler tool pages.


However, when I first started writing HLA v1.0 I was *sure* Flex
and Bison were the proper tools. It took me a couple of years to
discover that they were not the appropriate tools to use. I realize
that hand-coding a parser makes the maintenance effort greater and that
using a tool will let me change things around more easily (once I master
the tool, that is). However, the whole purpose of v1.x was to determine
what I would really need in v2.x. While there will definitely be some
maintenance issues that come up, most of the "what-if" scenarios have
already taken place. I don't want to claim that v2.x is just a coding
project, but most of the major design issues are already settled.
That means that I have a pretty good idea of what needs to be done
if I write HLA in a language like C++ or Object Pascal (or even HLA,
for that matter). I'm not at all sure what the issues will be if I
switch to a tool like ANTLR. This is a traditional software engineering
risk management problem: do I go with the tried and true mechanism that
is safe, or do I try to use a tool and hope that the tool is appropriate?
On the one hand, ANTLR (or some other tool) could save a lot of effort.
OTOH, the tool may turn out to be inappropriate, leaving me either
locked into using it or forced to waste a couple of years of my life
and start over with the 3rd-generation-language approach.


Whatever, the actual work on V2.0 is still about a year away, so
I've got lots of time to mull over this decision (and, perhaps,
play around with some tools in the meantime).
Randy Hyde

