Re: Modularize compiler construction?

"Ira Baxter" <idbaxter@semdesigns.com>
Thu, 28 Jan 2010 09:20:20 -0600

          From comp.compilers

Related articles
Modularize compiler construction? pengyu.ut@gmail.com (Peng Yu) (2010-01-23)
Re: Modularize compiler construction? kkylheku@gmail.com (Kaz Kylheku) (2010-01-24)
Re: Modularize compiler construction? cr88192@hotmail.com (BGB / cr88192) (2010-01-24)
Re: Modularize compiler construction? cr88192@hotmail.com (cr88192) (2010-01-25)
Re: Modularize compiler construction? DrDiettrich1@aol.com (Hans-Peter Diettrich) (2010-01-25)
Re: Modularize compiler construction? pengyu.ut@gmail.com (Peng Yu) (2010-01-25)
Re: Modularize compiler construction? idbaxter@semdesigns.com (Ira Baxter) (2010-01-28)
Re: Modularize compiler construction? gneuner2@comcast.net (George Neuner) (2010-01-28)
Re: Modularize compiler construction? ott@mirix.org (Matthias-Christian Ott) (2010-01-31)
Re: Modularize compiler construction? cr88192@hotmail.com (BGB / cr88192) (2010-01-31)
Re: Modularize compiler construction? gneuner2@comcast.net (George Neuner) (2010-02-01)
Re: Modularize compiler construction? sh006d3592@blueyonder.co.uk (Stephen Horne) (2010-02-08)
| List of all articles for this month |

From: "Ira Baxter" <idbaxter@semdesigns.com>
Newsgroups: comp.compilers
Date: Thu, 28 Jan 2010 09:20:20 -0600
Organization: Compilers Central
References: 10-01-080 10-01-082 10-01-086
Keywords: design
Posted-Date: 31 Jan 2010 01:10:36 EST

"cr88192" <cr88192@hotmail.com> wrote in message
> "Kaz Kylheku" <kkylheku@gmail.com> wrote in message
>> On 2010-01-23, Peng Yu <pengyu.ut@gmail.com> wrote:
>>> It seems that the current compiler construction tools (at least in
>>> bison and flex) are still very primitive. Let's take the following
>>> example to explain what I mean.
>>
>> bison and flex are not tools for /complete/ compiler construction.
>
> yep.


> however, given as much emphasis as so many people seem to put on parser
> generators, it is not so hard to see how some people could be misled into
> thinking that the parser IS the compiler.


Parser generators are only a small part of tools one would want for
compiler construction. Why people beleive that is a significant part
is a complete mystery to me, especially if they've had a compiler
class or attempted to build a compiler.


Well, there are toolsets that are designed to support compiler
construction. The New Jersey Machine toolkit comes to mind. There's
a quite good list at http://catalog.compilertools.net/.


> BNF, sort of like traditional math, and I also suspect SSA,
> ... represent a "mode of thinking" that I personally don't seem
> particularly skilled with (the great oddity is that programming is
> not particularly difficult, but I am apparently
> mathematically-unskilled, and find that many "formalisms" seem to
> defy understanding...).


They don't and they are well worth the trouble. Formal notations
with precise meaning lead to tools which lead to huge leverage.


>> Bison does not provide the semantics of translation, only a way to
>> build a parser, which is far, far from a complete translation scheme.
>> It can be argued that a parser-generation tool /should/ only do that
>> one job.


Er, was there ever an argument? I don't think so.


> sadly, it does not take one long to fnd, if implementing a compiler
> for a non-trivial language (such as the main C-family languages),
> that the parser is not really the big source of complexity (but, at
> least with C and C++, it can be a big source of slowness, but this
> is not quite the same issue...).


The *language* C++ is the "source of complexity". The problem of parsing
it has been killed dead several times. You can do it (clumsily) with
LALR parsers and tangling of symbol tables and parsing actions.
You can do it by recursive descent and similar tangling (GCC, I think EDG).
You can do it using GLR parsers, with *complete* separation
of parsing and symbol table collection (our DMS Software Reengineering
Toolkit does this), which means you can write a grammar that almost
mirrors the reference grammar.


Further, AFAIK, GLR parsers aren't necessarily slow. (The DMS parser
isn't particularly fast, but then we capture comments and source
format information as we go). Adrian Johnstone and his students have
done a bunch of work on tuning GLR parsers. I think it was Scot
McPeak that implemented an optimization that makes GLR parsers run
like LALR(1) [in terms of computation effort] in the highly frequent
case that there are not multiple parses in a particular part of the
grammar. And Pennelo showed how to "compile" LALR parsers to machine
code for extremely fast parsing ("Very fast LR parsing", ACM Sigplan
July 1986). [We'll get around to composing this set of ideas
someday]. Once you do this, much of the front end slowness is in
lexer.


>> The GNU project does have a much more complete compiler construction
>> suite: ..
>
> yes, however GCC does have a few down-sides:
> like many GNU projects, it is often very difficult to get it to build on
> Windows, and so help you if you want to build it with MSVC...
> the code is large, and the code is nasty.


That's all relative. You're dealing with a complicated beast, whose
complications are driven the complications of the langauge(s) and
multiple targets they are trying to manage. You can get much simpler
beasts, but you won't be happy with them because they won't address
real langauges or real targets. And GNU shows its roots: it grew
organically from an early compiler. It suffers somewhat from not
being designed to modular from the very beginning.


There are more organized attacks on building compiler-like tools. The
DMS Software Reengineering Toolkit
(http://www.semanticdesigns.com/Products/DMS/DMSToolkit.html) is a set
of tools for defining lexers, parsers with automatic tree building and
built-in error recovery, attribute evaluators, symbol tables, control
flow analysis, local data flow analysis, local and global points to
analysis, source-to-source program transformations, prettyprinting,
... And like GCC, it is attempting to handle many languages (and it
does: C, C++, Java, C#, COBOL, PHP, ...) And its pretty complicated,
too, because the machinery it implements is designed to support the
union of the complications from all these langauges; (you can't do C++
unless your symbol table machinery can handle all of its pecadillos).
I'd like to think that DMS is "cleaner" than GCC, but we aren't trying
to build high performance compilers (yet!). So there's hope yet :-}




--
Ira Baxter, CTO
www.semanticdesigns.com


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.