Why do we still assemble?

hbaker@netcom.com (Henry G. Baker)
Wed, 6 Apr 1994 18:31:48 GMT

          From comp.compilers

Newsgroups: comp.compilers
From: hbaker@netcom.com (Henry G. Baker)
Keywords: assembler, design, comment
Organization: Compilers Central
Date: Wed, 6 Apr 1994 18:31:48 GMT

Other than sheer institutional inertia, why do we continue to compile into
assembler code, then assemble the code into relocatable binary? Why don't
we compile directly to relocatable binary and be done with it?

Before you answer: I have already considered the following arguments.

Pro: People still want to write assembly code, and therefore you have to
provide an assembler anyway. That having been done, it's easier to
program and maintain a compiler which then outputs to assembly code.

Con: Nowadays, you have to provide a C compiler anyway, so the assembler
types can use _asm directives in the C code when they feel the urge (which
the C compiler can easily handle). There's still no need to define a
separate assembly language.

Pro: Modularity requires that the information about the gory details of
the instruction format should be located in one place. An assembler is a
good place to put this information.

Con: Modularity can just as easily be obtained by providing an 'asm'
function/procedure which returns the assembled instruction(s) for the C
compiler to output. (Probably uses some kind of a call-back.)
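
A minimal sketch of what such an 'asm' procedure might look like, in C.
Everything here is an assumption made for illustration: the 32-bit
instruction layout is invented (it is not any real machine), and the names
toy_asm, emit_fn and section_emit are hypothetical. The point is only that
the bit layout lives in one routine, and the compiler supplies a call-back
that decides where the bytes go.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit instruction format (not any real ISA):
   bits 31..24 opcode, 23..20 rd, 19..16 rs, 15..0 immediate. */
enum toy_opcode { TOY_ADDI = 0x01, TOY_LOAD = 0x02, TOY_STORE = 0x03 };

/* Call-back supplied by the compiler; it receives each run of
   assembled bytes and decides where they end up. */
typedef void (*emit_fn)(void *ctx, const uint8_t *bytes, size_t len);

/* The only routine that knows the instruction bit layout. */
static void toy_asm(enum toy_opcode op, unsigned rd, unsigned rs,
                    uint16_t imm, emit_fn emit, void *ctx)
{
    assert(rd < 16 && rs < 16);
    uint32_t word = ((uint32_t)op << 24) | ((uint32_t)rd << 20) |
                    ((uint32_t)rs << 16) | imm;
    /* Serialize big-endian so the output does not depend on the host. */
    uint8_t bytes[4] = { (uint8_t)(word >> 24), (uint8_t)(word >> 16),
                         (uint8_t)(word >> 8),  (uint8_t)word };
    emit(ctx, bytes, sizeof bytes);
}

/* Example sink: append to a fixed-size in-memory section buffer. */
struct section { uint8_t data[64]; size_t len; };

static void section_emit(void *ctx, const uint8_t *bytes, size_t len)
{
    struct section *s = ctx;
    assert(s->len + len <= sizeof s->data);
    for (size_t i = 0; i < len; i++)
        s->data[s->len++] = bytes[i];
}
```

With a zeroed struct section s, the call toy_asm(TOY_ADDI, 1, 2, 0x1234,
section_emit, &s) appends the four bytes 01 12 12 34 to s. No text is
printed and nothing is reparsed.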

Pro: You can use the same C compiler but different assemblers for
different object file formats.

Con: You can use the same C compiler but different output libraries that
hide the object file format.
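
One way such an output library could be arranged, sketched in C. The
interface (struct obj_writer) and the "TOY1" file layout are both made up
for this example; a real library would put an a.out or COFF writer behind
the same three entry points. The code generator (compile_stub, also
hypothetical) sees only the interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical format-neutral interface seen by the compiler. */
struct obj_writer {
    void   (*put_bytes)(struct obj_writer *w, const uint8_t *p, size_t n);
    void   (*define_symbol)(struct obj_writer *w, const char *name);
    size_t (*finish)(struct obj_writer *w, uint8_t *out, size_t cap);
};

/* One invented backend. Layout: "TOY1", 1-byte code length, code,
   then symbol records (1-byte offset + NUL-terminated name). */
struct toy_writer {
    struct obj_writer base;      /* first member, so pointers convert */
    uint8_t code[32]; size_t code_len;
    uint8_t syms[32]; size_t syms_len;
};

static void toy_put_bytes(struct obj_writer *w, const uint8_t *p, size_t n)
{
    struct toy_writer *t = (struct toy_writer *)w;
    assert(t->code_len + n <= sizeof t->code);
    memcpy(t->code + t->code_len, p, n);
    t->code_len += n;
}

static void toy_define_symbol(struct obj_writer *w, const char *name)
{
    struct toy_writer *t = (struct toy_writer *)w;
    size_t n = strlen(name) + 1;                     /* keep the NUL */
    assert(t->syms_len + 1 + n <= sizeof t->syms);
    t->syms[t->syms_len++] = (uint8_t)t->code_len;   /* current offset */
    memcpy(t->syms + t->syms_len, name, n);
    t->syms_len += n;
}

static size_t toy_finish(struct obj_writer *w, uint8_t *out, size_t cap)
{
    struct toy_writer *t = (struct toy_writer *)w;
    size_t total = 4 + 1 + t->code_len + t->syms_len;
    assert(total <= cap);
    memcpy(out, "TOY1", 4);
    out[4] = (uint8_t)t->code_len;
    memcpy(out + 5, t->code, t->code_len);
    memcpy(out + 5 + t->code_len, t->syms, t->syms_len);
    return total;
}

static void toy_writer_init(struct toy_writer *t)
{
    memset(t, 0, sizeof *t);
    t->base.put_bytes     = toy_put_bytes;
    t->base.define_symbol = toy_define_symbol;
    t->base.finish        = toy_finish;
}

/* Stand-in for the code generator: it never names the file format. */
static size_t compile_stub(struct obj_writer *w, uint8_t *out, size_t cap)
{
    static const uint8_t body[] = { 0x01, 0x02 };    /* placeholder code */
    w->define_symbol(w, "f");
    w->put_bytes(w, body, sizeof body);
    return w->finish(w, out, cap);
}
```

Supporting a second object format would then mean writing another init
function that fills in the same three pointers; compile_stub would not
change.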

Pro: The C compiler generates all these temporary symbols, but doesn't
have to keep them throughout the compilation. This saves space during the
compilation, but requires that the assembler have a large symbol table.
Thus, the scheme minimizes the maximum space consumption.

Con: This argument is, of course, pure hogwash for today's virtual memory
systems. Furthermore, the business about the assembler having to keep
these symbols around shows an abysmal lack of sophistication among
assembly language designers, who never seem to have heard of
block-structured assemblers.
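
To make the block-structure point concrete, here is a rough C sketch of a
scoped symbol table. The names (block_enter, block_leave, sym_define,
sym_lookup) and the fixed table sizes are assumptions for illustration, not
taken from any particular assembler. Symbols defined inside a block are
forgotten when the block closes, so the table's peak size tracks nesting
depth rather than the total number of temporaries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS  64
#define MAX_DEPTH 16

struct symbol { const char *name; unsigned value; };

struct symtab {
    struct symbol syms[MAX_SYMS];
    size_t nsyms;               /* symbols currently live */
    size_t marks[MAX_DEPTH];    /* value of nsyms at each open block */
    size_t depth;
    size_t peak;                /* high-water mark, kept for illustration */
};

static void block_enter(struct symtab *t)
{
    assert(t->depth < MAX_DEPTH);
    t->marks[t->depth++] = t->nsyms;
}

static void block_leave(struct symtab *t)
{
    assert(t->depth > 0);
    t->nsyms = t->marks[--t->depth];    /* forget the block's locals */
}

static void sym_define(struct symtab *t, const char *name, unsigned value)
{
    assert(t->nsyms < MAX_SYMS);
    t->syms[t->nsyms].name  = name;
    t->syms[t->nsyms].value = value;
    if (++t->nsyms > t->peak)
        t->peak = t->nsyms;
}

/* Innermost definition wins; returns NULL if the name is not visible. */
static const struct symbol *sym_lookup(const struct symtab *t,
                                       const char *name)
{
    for (size_t i = t->nsyms; i-- > 0; )
        if (strcmp(t->syms[i].name, name) == 0)
            return &t->syms[i];
    return NULL;
}
```

If a compiler wraps each function's temporary labels in
block_enter/block_leave, a hundred functions with two labels apiece never
occupy more than a handful of slots at once. A real assembler would also
have to resolve a block's forward references before discarding its
symbols; the sketch leaves that out.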

Pro: On machines like the MIPS, assemblers are _smart_, and do important
things like instruction scheduling.

Con: This scheme may work for really dumb compilers, but any optimizing
compiler is going to have to go through the same analysis. So modularity
is violated, since the same job is now being done twice.


Con: The compiler has to write out a very large file
character-by-character. Worse still, the assembler has to read this file
in character-by-character and then _reparse_ it. The cost of all of this
character hacking, file I/O, and parsing rivals, and usually exceeds, the
cost of the actual compiling and assembling.

Pro: You can Unix pipe the two...

Con: Get a life! You still have to go through printf and whatever
miracles LEX/YACC perform at the assembler input.
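
The round trip being complained about can be shown side by side in a few
lines of C. This is only an illustration under stated assumptions: the
"addi" mnemonic and its encoding are invented, sscanf stands in for
whatever parser a real assembler uses, and nothing here measures cost. It
shows only that the text route formats and then re-scans the operands to
arrive at the same word the direct route computes in one step.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical encoding (not a real ISA): opcode 0x01 is "addi",
   word = opcode<<24 | rd<<20 | rs<<16 | imm16. */
static uint32_t encode_addi(unsigned rd, unsigned rs, unsigned imm)
{
    assert(rd < 16 && rs < 16 && imm <= 0xFFFF);
    return (UINT32_C(0x01) << 24) | ((uint32_t)rd << 20) |
           ((uint32_t)rs << 16) | imm;
}

/* Text route, compiler half: format the instruction as a source line. */
static int compiler_print(char *buf, size_t cap,
                          unsigned rd, unsigned rs, unsigned imm)
{
    return snprintf(buf, cap, "\taddi r%u, r%u, %u\n", rd, rs, imm);
}

/* Text route, assembler half: scan the line back into operands,
   then encode. Returns 0 on success, -1 on a line it cannot parse. */
static int assembler_parse(const char *line, uint32_t *word)
{
    unsigned rd, rs, imm;
    if (sscanf(line, " addi r%u , r%u , %u", &rd, &rs, &imm) != 3)
        return -1;
    if (rd >= 16 || rs >= 16 || imm > 0xFFFF)
        return -1;
    *word = encode_addi(rd, rs, imm);
    return 0;
}
```

The direct route is just encode_addi(1, 2, 0x1234). The text route calls
compiler_print, then assembler_parse on the resulting line, and ends up
with the identical word.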


The net result of all of this is that current C compilers, with the notable
exception of the Think C compilers on the Mac (and others of similar ilk),
are slower than the Algol compiler on the 7094 in the early 1960s.

So much for progress.

[On Unix systems it's standard for compilers to produce assembler which is
then assembled, but I don't know of production compilers on other systems
that do that. Back at the dawn of Unix history (well, 1974 or so) due to
address space limitations the C compiler ran in multiple passes of roughly
12K apiece. The first pass turned C into Polish glop, the second pass
turned the glop into assembler source, and the third pass was the
assembler. Given that they needed multiple passes, this was a fine way to
divide things up. Many people forget that the Unix PDP-11 assembler was
extremely fast; it read the source, produced the symbol table and a
tokenized intermediate file, then reread the intermediate file and
generated the a.out. The assembler was written in assembler, none of that
wussy lex and yacc junk. Subsequent Unix compilers have kept the same
design, but haven't paid attention to compile-time performance. It's
entirely possible to create a fast C compiler. Turbo C runs real fast
(much faster than its ancestor Wizard C) using straightforward techniques
like buffering include files and object files in RAM. -John]
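
The two-pass arrangement described in the note above can be sketched
roughly as follows, in C. This is not the PDP-11 assembler's actual design
or data layout; the statement form (labels and jumps only), the in-memory
token array standing in for the intermediate file, and all names are
assumptions for illustration. Pass 1 reads the source once and leaves a
symbol table plus tokens; pass 2 rereads only the tokens.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical source form: each statement either defines a label or
   jumps to one. Every jump occupies one output slot. */
enum kind { DEF_LABEL, JUMP };
struct stmt  { enum kind kind; const char *label; };
struct token { int sym; };                    /* jump -> symbol index */
struct sym   { const char *name; int addr; }; /* addr < 0: undefined */

#define MAXN 32

struct assembler {
    struct sym   syms[MAXN]; size_t nsyms;
    struct token toks[MAXN]; size_t ntoks;
};

/* Find or create the symbol table entry for a name. */
static int intern(struct assembler *a, const char *name)
{
    for (size_t i = 0; i < a->nsyms; i++)
        if (strcmp(a->syms[i].name, name) == 0)
            return (int)i;
    assert(a->nsyms < MAXN);
    a->syms[a->nsyms].name = name;
    a->syms[a->nsyms].addr = -1;
    return (int)a->nsyms++;
}

/* Pass 1: read the source once, building the symbol table and the
   tokenized intermediate form. */
static void pass1(struct assembler *a, const struct stmt *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int s = intern(a, src[i].label);
        if (src[i].kind == DEF_LABEL) {
            a->syms[s].addr = (int)a->ntoks;  /* next output slot */
        } else {
            assert(a->ntoks < MAXN);
            a->toks[a->ntoks++].sym = s;
        }
    }
}

/* Pass 2: reread only the tokens; no source text is parsed again.
   Returns the number of slots written, or -1 on an undefined label. */
static int pass2(const struct assembler *a, int *out, size_t cap)
{
    assert(a->ntoks <= cap);
    for (size_t i = 0; i < a->ntoks; i++) {
        int addr = a->syms[a->toks[i].sym].addr;
        if (addr < 0)
            return -1;
        out[i] = addr;
    }
    return (int)a->ntoks;
}
```

Forward references fall out naturally: a jump to a label defined later is
stored as a symbol index in pass 1 and looked up in pass 2, by which time
every label has an address.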
