Re: object code vs. assembler code (Detailed response)

clyde@hitech.com.au (Clyde Smith-Stubbs)
Mon, 22 Feb 1993 00:32:24 GMT

          From comp.compilers

Related articles
object code vs. assembler code John_Burton@gec-epl.co.uk (1993-02-19)
Re: object code vs. assembler code byron@netapp.com (1993-02-20)
Re: object code vs. assembler code (Detailed response) clyde@hitech.com.au (1993-02-22)
Re: object code vs. assembler code (Detailed response) segfault!rfg@uunet.UU.NET (1993-03-13)
Assembly hacker vs. compiler revisited snovack@enterprise.ICS.UCI.EDU (Steven Novack) (1993-04-08)

Newsgroups: comp.compilers
From: clyde@hitech.com.au (Clyde Smith-Stubbs)
Keywords: assembler, performance
Organization: HI-TECH Software, Brisbane, QLD, Australia.
References: 93-02-105 93-02-115
Date: Mon, 22 Feb 1993 00:32:24 GMT



John_Burton@gec-epl.co.uk writes:
>Obviously it is possible for a compiler to produce object code directly
>which seems to be the standard on many other systems which seems much more
>efficient to me.
>[Mostly it's a matter of taste. Some Unix assemblers, particularly the
>earlier ones, were very fast, so there needn't be a performance issue. -John]


byron@netapp.com (Byron Rakitzis) writes:
>... assembly takes up about 20% of the total time. That's not insignificant.


Well, here's a perspective from a compiler writer: the figure of 20% of
total time spent in assembly agrees with my experience, and speeding up the
assembler is a useful exercise if you want to speed up the compiler. But
compile time is these days much less of an issue than it used to be (I
speak here with experience ranging from CP/M on a Z80, through PDP11's,
Vaxes etc. right up to current Sparc, 80486 systems etc.). Compile time on
CP/M is a major issue, on a Sparc or 486 it becomes much less so. The
trend is clear: processors are getting faster at a rate outstripping the
tendency of compilers to do more work.


The issues affecting assembler vs. object code as I've seen them over the
years are these:


Firstly, my customers overwhelmingly tell me they like being able to read
the compiler output. In fact a significant effort goes into making the
output readable (formatting, source code as comments, notes about register
allocation etc.). I can recall very few occasions (none in recent years)
where a user would have happily traded this for slightly faster
compilation. This alone is a good enough reason to generate assembler code
(at least optionally).


Secondly, the assembly stage is inherently multi-pass. I know there exist
single pass assemblers, but they effectively defer work to the linker. I
believe most compilers that output object code do so the same way, i.e.
they produce lots of forward references and backward fixups (or some
equivalent thereto). This rules out the optimization of calls, branches
etc. (choosing the shortest form, for instance) and other things a
multi-pass assembler can do.
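
To make the multi-pass point concrete, here is a minimal sketch of the kind of branch sizing a multi-pass assembler can do. It is an illustration only, not taken from any particular assembler: the function name relax and the item format are made up for the example, and it assumes 8086 unconditional jumps only (2 bytes for the short form EB rel8, 3 bytes for the near form E9 rel16). Every jump starts out short and is widened only when its displacement does not fit in a signed byte; widening one jump can push another out of range, so the passes repeat until nothing changes.

```python
SHORT, NEAR = 2, 3  # assumed 8086 sizes: EB rel8 vs. E9 rel16


def relax(items):
    """Choose a size for each jump.

    items is a list of ("label", name), ("jmp", target_label) or
    ("code", nbytes) tuples (a hypothetical format for this sketch).
    Returns the chosen jump sizes in program order.
    """
    # Optimistic start: every jump is assumed to be short.
    sizes = [SHORT if kind == "jmp" else 0 for kind, _ in items]
    while True:
        # Pass 1: assign addresses using the current size guesses.
        addr, labels, starts = 0, {}, []
        for (kind, arg), size in zip(items, sizes):
            starts.append(addr)
            if kind == "label":
                labels[arg] = addr
            elif kind == "code":
                addr += arg
            else:
                addr += size
        # Pass 2: widen any short jump whose displacement does not fit.
        changed = False
        for i, (kind, arg) in enumerate(items):
            if kind == "jmp" and sizes[i] == SHORT:
                # Displacement is relative to the end of the jump itself.
                disp = labels[arg] - (starts[i] + SHORT)
                if not -128 <= disp <= 127:
                    sizes[i] = NEAR
                    changed = True
        if not changed:
            return [s for (k, _), s in zip(items, sizes) if k == "jmp"]


# The second jump must be near; widening it pushes the first out of range too.
program = [("jmp", "end"), ("jmp", "far"), ("code", 125),
           ("label", "end"), ("code", 200), ("label", "far")]
print(relax(program))  # [3, 3]
```

Since sizes only ever grow, the loop has to terminate. A compiler emitting object code in a single pass has no such second look, so it must either assume the long form everywhere or leave the fixups to a later stage.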


Thirdly, and most important from my point of view, a compiler that
produces object code directly is actually producing machine language as
opposed to assembler language. Most assembler languages are a
meta-language to some degree or other. In particular, many processors
(especially CISC) have instructions that look similar at the assembler
level, but have totally different binary encodings. An assembler is
designed to handle this, but it's extra work to do so in a compiler. For
example, in the 8086 instruction set, the instructions


inc si
and
inc -4[bp],word


have totally different encodings. A compiler generating object code has to
handle this somehow, either by treating these two cases as separate, e.g.
something like


i += 1;


might be matched by templates


ASPLUS register_variable constant_one -> emit(0x40 + reg(left));
or
ASPLUS memory_location constant_one -> emit(0xFF); emitmemop(left);


rather than the generic


ASPLUS lvalue constant_one -> emit("inc"); emitop(left);


OR the compiler has to examine the expression after matching and decide,
based on the operands as well, what instruction to generate. In this case
it is doing the same job as an assembler. The same applies to the 68K
where you need to worry about things like move vs. moveq vs. movea. On a RISC
chip where all instructions fall into two or three formats it's less of an
issue.
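
As a rough illustration of what the two object-code templates above have to emit, the sketch below builds the bytes for both forms of inc. The function names are hypothetical, and the memory form is restricted to a word operand at [bp + 8-bit displacement], which is enough to cover the -4[bp] example; a real code generator would need to handle every addressing mode.

```python
# 16-bit register numbers as used in 8086 instruction encodings.
REG16 = {"ax": 0, "cx": 1, "dx": 2, "bx": 3,
         "sp": 4, "bp": 5, "si": 6, "di": 7}


def inc_reg16(reg):
    """inc r16: a single byte, 0x40 plus the register number."""
    return bytes([0x40 + REG16[reg]])


def inc_word_bp_disp8(disp):
    """inc word [bp+disp8]: opcode 0xFF, a ModRM byte, then the displacement.

    ModRM fields: mod=01 (8-bit displacement), reg=000 (selects INC
    within opcode 0xFF), r/m=110 ([bp+disp8]).
    """
    if not -128 <= disp <= 127:
        raise ValueError("this sketch only handles 8-bit displacements")
    modrm = (0b01 << 6) | (0b000 << 3) | 0b110
    return bytes([0xFF, modrm, disp & 0xFF])


print(inc_reg16("si").hex())          # 46
print(inc_word_bp_disp8(-4).hex())    # ff46fc
```

The two functions share nothing beyond the mnemonic, which is the point: with assembler output the compiler writes "inc" plus an operand and lets the assembler pick between them.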


None of what I've said makes it impossible to generate object code. It
does indicate (at least to me) that it is more difficult, especially if
you're dealing with a portable compiler (I've written compilers for about
12 different processor families). The benefits are small - slightly
better compile time. Using assembler output seems to me to allow the
compiler writer to concentrate more on good code generation, and less on
idiosyncrasies of the instruction set encoding.


I suspect also that some compiler writers tend towards emitting object
code simply because that's what they're used to doing, or that's what they
got taught. Certainly the compiler projects I did at uni. were oriented
towards object code production, but all the real compilers I've dealt with
produce assembler code.
--
  Clyde Smith-Stubbs | HI-TECH Software, | Voice: +61 7 300 5011
  clyde@hitech.com.au | P.O. Box 103, Alderley, | Fax: +61 7 300 5246
  ...!nwnexus!hitech!clyde | QLD, 4051, AUSTRALIA. | BBS: +61 7 300 5235
--

