Re: How to generate object code

Chris F Clark <cfc@shell01.TheWorld.com>
Wed, 24 Oct 2007 07:09:00 -0400

          From comp.compilers

Related articles
How to generate object code mudgen@gmail.com (Nick Mudge) (2007-10-21)
Re: How to generate object code santosh.k83@gmail.com (santosh) (2007-10-22)
Re: How to generate object code gah@ugcs.caltech.edu (glen herrmannsfeldt) (2007-10-21)
Re: How to generate object code eliotm@pacbell.net (Eliot Miranda) (2007-10-23)
Re: How to generate object code cfc@shell01.TheWorld.com (Chris F Clark) (2007-10-24)

From: Chris F Clark <cfc@shell01.TheWorld.com>
Newsgroups: comp.compilers
Date: Wed, 24 Oct 2007 07:09:00 -0400
Organization: The World Public Access UNIX, Brookline, MA
References: 07-10-064 07-10-074
Keywords: code, assembler
Posted-Date: 24 Oct 2007 12:15:47 EDT

Eliot Miranda <eliotm@pacbell.net> writes:


> As John says there are examples either way. But I strongly assert
> that the _only_ right way to do it is to produce assembler and
> assemble this separately. Why?
...
> Microsoft compiler could be asked to spit out assembly code, the
> assembly that it did produce was incorrect. If one assembled the
> output it would only match the directly-compiled object code for
> trivial cases.


Good advice. Correctness is usually more important (and more subtle)
than people realize. Along the same line, being able to write out
and read back in (preferably in a human-readable form, even if you
take that to mean XML) the intermediate representation of your
compiler (and traces of the steps it is taking) can be invaluable.


Here are a couple of examples:


I'm working on a new machine architecture to accelerate a specific
problem, and it has a "compiler" that generates a binary image for
loading into the hardware. (I need to take Eliot's advice to heart and
create an assembler for the architecture and use that internally in
the compiler.) In the meantime, however, the compiler can generate
internal dumps of its representation of the problem (at each step of
the process). For the last month or so I have been significantly
rewriting some of those internals. In the process, I've been using
the dumps and diffing the old version against the new to guide me as I
make changes and to verify that I haven't accidentally caused the
compiler to behave differently.


Similarly, the simulator that runs the new machine model (using the
same data structures) has a "micro-code" model of how it works. When
executing the machine, one can capture a trace of the micro-code
steps, and diffing the old and new versions of that trace assures me
that the new internal representation not only "looks the same" but
behaves identically.


It is worth emphasizing that both of those internal dumps are written
in a human-readable format, and that has been invaluable in the
debugging process. Not only can I use the standard diff tool to
compare the results of one version against the next, but, more
importantly, when I do have a discrepancy I can actually look at the
dumps and see exactly where in the code I need to stop and what values
I should be looking at to determine what mistake I've made.
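The dump-and-diff workflow described above can be sketched in a few
lines. This is only an illustration, not the actual tool: the IR shape
(a list of (opcode, operands) tuples) and the function names are made
up for the example. The essential points are that the dump is
deterministic and line-oriented, so a textual diff pinpoints exactly
which instruction changed:

```python
import difflib

def dump_ir(ir):
    """Render an IR (here just a list of (op, operands...) tuples) as
    deterministic, human-readable text, one instruction per line."""
    return [f"{i:4d}: {op} {', '.join(map(str, args))}"
            for i, (op, *args) in enumerate(ir)]

def diff_dumps(old_ir, new_ir):
    """Return a unified diff of the two dumps; an empty result means
    the rewrite left the compiler's behavior unchanged."""
    return list(difflib.unified_diff(dump_ir(old_ir), dump_ir(new_ir),
                                     fromfile="old", tofile="new",
                                     lineterm=""))

old = [("load", "r1", "x"), ("add", "r1", "r1", 1), ("store", "x", "r1")]
new = [("load", "r1", "x"), ("add", "r1", "r1", 2), ("store", "x", "r1")]

for line in diff_dumps(old, new):
    print(line)   # the diff isolates the one "add" line that changed
```

An identical old and new IR produces an empty diff, which is the
regression check: any nonempty output means behavior drifted.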


There is a similar example in Yacc++. As one might guess, Yacc++
generates C++ models of lexers and parsers. (It can actually generate
C# and C models as well.) Since the tool has been around for about 10
years and has gone through several revisions, both major and minor,
the code it generates has evolved over time. Recently, there has been
a project to make the code both forward and backward compatible. That
is, you can use the latest copy of the generator, give it a revision
switch, and it will attempt to generate the same code that it
generated at that revision. To get that to work, there is a switch in
the generator that causes it to "diff" the output against a sample and
stop when the output differs. That pinpoints the location where
something different is being written out and allows me to fix it.
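The diff-against-a-sample switch amounts to a line-by-line comparison
that stops at the first mismatch. Here is a minimal sketch of that
idea; the function name, the golden-file contents, and the C++
fragments are invented for the example and are not Yacc++'s actual
output:

```python
import os
import tempfile

def first_divergence(generated, golden_path):
    """Compare generated output line-by-line against a golden sample.
    Return (line_number, expected, got) for the first difference, or
    None if the outputs match exactly."""
    with open(golden_path) as f:
        golden = f.read().splitlines()
    for n, (want, got) in enumerate(zip(golden, generated), 1):
        if want != got:
            return (n, want, got)
    if len(golden) != len(generated):
        # One output is a prefix of the other; the divergence is the
        # first line that exists on only one side.
        n = min(len(golden), len(generated)) + 1
        want = golden[n - 1] if n <= len(golden) else None
        got = generated[n - 1] if n <= len(generated) else None
        return (n, want, got)
    return None

# Save a "golden" sample from an earlier revision, then check new output.
with tempfile.NamedTemporaryFile("w", suffix=".golden", delete=False) as f:
    f.write("class Parser {\n  int state;\n};\n")
    golden = f.name

print(first_divergence(["class Parser {", "  int state;", "};"], golden))
print(first_divergence(["class Parser {", "  long state;", "};"], golden))
os.remove(golden)
```

The first check prints None (the outputs match); the second reports
line 2 together with both versions of that line, which is exactly the
information needed to find where the generator's behavior changed.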


Worth mentioning, as it confirms Eliot's point, is that the generated
"tables" from Yacc++ are in fact an "assembly language" program for a
fictional lexing and parsing machine, and they are written out in a
human-readable (albeit cryptic) format using opcodes rather than
numbers. One of the advantages of that is that as the architecture
has evolved, we could invent new opcodes (and rearrange and renumber
them) without impacting the generator. That gave us two ways of
improving the code: improving the generator and improving the
library.
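The decoupling works because the generator emits mnemonics and only
the runtime library knows the numeric encoding. A toy sketch of the
idea (the mnemonics, the numbering, and the table format here are all
invented, not Yacc++'s real ones):

```python
# The generator writes symbolic opcodes; only the library assigns numbers.
TABLE_TEXT = """
SHIFT 3
REDUCE 7
ACCEPT 0
"""

# The library's current encoding. It can renumber or extend this freely
# without touching the generator, because the table names opcodes
# rather than hard-coding their numeric values.
OPCODES = {"SHIFT": 0, "REDUCE": 1, "ACCEPT": 2}

def load_table(text):
    """Assemble a symbolic table into (opcode-number, operand) pairs."""
    program = []
    for line in text.split("\n"):
        if not line.strip():
            continue
        mnemonic, operand = line.split()
        program.append((OPCODES[mnemonic], int(operand)))
    return program

print(load_table(TABLE_TEXT))   # → [(0, 3), (1, 7), (2, 0)]
```

Renumbering the machine is then a one-line change to the OPCODES map
in the library, and every previously generated table still loads.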


Hope this helps,
-Chris


*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)

