Re: slightly off topic -- writing an assembler!

khays@sequent.com (Kirk Hays)
24 Jun 1998 23:49:43 -0400

          From comp.compilers

From: khays@sequent.com (Kirk Hays)
Newsgroups: comp.compilers
Date: 24 Jun 1998 23:49:43 -0400
Organization: Sequent Computer Systems, Inc.
References: 98-06-126
Keywords: assembler

samuel <SAMIGWE@worldnet.att.net> wrote:
> I am currently working on writing an assembler (intel syntax
>for the x86 microprocessor)for my operating system project. I haven't
>yet had any formal training on the design of one and havent been able
>to find any "assembler design" books.


[I'm ignoring your questions to suggest a methodology that subsumes
most of them. All information in this post is available from published
documents.]


[Assemblers are on-topic, IMHO. An assembler is just a specialized compiler.]


A scheme that works particularly well for the x86, with its mnemonic
overloading, is to implement a macro assembler, where each binary
instruction is described by its name and acceptable arguments, and
each such description is a procedure for emitting code.


IOW, you write the macro language interpreter, then write your
assembler as a set of macro procedures describing all the
instructions, their opcodes, and any assembly state that affects the
emitted binary code.


For example, the AAA instruction can be implemented as:


macro AAA () {
    emit (0x37)
    return ""    // NB: all macros return a string, always!
}


where "emit" is a built-in to the macro language that writes out a hex
byte to the output, and advances the assembly instruction counter.


The assembler then consists of a small loop: parse an instruction
and its arguments, match it against multiple opcode argument
templates (implemented as macros) until one is found with acceptable
arguments, execute the macro, rinse, repeat.
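
To make the loop concrete, here is a rough Python sketch of the
first-match dispatch. It is not the macro language itself, only an
illustration: `TEMPLATES`, `assemble_line`, `is_imm8`, and the
predicate-list layout are names and choices I made up for the example,
and only the two templates shown in this post are included.

```python
output = bytearray()  # assembled bytes; its length doubles as the instruction counter

def emit(byte):
    """Append one byte to the output, standing in for the built-in 'emit'."""
    output.append(byte & 0xFF)

def is_imm8(text):
    """Accept a decimal or 0x-prefixed literal that fits in one byte."""
    try:
        return 0 <= int(text, 0) <= 0xFF
    except ValueError:
        return False

# Each template: (mnemonic, one predicate per argument, macro body).
TEMPLATES = [
    ("AAA", [], lambda: emit(0x37)),
    ("ADC", [lambda a: a.upper() == "AL", is_imm8],
     lambda dst, imm: (emit(0x14), emit(int(imm, 0)))),
]

def assemble_line(line):
    """Parse one instruction and run the first template whose arguments match."""
    parts = line.strip().split(None, 1)
    mnemonic = parts[0].upper()
    args = [a.strip() for a in parts[1].split(",")] if len(parts) > 1 else []
    for name, predicates, body in TEMPLATES:
        if (name == mnemonic and len(predicates) == len(args)
                and all(p(a) for p, a in zip(predicates, args))):
            body(*args)
            return
    raise ValueError(f"no template matches: {line!r}")

for source_line in ["aaa", "adc al, 0x10"]:
    assemble_line(source_line)

print(output.hex())  # 371410
```

A line that matches no template (say, `adc bx, 1` with only the two
templates above) raises an error, which is where a real assembler
would report a bad operand combination.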


It can all be done in a single pass, with backpatching to take care of
forward references to labels (and a procedure is nothing more than a
label).
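
A minimal Python sketch of the backpatching idea, under some
assumptions of mine: the only jump is a short `JMP rel8` (opcode
0xEB), all fixups are resolved at the end rather than at the moment a
label is defined, and the names `labels`, `fixups`, `define_label`,
`emit_jmp_short`, and `resolve_fixups` are invented for the example.

```python
code = bytearray()
labels = {}   # label name -> offset at which it was defined
fixups = []   # (offset of the rel8 placeholder, label name)

def define_label(name):
    """Record the current output offset as the address of 'name'."""
    labels[name] = len(code)

def emit_jmp_short(target):
    """Emit JMP rel8 with a placeholder displacement to be patched later."""
    code.append(0xEB)
    fixups.append((len(code), target))
    code.append(0x00)  # placeholder, overwritten by resolve_fixups()

def resolve_fixups():
    """Patch every recorded placeholder now that all labels are known."""
    for offset, name in fixups:
        if name not in labels:
            raise KeyError(f"undefined label: {name}")
        # rel8 is measured from the address of the following instruction
        disp = labels[name] - (offset + 1)
        if not -128 <= disp <= 127:
            raise ValueError(f"label {name} is out of range for a short jump")
        code[offset] = disp & 0xFF
    fixups.clear()

emit_jmp_short("done")   # forward reference: 'done' is not defined yet
code.append(0x37)        # AAA
code.append(0x37)        # AAA
define_label("done")
code.append(0x90)        # NOP
resolve_fixups()

print(code.hex())  # eb02373790
```

The source is still read only once; the second visit is to the output
bytes, not to the input text.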


A more complex instruction, with three of the possible fourteen
instruction templates:


macro ADC ("AL", imm8) {
    emit (0x14)
    emit (imm8)
    return ""
}


macro ADC (r16, r/m16) {
    if (sixteen_bit_code_segment != TRUE) {
        emit (OVERRIDE_BYTE)
    }
    emit (0x13)
    reg_mem_16_emit (r16, r/m16)
    return ""
}


macro ADC (r32, r/m32) {
    if (sixteen_bit_code_segment == TRUE) {
        emit (OVERRIDE_BYTE)
    }
    emit (0x13)
    reg_mem_32_emit (r32, r/m32)
    return ""
}
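
For a feel of what a helper like reg_mem_32_emit() has to produce,
here is a Python sketch of the simplest case only: both operands are
32-bit registers, so the ModR/M byte uses mod = 11. I am assuming
OVERRIDE_BYTE is the 0x66 operand-size prefix; memory operands (SIB
bytes, displacements) are left out, and the function names are made
up for the example.

```python
REG32 = {"EAX": 0, "ECX": 1, "EDX": 2, "EBX": 3,
         "ESP": 4, "EBP": 5, "ESI": 6, "EDI": 7}
OPERAND_SIZE_OVERRIDE = 0x66  # assumed value of OVERRIDE_BYTE

def modrm(mod, reg, rm):
    """Pack the three ModR/M fields: mod (2 bits), reg (3 bits), r/m (3 bits)."""
    return (mod << 6) | (reg << 3) | rm

def adc_r32_r32(dst, src, sixteen_bit_code_segment=False):
    """Encode ADC r32, r/m32 (opcode 0x13) for a register source operand."""
    out = bytearray()
    if sixteen_bit_code_segment:
        out.append(OPERAND_SIZE_OVERRIDE)  # 32-bit operands in 16-bit code
    out.append(0x13)
    out.append(modrm(0b11, REG32[dst], REG32[src]))  # reg = dst, r/m = src
    return bytes(out)

print(adc_r32_r32("EAX", "EBX").hex())        # 13c3
print(adc_r32_r32("EAX", "EBX", True).hex())  # 6613c3
```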


By organizing the templates properly, and quitting on the first match,
one can always pick the shortest or fastest binary instruction for any
mnemonic. Other costing metrics can be implemented, too.
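
One way to picture the ordering, as a Python sketch with names I
invented (`ADC_TEMPLATES`, `encode_adc`, and the cost field): each
template carries its encoded length, the list is sorted by that cost,
and the first match wins. The two forms used are ADC AL, imm8 (0x14)
and the general ADC r/m8, imm8 (0x80 /2), restricted here to register
operands.

```python
R8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]  # index = register number

def is_al(arg):
    return arg.upper() == "AL"

def is_r8(arg):
    return arg.upper() in R8

def is_imm8(arg):
    try:
        return 0 <= int(arg, 0) <= 0xFF
    except ValueError:
        return False

def adc_al_imm8(dst, imm):
    """Short form: 0x14 ib (2 bytes)."""
    return bytes([0x14, int(imm, 0)])

def adc_r8_imm8(dst, imm):
    """General form: 0x80 /2 ib (3 bytes), register operand only."""
    modrm = 0xC0 | (2 << 3) | R8.index(dst.upper())  # mod=11, reg=/2, rm=register
    return bytes([0x80, modrm, int(imm, 0)])

# (cost in bytes, argument predicates, encoder), deliberately listed out of order
ADC_TEMPLATES = [
    (3, (is_r8, is_imm8), adc_r8_imm8),
    (2, (is_al, is_imm8), adc_al_imm8),
]
ADC_TEMPLATES.sort(key=lambda t: t[0])  # cheapest first, so first match is shortest

def encode_adc(dst, imm):
    """Return the encoding from the first (cheapest) matching template."""
    for _cost, predicates, encoder in ADC_TEMPLATES:
        if all(p(a) for p, a in zip(predicates, (dst, imm))):
            return encoder(dst, imm)
    raise ValueError(f"no ADC template matches: {dst}, {imm}")

print(encode_adc("AL", "1").hex())  # 1401
print(encode_adc("BL", "1").hex())  # 80d301
```

AL matches both templates, but the sort puts the two-byte form first.
Swapping the sort key for a cycle count would give a "fastest"
ordering instead.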


You can also add pseudo-instructions, or combined instructions.


Minimize the number of built-ins - instead, add to the power of your
macro language, and use it to build procedures (such as
reg_mem_32_emit(), above). Plus, if it turns out that performance is
critical, you can selectively, intelligently, and *reversibly* move
macro-implemented procedures into built-ins.


You'll need to be able to manipulate bits and bytes, and do string
matching and substringing. Recursion of macros is a *must*. NEVER
forget that the macro language is a string manipulation language.
Look at a SNOBOL (or Icon, perhaps) manual to discover your string
manipulation primitives. Keep the interpreter generic - it should
ideally know nothing about assembly, only about macro interpretation.
Insert your builtin procedures in a way that is equivalent (identical,
preferably) to user-defined macros. Warn about multiple definitions
of equivalent (macro, argument) pairs, but don't prevent them.
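
A small Python sketch of the last two points, with a made-up registry
(`macros`, `define_macro`): built-ins and user macros go through the
same definition call, and a repeated (macro, argument) pair produces a
warning but is still recorded. Keeping every definition in order is my
assumption; which one should win at match time is a separate policy
decision.

```python
import warnings

macros = {}  # (name, argument pattern) -> list of bodies, in definition order

def define_macro(name, pattern, body):
    """Register a macro body; warn on a duplicate (name, pattern) but keep it."""
    key = (name.upper(), tuple(pattern))
    if key in macros:
        warnings.warn(f"macro {key[0]} {key[1]} is defined more than once")
    macros.setdefault(key, []).append(body)

def builtin_emit(byte):
    """Stand-in for a built-in; it is registered exactly like a user macro."""
    return ""

define_macro("emit", ("imm8",), builtin_emit)          # a built-in
define_macro("AAA", (), lambda: builtin_emit(0x37))    # a user-level macro
```

Defining `AAA` with an empty argument list a second time would issue a
warning and leave two bodies under the same key.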


For symbol tables, AVL trees are excellent - Paul Vixie's public
domain AVL tree code, available as comp.sources.unix Volume 27, Issue
34 (`avl-subs') is a solid choice.


Implementation of the macro interpreter is left as an exercise for the
student ;-)


Believe me, writing a special language to implement this is *far*
easier than trying to code the whole thing, and makes it easier to add
instructions and verify correctness. You'll learn more by writing a
macro interpreter than you will by writing the assembler directly.


As a bonus, if you expose the macro language to your users, they also
have a powerful tool. For example, when I was writing the macros to
generate the kernel entry points for Intel's iRMX IV operating system
(aka "iRMX III" and "distributed iRMX") (segmentation, privilege
rings, validation, the whole banana for the 32 bit 80386
architecture), I had assembly macros that nested 18 levels deep to
generate optimal entry and parameter validation code. Far easier than
coding all that assembly language by hand, and simpler to change when
the (inevitable) changes were needed.


Have fun.


--
Kirk Hays
[I don't speak for Sequent.]
--

