Re: Writing Assembler!

Mr J R Hall <>
22 May 1997 21:50:36 -0400

          From comp.compilers


Newsgroups: comp.compilers
Organization: Compilers Central
References: 97-05-156
Keywords: assembler, parse

Khoo Kiak Wei wrote:
>> I am planning to write a generic assembler, as a work of learning
>> flex and bison. However, after reading a book on Lex and Yacc, I am
>> still confused on how should I start!?

>Is it actually necessary or useful to use bison to write an assembler?
>I can't think of where you need to use recursion, unless it's for some
>kind of preprocessing.

No, the general grammar of assembly language statements isn't complex
enough to require anything but the simplest techniques.

For example, NASM (see the reference in my sig for details of how to
download the source) is an 80x86 assembler. Its parser is largely
ad hoc; in terms of structure I suppose it's closest to recursive
descent, but the grammar it handles is very simple.

Something like this:

instruction -> optlabel mnemonic operands
operands -> <empty> | operand | operand,operand
operand -> register | constant | indirect
register -> AX|BX|etc
constant -> number|character|label
indirect -> [ expression ]
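To show how little machinery that grammar needs, here is a minimal recursive-descent sketch in Python with one method per rule. It is not NASM's parser. The register set is a toy assumption, labels are assumed to end in ':', ';' is assumed to start a comment, and the bracketed expression is simplified to terms joined by '+'. The names `tokenize`, `Parser` and `parse_line` are hypothetical.

```python
import re

# Toy register set (an assumption); a real 80x86 table is much larger.
REGISTERS = {"AX", "BX", "CX", "DX", "SI", "DI", "BP", "SP"}
IDENT_RE = re.compile(r"[A-Za-z_.][A-Za-z0-9_.]*")
# Numbers, 'c' character constants, identifiers, or any other single non-blank char.
TOKEN_RE = re.compile(r"\d+|'.'|[A-Za-z_.][A-Za-z0-9_.]*|\S")


def tokenize(line):
    code = line.split(";", 1)[0]  # assume ';' starts a comment
    return TOKEN_RE.findall(code)


class Parser:
    """One method per grammar rule: plain recursive descent."""

    def __init__(self, tokens):
        self.toks, self.pos = tokens, 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def advance(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of line")
        self.pos += 1
        return tok

    def expect(self, tok):
        if self.advance() != tok:
            raise SyntaxError(f"expected {tok!r}")

    # instruction -> optlabel mnemonic operands   (labels assumed to end in ':')
    def instruction(self):
        label = None
        if len(self.toks) > 1 and self.toks[1] == ":":
            label = self.advance()
            self.advance()  # skip the ':'
        mnemonic = self.advance().upper()
        return label, mnemonic, self.operands()

    # operands -> <empty> | operand | operand , operand
    def operands(self):
        ops = []
        if self.peek() is not None:
            ops.append(self.operand())
            if self.peek() == ",":
                self.advance()
                ops.append(self.operand())
        if self.peek() is not None:
            raise SyntaxError(f"unexpected {self.peek()!r} after operands")
        return ops

    # operand -> register | constant | indirect
    def operand(self):
        return self.indirect() if self.peek() == "[" else self.term()

    def term(self):
        tok = self.advance()
        if tok.upper() in REGISTERS:
            return ("reg", tok.upper())
        return self.constant(tok)

    # constant -> number | character | label
    def constant(self, tok):
        if tok.isdigit():
            return ("num", int(tok))
        if len(tok) == 3 and tok[0] == tok[2] == "'":
            return ("num", ord(tok[1]))
        if IDENT_RE.fullmatch(tok):
            return ("label", tok)
        raise SyntaxError(f"bad operand {tok!r}")

    # indirect -> [ expression ], simplified here to terms joined by '+'
    def indirect(self):
        self.expect("[")
        terms = [self.term()]
        while self.peek() == "+":
            self.advance()
            terms.append(self.term())
        self.expect("]")
        return ("mem", terms)


def parse_line(line):
    return Parser(tokenize(line)).instruction()
```

For instance, `parse_line("start: mov ax, [bx+si+4]")` returns `('start', 'MOV', [('reg', 'AX'), ('mem', [('reg', 'BX'), ('reg', 'SI'), ('num', 4)])])`. Each rule needs at most one token of lookahead, which is why a parser generator buys very little here.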

The expression inside an indirect operand is a simplified form of a
general expression: a base register, an index register with an
optional constant multiplier, and an offset. Most assemblers allow
only very limited forms of this expression, whereas NASM is more
permissive: the base and index registers may be specified in any
order, and with any multiplier it can encode (e.g. EBX*5 is valid,
whereas other assemblers require EBX+EBX*4).
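As a rough illustration of that any-order, any-multiplier rule, here is a minimal Python sketch of how an assembler might fold the register terms into a base, an index and a scale. It is not NASM's code; `normalize_ea` is a hypothetical name. It assumes 32-bit registers (scaled indexing only exists in 386 addressing, where the SIB byte encodes scales 1, 2, 4 and 8), leaves the displacement to be handled elsewhere, and ignores encoding limits such as ESP never being usable as an index.

```python
from collections import Counter

VALID_SCALES = {1, 2, 4, 8}  # scale factors the 386 SIB byte can encode


def normalize_ea(terms):
    """Reduce (register, multiplier) pairs, given in any order, to
    (base, index, scale). Displacements are assumed handled elsewhere, and
    encoding limits such as ESP never being an index are ignored."""
    mult = Counter()
    for reg, k in terms:
        mult[reg] += k  # EBX + EBX*4 and EBX*5 both collapse to {EBX: 5}
    regs = [(r, k) for r, k in mult.items() if k != 0]

    if not regs:
        return None, None, 1  # displacement-only address
    if len(regs) == 1:
        reg, k = regs[0]
        if k == 1:
            return reg, None, 1
        if k in VALID_SCALES:
            return None, reg, k
        if k - 1 in VALID_SCALES:  # 3, 5, 9 -> reg + reg*(k-1)
            return reg, reg, k - 1
    elif len(regs) == 2:
        (r1, k1), (r2, k2) = regs
        # Whichever register has multiplier 1 becomes the base.
        if k1 == 1 and k2 in VALID_SCALES:
            return r1, r2, k2
        if k2 == 1 and k1 in VALID_SCALES:
            return r2, r1, k1
    raise ValueError(f"cannot encode effective address {terms!r}")
```

With this, `[EBX*5]` and `[EBX+EBX*4]` both come out as `("EBX", "EBX", 4)`, and `[ESI*4+EBX]` gives the same result as `[EBX+ESI*4]`. Summing multipliers per register first is what makes the order of the terms irrelevant.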

>Also, if you try and make a generic assembler, you are actually
>dealing with several different languages.

Several different languages, yes, but they could easily be constrained
to a single syntax if you design it well. Remember that you don't
necessarily have to use the syntax that is standard for your target
processor (e.g. take a look at GAS for the 386: its syntax looks
nothing like Intel's).

>This doesn't sound like a good place to use flex to me, because you
>don't know what patterns you want to match at compile-time.

You want to group sequences of letters and digits into tokens, and so
on, and I think that can be done with flex. On the other hand, NASM
has a hand-built lexical analyser; the job doesn't need anything like
the flexibility that flex provides.

>> If any of you know of any sample code on assembler or good books talking
>> on the assembler construction, I would like to hear from you please!
>There are several freely available assemblers. One is the gnu generic
>assembler gas. Another one is nasm, for intel x86 chips. Neither of
>these use things like flex or bison though, except gas uses bison
>somewhere for one particular chip (even then it says in the comments
>that that is overkill).

If you are interested in NASM, the following URL contains details of
how to download it:

Julian R Hall preferred --->>
