Re: Writing Assembler! (Mark Hopkins)
9 Jun 1997 00:55:49 -0400

          From comp.compilers

From: (Mark Hopkins)
Newsgroups: comp.compilers
Date: 9 Jun 1997 00:55:49 -0400
Organization: Omnifest
References: 97-05-156 97-05-234
Keywords: assembler, parse

>From (Mr J R Hall):
>>Is it actually necessary or useful to use bison to write an assembler?
>>I can't think of where you need to use recursion, unless it's for some
>>kind of preprocessing.
>No, the general grammar of assembly language statements isn't complex
>enough to require any but the most simple of techniques.
>...the grammar is very simple. Something like
>instruction -> optlabel mnemonic operands
>operands -> <empty> | operand | operand,operand
>operand -> register | constant | indirect
>register -> AX|BX|etc
>constant -> number|character|label
>indirect -> [ expression ]
>The expressions for the indirect can be simplified forms of a general
>expression involving base registers, index registers, and offsets,
>with optional constant multiplier for the index register...

I think allowing anything but the most general expression in the last
item and having the other items hanging out in the open like that
complicates life for both the parser and the user. At the very least,
all expression forms should be recognized if for nothing more than to
generate error messages. Also, separating out the syntax of registers
from that of expressions complicates parsing, and misses out on some
major opportunities.

As a general rule of thumb: the less orthogonal the syntax, the more
effort will be required to learn it and use it correctly.

And as long as the assembler is going to recognize the general syntax,
why not generate binary for it too, even when it's not "supported" by
the language? Anything you can write binary for is 'supported' even
if it's not part of the 'official' language, so the distinction is
actually fuzzy. We should take advantage of that fact to simplify our
design. Then there will be fewer error messages. As I explained in
the article "What Is Assembly Language?", even when intervening
registers are required to implement the operation in binary, one has
various strategies to deal with how to handle the behind-the-scenes
use of registers (in the worst case, falling back on the original
strategy of generating error messages).

My expression syntax would look like this:

E = Expression:
E -> p E | E o E | E q | E "?" E ":" E | "(" E ")" | x
p -> prefix operators: !, ~, +, -, etc.;
     "@" (or "*") for address indirection;
     high, low.
o -> infix operators: +, -, *, /, %, <<, >>, &, ^, |, &&, ||,
     ==, !=, <, >, <=, >=, :, .
     (the last two being used for concatenation and bit-fields).
q -> postfix operators.
x -> Variable, "$" (or "*"), Label, Number, Character, String, Type
ANSI-C syntax used for characters, strings and numbers.
Intel (or Motorola) syntax also allowed for numbers. It's a little tricky
to combine ANSI-C numbers with Intel or Motorola numbers, but doable.
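
To make the "tricky, but doable" part concrete, here is a minimal sketch in Python of one way to accept all three number styles in a single routine. The function name parse_number and the ordering of the tests are my own assumptions, not part of any existing assembler. It also assumes the lexer has already decided that the token is a number, since "$" and "%" have other meanings in the expression syntax above, and it resolves a leading 0 the ANSI-C way (octal), which is exactly where the conventions collide.

```python
import re

def parse_number(text: str) -> int:
    """Parse an integer literal in ANSI-C, Intel-suffix or Motorola-prefix form."""
    low = text.strip().lower()
    # ANSI-C hexadecimal: 0x1F
    if re.fullmatch(r"0x[0-9a-f]+", low):
        return int(low[2:], 16)
    # Motorola prefixes: $1F (hex), %1010 (binary)
    if re.fullmatch(r"\$[0-9a-f]+", low):
        return int(low[1:], 16)
    if re.fullmatch(r"%[01]+", low):
        return int(low[1:], 2)
    # Intel hex suffix: 1Fh. Checked before the binary suffix so that
    # 0Bh is read as hex 0B, not as a malformed binary number.
    if re.fullmatch(r"[0-9][0-9a-f]*h", low):
        return int(low[:-1], 16)
    # Intel octal suffix: 17o or 17q
    if re.fullmatch(r"[0-7]+[oq]", low):
        return int(low[:-1], 8)
    # Intel binary suffix: 1010b
    if re.fullmatch(r"[01]+b", low):
        return int(low[:-1], 2)
    # ANSI-C octal: a leading 0 followed by octal digits (policy choice)
    if re.fullmatch(r"0[0-7]+", low):
        return int(low, 8)
    # Plain decimal, with the optional Intel "d" suffix
    if re.fullmatch(r"[0-9]+d?", low):
        return int(low.rstrip("d"), 10)
    raise ValueError(f"not a recognized number: {text!r}")
```

The order of the tests carries the whole policy: a different assembler could just as reasonably read 017 as decimal seventeen, in which case the ANSI-C octal test would simply be dropped.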

Types would be defined for different address spaces and possibly
expression types as well. At the very least, one should have "data",
"code" and even a type called "type", defined such that:

                                              typeof (typeof E) == type.

thus making type the fixed point of typeof, with: typeof type == type.
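
A small Python sketch of the fixed-point idea, under the assumption that types are themselves ordinary values the assembler can evaluate. The class names Type and Value and the constants TYPE, DATA and CODE are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Type:
    """An address space or expression type, itself usable as a value."""
    name: str

TYPE = Type("type")   # the type of all types, including itself
DATA = Type("data")
CODE = Type("code")

@dataclass(frozen=True)
class Value:
    """An evaluated expression: an address within some typed space."""
    type: Type
    address: int

def typeof(e):
    # Every type, "type" included, has type "type"; that is the fixed point.
    if isinstance(e, Type):
        return TYPE
    return e.type

buffer = Value(DATA, 0x100)
print(typeof(buffer).name)          # data
print(typeof(typeof(buffer)).name)  # type
print(typeof(TYPE).name)            # type
```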

As much as possible, the syntax for registers should be included as
part of the general syntax for expressions! For example, with the
80x86, we could do this by defining address types "Rb", "Rw", "Rd" and
"Rs" for the different register types, with the following address
assignments:

Address   Rb space   Rw space   Rd space   Rs space
   0         AL         AX         EAX        ES
   1         CL         CX         ECX        CS
   2         DL         DX         EDX        SS
   3         BL         BX         EBX        DS
   4         AH         SP         ESP        FS
   5         CH         BP         EBP        GS
   6         DH         SI         ESI        --
   7         BH         DI         EDI        --

Now, instead of being reserved words for denoting registers, we will
reinvent the register syntax so that these names will now just be
constants pre-defined by the assembler to denote their respective
addresses.

A general label declaration will look like this:

                                    [global] Label :
                                    [global] Label equ E
                                or  [global] Label T E

So an equivalent label definition for the register name AL would be
like this:

                                                          AL Rb 0

This will also simplify the design of the assembler's symbol table and
symbol processing.
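
As a rough illustration of how this simplifies the symbol table, here is a Python sketch in which register names are entered through the same path as user labels. The names Symbol, REGISTER_NAMES, predefined_symbols and define are hypothetical; the space/address pairs are taken from the table above, and the "data" space and the address 0x100 in the usage lines are made up for the example.

```python
from typing import NamedTuple

class Symbol(NamedTuple):
    """A label: an address within a named address space."""
    space: str
    address: int

# Register names listed in address order for each register space.
REGISTER_NAMES = {
    "Rb": ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"],
    "Rw": ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"],
    "Rd": ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"],
    "Rs": ["ES", "CS", "SS", "DS", "FS", "GS"],
}

def define(table, name, space, address):
    """Equivalent of the declaration 'Label T E', e.g. 'AL Rb 0'."""
    if name in table:
        raise ValueError(f"symbol already defined: {name}")
    table[name] = Symbol(space, address)

def predefined_symbols():
    """Build the initial table: registers are just pre-defined labels."""
    table = {}
    for space, names in REGISTER_NAMES.items():
        for address, name in enumerate(names):
            define(table, name, space, address)
    return table

symbols = predefined_symbols()
define(symbols, "Buffer", "data", 0x100)   # an ordinary user label
define(symbols, "Counter", *symbols["CX"]) # like 'Counter equ CX'
```

There is no separate reserved-word list and no register-specific lookup: a register reference and a label reference take exactly the same route through the table.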

It's important to have this syntax fully incorporated in the expression
syntax, because further down we may want to declare names for expressions
involving registers, for instance:

                                                    Counter equ CX
                                                    FP equ SS:BP

or even more elaborate expressions. We may even get the assembler to
process expressions as if the following statements were true, in the
case of the 80x86:

                                          high AX == AH, low AX == AL
                                          AH:AL == AX

when doing expression evaluations. That's only the beginning.
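
The identities above fall out of the address assignments: for the first four Rw registers, the low byte sits at the same address in Rb space and the high byte four addresses later. A minimal Python sketch follows, with hypothetical names Reg, low, high and concat. Returning an unevaluated pair for combinations such as SS:BP is my own assumption about how an evaluator might treat them.

```python
from typing import NamedTuple

class Reg(NamedTuple):
    """A register seen as an address in one of the register spaces."""
    space: str
    address: int

AL, AH = Reg("Rb", 0), Reg("Rb", 4)
AX, BP = Reg("Rw", 0), Reg("Rw", 5)
SS = Reg("Rs", 2)

def low(r):
    # Only AX, CX, DX, BX (Rw addresses 0..3) have byte halves.
    if r.space == "Rw" and r.address < 4:
        return Reg("Rb", r.address)
    raise ValueError(f"low is not defined for {r}")

def high(r):
    if r.space == "Rw" and r.address < 4:
        return Reg("Rb", r.address + 4)
    raise ValueError(f"high is not defined for {r}")

def concat(hi, lo):
    """Evaluate hi:lo, folding matching byte halves into a word register."""
    if (hi.space == "Rb" and lo.space == "Rb"
            and lo.address < 4 and hi.address == lo.address + 4):
        return Reg("Rw", lo.address)
    # Anything else (e.g. SS:BP) stays a symbolic pair for later stages.
    return ("pair", hi, lo)

print(high(AX) == AH, low(AX) == AL)  # True True
print(concat(AH, AL) == AX)           # True
```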
