Re: Best tools for writing an assembler?


Newsgroups: comp.compilers
Date: Wed, 26 Mar 2014 17:57:11 -0700 (PDT)
Organization: Compilers Central
References: 14-02-018 14-02-030 14-02-043 14-03-002
Keywords: assembler
Posted-Date: 26 Mar 2014 21:01:00 EDT

On Saturday, March 1, 2014 12:09:39 PM UTC-6, George Neuner wrote:
> However, parsers are notoriously difficult to get correct: it's not
> enough that your parser recognizes the language, it also has to reject
> everything else. That can be quite hard to verify.

That's not been my experience. But then, I use a formalism and
calculus for "context free expressions", which makes the whole affair
rather transparent and easy. I wouldn't worry too much about the
syntax end of the project (particularly after properly attending to
the "normalization" process alluded to below). The real issues are
going to be the rest of the assembler!

If one is going to do an assembler -- as I'm fixing to do --
there are a few words that should be said on the matter.

First, in terms of the syntax, the best advice is to *normalize* the
syntax. Lately, I've seen a trend in the design of languages that can
only be described as a retreat to the 1960s, back when people thought
you could (or should) get a grammar to engulf the entire set of
semantic constraints, not just the enveloping syntax. The mindset runs
so deep in those rooted in that era that it may even be done
unconsciously, rather than by intent.

The specific example I have in mind is C++ at the high level (and even
C with its false separation of syntaxes for "direct" and "abstract"
declarators). At the assembly level, the most notorious example is
MASM and the Intel syntax it is associated with.

There are a half-dozen instances of the phrase category "Directive" in
the phrase structure grammar whose only distinctions are semantic in
nature: i.e. there is an attempt to implement semantic constraints by
cleaving this single category into a half-dozen shards.

A good grammar should serve as nothing more than an enveloping
framework. Leave the constraints out of the grammar; that's what the
{...} actions are for, if you're using Yacc. If, on the
other hand, you're STARTING with MASM-like syntax, then a good
exercise is to first try to NORMALIZE the syntax, by (a) recombining
the various instances of duplicated categories and (b) factoring out
the constraints.
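To illustrate what factoring the constraints out looks like (the
directive names and argument rules below are my own invention, not
taken from any real assembler), here is a small Python sketch: the
syntax keeps a single, normalized "directive" category, and the
semantic constraints live in a table of per-directive checks -- the
moral equivalent of putting them in Yacc's {...} actions:

```python
# Hypothetical sketch: one grammar category for all directives,
#   directive ::= NAME operand*
# with semantic constraints factored out of the syntax and into
# a table of per-directive checks.

def parse_directive(line):
    # The *syntax* accepts any name followed by any operands.
    name, *operands = line.split()
    return name.lower(), operands

# The *constraints* live here, not in six shards of the grammar.
CONSTRAINTS = {
    ".align": lambda ops: len(ops) == 1 and ops[0].isdigit(),
    ".byte":  lambda ops: len(ops) >= 1,
    ".org":   lambda ops: len(ops) == 1,
}

def check_directive(line):
    name, ops = parse_directive(line)
    ok = CONSTRAINTS.get(name, lambda _: True)(ops)
    return name, ops, ok

print(check_directive(".align 16"))  # syntactically and semantically fine
print(check_directive(".org 1 2"))   # parses fine, fails the semantic check
```

Adding a directive then means adding a table entry, not a new phrase
category.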

In many cases, one may find that the "constraints" are not actually
needed at all, and one can then generalize to create a more powerful
and robust (and simpler) language.

As it turns out, I actually carried out the process of
normalization(!). It would be lengthy to post results here, though the
resulting grammar is not particularly large. The syntax for both
directives and expressions has been normalized. Part of this,
interestingly, was already attempted by Microsoft in their old MASM
reference, where they indexed the "expression" category by precedence
level, rather than concocting separate names for all the instances of
expression type. But they still left out the boolean expressions from
the rest of the syntax.
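A sketch of what indexing the "expression" category by precedence
level buys you, with the boolean operators folded into the same table
rather than left outside the syntax. This toy precedence-climbing
parser is my own illustration (the operator set and precedence levels
are invented, not MASM's); it stands in for a rule family of the form
expr(n) ::= expr(n+1) { op(n) expr(n+1) }:

```python
# Hypothetical sketch: a single expression parser driven by a
# precedence table that includes boolean operators alongside the
# arithmetic ones, instead of a separate named category per type.
import re

PREC = {            # level 1 binds loosest
    "||": 1, "&&": 2,
    "==": 3, "<": 3, ">": 3,
    "+": 4, "-": 4,
    "*": 5, "/": 5,
}

def tokenize(src):
    return re.findall(r"\d+|\|\||&&|==|<|>|[+\-*/()]", src)

def parse(tokens, min_prec=1):
    # expr(n) ::= expr(n+1) { op(n) expr(n+1) }, left-associative
    lhs = parse_atom(tokens)
    while tokens and tokens[0] in PREC and PREC[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        rhs = parse(tokens, PREC[op] + 1)
        lhs = (op, lhs, rhs)
    return lhs

def parse_atom(tokens):
    tok = tokens.pop(0)
    if tok == "(":
        node = parse(tokens)
        tokens.pop(0)          # consume ")"
        return node
    return int(tok)

print(parse(tokenize("1+2*3 < 10 && 4 > 2")))
```

One table entry per operator replaces one phrase category per
precedence level.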

On the design end, the best thing to do is first do a survey of all
the assemblers in common use out there, particularly ones that claim a
degree of platform-universality. GAS works with a large number of
platforms, for instance, and implements the AT&T syntax. Both MASM and
NASM work with the large family of x86 processors, though there is a
fairly large portion of the language in the syntax for directives
(especially in the case of NASM) that can be made largely uniform
once normalized.

In the process of surveying, there should be careful attention paid to
the "history" notes to see how the items evolved and what roadblocks
were encountered. You'll be encountering the same roadblocks. GAS, by
the way, has several huge history lists in its distribution.

As far as CPU-independence goes: even if you're targeting for a single
CPU, you still need to have some level of robustness for
CPU-independence. The reason is simple: as of late, CPUs have been
undergoing an increasingly rapid series of incremental changes (even
the lowly 8-bit 8052 now has a 32-bit version!). If you're stuck
in the present (a mindset you can always identify by those people who
use "modern" to describe contemporary notions or ideas), you're going
to quickly be stuck in the past with that CPU.

So, there should always be some attention to keeping the particulars
of the CPU as much out of the assembler as possible.
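One common way to keep the CPU's particulars out of the assembler is
to make each target a data table and keep the core generic. The
mnemonics and opcode bytes below are for an invented toy CPU, purely
to illustrate the shape of the approach:

```python
# Hypothetical sketch: a table-driven assembler core. The core knows
# nothing about any particular CPU; each target is just a table
# mapping mnemonics to encoder functions. Opcodes here are invented.

TOY_CPU = {
    "nop": lambda ops: bytes([0x00]),
    "lda": lambda ops: bytes([0x10, int(ops[0])]),
    "add": lambda ops: bytes([0x20, int(ops[0])]),
}

def assemble(lines, cpu):
    out = bytearray()
    for line in lines:
        mnemonic, *ops = line.split()
        out += cpu[mnemonic](ops)   # all CPU particulars stay in the table
    return bytes(out)

print(assemble(["lda 5", "add 3", "nop"], TOY_CPU).hex())
```

Retargeting to the next incremental CPU revision then means swapping
or extending a table, not rewriting the assembler.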

The same goes for the back end. Object formats have also been
undergoing changes as of late. There should be reconfigurability for
both this and even the processor model and input syntax (e.g. by way
of dot-keyword directives like ".att" for AT&T syntax, ".P6" for
Pentium Pro, "elfx32" for the ELFx32 object format, etc.).
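A sketch of that reconfiguration style in Python -- the directive
names ".att", ".P6", and ".elfx32" echo the ones above, while the
Config fields and defaults are invented stand-ins:

```python
# Hypothetical sketch: syntax, processor model, and object format are
# all mutable configuration, selected by dot-keyword directives in the
# source rather than hard-coded into the assembler.

class Config:
    def __init__(self):
        self.syntax = "intel"   # invented defaults
        self.cpu = "8086"
        self.objfmt = "bin"

DIRECTIVES = {
    ".att":    lambda cfg: setattr(cfg, "syntax", "att"),
    ".p6":     lambda cfg: setattr(cfg, "cpu", "p6"),
    ".elfx32": lambda cfg: setattr(cfg, "objfmt", "elfx32"),
}

def process(lines):
    cfg = Config()
    for line in lines:
        word = line.strip().lower()
        if word in DIRECTIVES:
            DIRECTIVES[word](cfg)   # reconfigure rather than hard-code
    return cfg

cfg = process([".att", ".P6", ".elfx32"])
print(cfg.syntax, cfg.cpu, cfg.objfmt)
```

Supporting a new object format or syntax flavor then becomes a new
table entry plus its backend, with the driver untouched.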

That's where all your worry's going to be. Not the syntax!
