LL(1) and common-heads...

Gabor DEAK JAHN <deakjahn@ludens.elte.hu>
30 Dec 1995 01:00:10 -0500

          From comp.compilers


From: Gabor DEAK JAHN <deakjahn@ludens.elte.hu>
Newsgroups: comp.compilers
Date: 30 Dec 1995 01:00:10 -0500
Organization: Compilers Central
References: 95-12-143
Keywords: assembler, parse

Saulnier <p8uu@jupiter.sun.csd.unb.ca> wrote:

> Hi all, I've written an actual assembler for this new OS that
> I was wondering if it would be of any use writing it using LL(1).

When I wrote an assembler for Intel 80x86 processors, I also faced a
similar problem with operand parsing. I started to experiment with
LL(1), but I soon found that the syntax of 80x86 assembly operands was
way too liberal. The different elements may come in virtually any
order, and writing an LL(1) grammar to cover all possible combinations
would have been tedious and would have needed a lot of left-factoring.

Finally, I wrote a special-purpose state automaton whose states
represented the possible addressing modes. The input symbols driving
its transition table were not the tokens themselves (that is, not "EAX"
or "GS") but their general categories (e.g. "32-bit-general-register"
or "segment-register"). A filter (practically an LL(1) parser) turned
the tokens into such categories. At the same time, it recorded the
details of the tokens in additional variables for later use.
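
To make the filter idea concrete, here is a minimal sketch in Python. It
is not the code of the assembler described here: the names `categorize`
and `Details`, the short category strings, and the choice of punctuation
tokens are all assumptions made for illustration. Only the register
numbers are the standard 80x86 encodings.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Standard 80x86 register encodings; the category names below are
# hypothetical stand-ins for "32-bit-general-register" and so on.
REG32 = {"EAX": 0, "ECX": 1, "EDX": 2, "EBX": 3,
         "ESP": 4, "EBP": 5, "ESI": 6, "EDI": 7}
SEGREG = {"ES": 0, "CS": 1, "SS": 2, "DS": 3, "FS": 4, "GS": 5}
PUNCT = {"[", "]", "+", ":", "*"}


@dataclass
class Details:
    """Side information noted while filtering, kept for code generation."""
    regs32: List[int] = field(default_factory=list)
    segment: Optional[int] = None
    numbers: List[int] = field(default_factory=list)


def categorize(token: str, details: Details) -> str:
    """Map one token to its general category and record its details."""
    t = token.upper()
    if t in REG32:
        details.regs32.append(REG32[t])
        return "reg32"
    if t in SEGREG:
        details.segment = SEGREG[t]
        return "segreg"
    if t.isdigit():
        details.numbers.append(int(t))
        return "number"
    if t in PUNCT:
        return t  # punctuation is its own category
    raise ValueError(f"unknown token: {token!r}")


details = Details()
tokens = ["GS", ":", "[", "EAX", "+", "8", "]"]
categories = [categorize(t, details) for t in tokens]
```

The automaton then sees only `categories`, while `details` keeps which
register and which displacement were actually written.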

After parsing the operand, the last state of the automaton represented
the addressing mode, and the additional variables (bit fields and
integer values) contained all the information needed to generate the
necessary opcodes.
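
A table-driven automaton of this kind might be sketched as follows, again
in Python and again only as an illustration. The state names, the
category strings ("reg32", "number") and the tiny subset of operand
forms are assumptions; a real 80x86 table would be far larger.

```python
# Transition table: (current state, input category) -> next state.
# All state and category names here are hypothetical.
TABLE = {
    ("start", "reg32"): "register",
    ("start", "number"): "immediate",
    ("start", "["): "open",
    ("open", "reg32"): "open_base",
    ("open", "number"): "open_disp",
    ("open_base", "]"): "indirect",
    ("open_base", "+"): "base_plus",
    ("base_plus", "number"): "base_disp",
    ("open_disp", "]"): "direct",
    ("open_disp", "+"): "disp_plus",
    ("disp_plus", "reg32"): "base_disp",   # [8+EAX] is as good as [EAX+8]
    ("base_disp", "]"): "indirect+disp",
}

# States in which an operand may legally end; each names an addressing mode.
ACCEPTING = {"register", "immediate", "indirect", "indirect+disp", "direct"}


def addressing_mode(categories):
    """Run the automaton over token categories; return the final state."""
    state = "start"
    for position, category in enumerate(categories):
        next_state = TABLE.get((state, category))
        if next_state is None:
            raise ValueError(
                f"unexpected {category!r} at position {position} "
                f"in state {state!r}")
        state = next_state
    if state not in ACCEPTING:
        raise ValueError(f"incomplete operand (stopped in {state!r})")
    return state


mode_a = addressing_mode(["[", "reg32", "+", "number", "]"])
mode_b = addressing_mode(["[", "number", "+", "reg32", "]"])
```

Both orderings end in the same state, so accepting the elements in
either order costs a couple of table entries rather than a separate,
left-factored grammar rule.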

Due to the peculiarities of the 80x86 assembly operand syntax, this
combined solution was much easier to write, much smaller and much
faster in practice.

However, there is a major drawback: in the case of an error, it is not
always easy to pinpoint the offending token. Most assemblers only
report the line number in their error messages, so this may not be a
big problem (it was not for me, anyway).

    Gabor DEAK JAHN
