From: | Gabor DEAK JAHN <deakjahn@ludens.elte.hu> |
Newsgroups: | comp.compilers |
Date: | 30 Dec 1995 01:00:10 -0500 |
Organization: | Compilers Central |
References: | 95-12-143 |
Keywords: | assembler, parse |
Saulnier <p8uu@jupiter.sun.csd.unb.ca> wrote:
> Hi all, I've written an actual assembler for this new OS that
> I was wondering if it would be of any use writing it using LL(1).
When I wrote an assembler for Intel 80x86 processors, I also faced a
similar problem with operand parsing. I started to experiment with
LL(1), but I soon found that the syntax of 80x86 assembly operands was
way too liberal. The different elements may come in virtually any
order, and writing an LL(1) grammar to cover all possible combinations
would have been tedious and would have needed a lot of left-factoring.
Finally, I wrote a special state automaton, whose various states
represented the possible addressing modes. The input symbols driving
the table were not the tokens themselves (that is, not "EAX" or "GS")
but their general categories (e.g. "32-bit-general-register" or
"segment-register"). A filter (practically an LL(1) parser) turned the tokens
into such categories. At the same time, it noted the details of the
tokens in additional variables for later use.
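A minimal sketch of such a filter in Python may make the idea concrete. It is not the code of the assembler described here: the category names, the `details` dictionary and the `classify` function are all hypothetical, and only 32-bit general registers, segment registers, a few punctuation marks and numbers are handled.

```python
# Hypothetical sketch: names and layout are illustrative assumptions,
# not taken from the assembler described in the post.

# Standard 80x86 register encodings.
REG32 = {"EAX": 0, "ECX": 1, "EDX": 2, "EBX": 3,
         "ESP": 4, "EBP": 5, "ESI": 6, "EDI": 7}
SEGREG = {"ES": 0, "CS": 1, "SS": 2, "DS": 3, "FS": 4, "GS": 5}
PUNCTUATION = ("[", "]", "+", ":")


def classify(token, details):
    """Return the general category of `token`.

    The specifics (which register, which number) are recorded in
    `details` so that a later code generator can use them.
    """
    upper = token.upper()
    if upper in REG32:
        details.setdefault("regs", []).append(REG32[upper])
        return "reg32"
    if upper in SEGREG:
        details["segment"] = SEGREG[upper]
        return "segreg"
    if token in PUNCTUATION:
        return token  # punctuation serves as its own category
    try:
        # Base 0 accepts decimal as well as 0x-prefixed hexadecimal.
        details["disp"] = details.get("disp", 0) + int(token, 0)
        return "number"
    except ValueError:
        return "unknown"
```

With this sketch, "EAX" and "EBX" both come out as the single input symbol "reg32", so the table that drives the automaton needs one column per category rather than one per register.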
After parsing the operand, the last state of the automaton represented
the addressing mode, and the additional variables (bit fields and
integer values) contained all the information needed to generate the
necessary opcodes.
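The automaton itself can be sketched as a transition table keyed by (state, category), again in Python. The state names, the tiny subset of operand forms and the `addressing_mode` function are assumptions made for illustration. The real table would be far larger and is not reproduced here.

```python
# Hypothetical sketch covering only a handful of operand forms.
# Input symbols are token categories, not the tokens themselves.
TRANSITIONS = {
    ("start", "reg32"): "register",              # EAX
    ("start", "number"): "immediate",            # 42
    ("start", "segreg"): "segment_register",     # GS
    ("segment_register", ":"): "seg_override",   # GS:
    ("start", "["): "open",                      # [
    ("seg_override", "["): "open",               # GS:[
    ("open", "reg32"): "base",                   # [EBX
    ("open", "number"): "disp_only",             # [1234
    ("base", "+"): "base_plus",                  # [EBX+
    ("base_plus", "number"): "base_disp",        # [EBX+8
    ("base_plus", "reg32"): "base_index",        # [EBX+ESI
    ("base", "]"): "mem_base",
    ("disp_only", "]"): "mem_disp",
    ("base_disp", "]"): "mem_base_disp",
    ("base_index", "]"): "mem_base_index",
}

# States in which a complete operand has been read.
ACCEPTING = {"register", "immediate", "segment_register", "mem_base",
             "mem_disp", "mem_base_disp", "mem_base_index"}


def addressing_mode(categories):
    """Run the automaton; the final state names the addressing mode.

    Returns None if the sequence is rejected or ends mid-operand.
    """
    state = "start"
    for category in categories:
        state = TRANSITIONS.get((state, category))
        if state is None:
            return None
    return state if state in ACCEPTING else None
```

For example, the category sequence for `GS:[EBX+8]`, namely `["segreg", ":", "[", "reg32", "+", "number", "]"]`, ends in the state `mem_base_disp`, while an unterminated `[EBX` yields `None`.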
Due to the peculiarities of the 80x86 assembly operand syntax, this
combined solution was much easier to write, much smaller and much
faster in practice.
However, there is a major drawback: in the case of an error, it is not
always easy to pinpoint the offending token. Most assemblers only
report the line number in their error messages, so this may not be a
big problem (it was not for me, anyway).
Bye,
Gabor DEAK JAHN
<deakjahn@ludens.inf.elte.hu>