Re: Best tools for writing an assembler?

federation2005@netzero.com
Sun, 13 Apr 2014 12:24:44 -0700 (PDT)

          From comp.compilers

From: federation2005@netzero.com
Newsgroups: comp.compilers
Date: Sun, 13 Apr 2014 12:24:44 -0700 (PDT)
Organization: Compilers Central
References: 14-02-018 14-02-030 14-02-043 14-03-002 14-03-066
Injection-Date: Sun, 13 Apr 2014 19:24:44 +0000
Keywords: assembler
Posted-Date: 14 Apr 2014 00:26:46 EDT

On Wednesday, March 26, 2014 7:57:11 PM UTC-5, federat...@netzero.com wrote:
> First in terms of the syntax, the best advice is to *normalize* the
> syntax. At the assembly level, the most notorious example [of an
> unnormalized syntax] is MASM and the Intel syntax it is associated with.


> There are a half-dozen instances of the phrase category "Directive" in
> the phrase structure grammar whose only distinctions are semantic in
> nature: i.e. there is an attempt to implement semantic constraints by
> cleaving this single category into a half-dozen shards.


In fact, based on the reference for MASM 6.1, one could "reintegrate" the
directive syntax by first classifying all the contexts in which a directive
can appear:


Contexts:
(a) RunTimeLoop: inside a .WHILE/.REPEAT statement group


(b) MacroBody: inside a MACRO/REPT/REPEAT/WHILE/FOR/FORC/IRP/IRPC statement
group


(c) Statement: inside a .IF/.ELSEIF/.ELSE statement group, a statement group
headed by any of the IF*/ELSEIF*/ELSE keywords or at the TopLevel


(d) SegmentBody: inside a statement group (1) begun by a ".CODE", ".DATA",
".DATA?", ".CONST", ".FARDATA", ".FARDATA?", ".STACK" or "SEGMENT" directive
and ended by its matching "ENDS" directive; or (2) begun by a "PROC" directive
and ended by its matching "ENDP" directive.


(e) Inside a statement group headed by STRUC/STRUCT/UNION
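
As a rough illustration of the classification above, here is a minimal Python
sketch that maps the keyword heading a statement group to one of the context
letters (a)-(e). The table name OPENERS and the function context_of are my own
placeholders, not anything from MASM, and the prefix test for the
IF*/ELSEIF*/ELSE family is deliberately crude.

```python
# Hypothetical table: context letter -> keywords that open a statement group
# in that context, transcribed from the list (a)-(e) above.
OPENERS = {
    "a": {".WHILE", ".REPEAT"},
    "b": {"MACRO", "REPT", "REPEAT", "WHILE", "FOR", "FORC", "IRP", "IRPC"},
    "c": {".IF", ".ELSEIF", ".ELSE"},
    "d": {".CODE", ".DATA", ".DATA?", ".CONST", ".FARDATA", ".FARDATA?",
          ".STACK", "SEGMENT", "PROC"},
    "e": {"STRUC", "STRUCT", "UNION"},
}


def context_of(opener):
    """Return the context letter for the keyword that heads a statement group.

    None stands for the top level, which the list above files under (c).
    """
    if opener is None:
        return "c"
    kw = opener.upper()
    for ctx, keywords in OPENERS.items():
        if kw in keywords:
            return ctx
    # Crude stand-in for the IF*/ELSEIF*/ELSE conditional-assembly family.
    if kw.startswith("IF") or kw.startswith("ELSEIF") or kw == "ELSE":
        return "c"
    raise ValueError("not a statement-group opener: " + opener)
```

A parser could push the result onto a stack whenever a group opens and pop it
at the matching end keyword, so the current context is always the top of the
stack.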


Then it suffices to lay out one (large) set of phrase structure rules for the
directives ("Dir"), with the constraints noted separately. It's much easier
for a parser to check these constraints than to try to force-fit them into
the grammar. In addition, organizing things this way may lead to
simplifications, since it can reveal that some of the constraints are simply
not needed and can be removed.
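
To make the separation concrete, here is a minimal Python sketch, assuming a
toy line-oriented front end. The function parse_dir stands in for the single
"Dir" rule and check is the separate constraint pass; both names, and the
CONSTRAINTS table, are hypothetical. The entries are the context restrictions
on ".BREAK", ".CONTINUE" and "EXITM". For brevity it tests only one context
letter; a real checker would probably have to look through the enclosing
contexts as well.

```python
# Hypothetical constraint table: directive keyword -> contexts where it is
# legal. A keyword that is absent from the table is unconstrained.
CONSTRAINTS = {
    ".BREAK": {"a"},
    ".CONTINUE": {"a"},
    "EXITM": {"b"},
}


def parse_dir(line):
    """Stand-in for the single 'Dir' rule: split into keyword and operands."""
    parts = line.split()
    return parts[0].upper(), parts[1:]


def check(keyword, context, constraints=CONSTRAINTS):
    """Separate semantic pass: is this directive legal in this context?"""
    allowed = constraints.get(keyword)
    return allowed is None or context in allowed


keyword, operands = parse_dir(".break")
print(check(keyword, "a"))  # inside a .WHILE/.REPEAT group: True
print(check(keyword, "d"))  # inside a segment body: False

# Dropping a constraint that turns out to be unnecessary is a table edit,
# not a grammar rewrite.
relaxed = {k: v for k, v in CONSTRAINTS.items() if k != "EXITM"}
print(check("EXITM", "d", relaxed))  # True
```

The grammar never mentions contexts at all, which is the point: the
constraints live in data that can be inspected and pruned.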


Notice, by the way, in the list below that the assembler does
"quasi-compilation" (that is, it compiles run-time statements for loops and
function calls). I don't know if GAS is doing that much. But a decent
assembler might have to match this level of functionality.


For MASM syntax, the classification would lead to the following groups.


(1) Loop directives appear only in context (a) and include phrase structure
rules for:
Dir -> ".BREAK" ...
Dir -> ".CONTINUE" ...


(2) Macro Directives appear only in context (b) and include directives for the
prefix ":", "GOTO" and "EXITM".


(3) Segmentation Directives appear only in contexts (a), (b) or (c) and
comprise all the directives that open a segment body (".CODE", ...,
"SEGMENT").


(4) In-Segment Directives appear only in context (d) and include the
mnemonics, along with the x86 specific prefixes (i.e. "REP", "REPE", ...,
"LOCK").


It also includes the quasi-compiled control-flow (yes, MASM is doing basic
compilation!):


".IF"...".ELSEIF"...".ELSE"...".ENDIF",
".WHILE"...".ENDW"
".REPEAT"...".UNTIL" (or ".UNTILCXZ")
"PROC"..."ENDP"


as well as ".STARTUP", ".EXIT", "LABEL" and "INVOKE" (the "INVOKE" statement
is the quasi-compiled function call).


(5) Member Directives appear only in context (e): the "STRUC", "STRUCT" and
"UNION" statements.


(6) Type and Alignment Directives appear in contexts (d) and (e). These are:
    initializers ("DB", "DW", ...)
    type declarations ("BYTE", "SBYTE", "WORD", ...)
    type declarations involving "typedef"'ed types
    declarations that include structure/union field components
    declarations involving "record" typenames
    alignment/location directives ("EVEN", "ORG", "ALIGN")


(7) General Directives may appear in all contexts.
(Basically, every other directive in the MASM 6.1 syntax).
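
The seven groups can be written down as data. The following Python sketch is
only an illustration: GROUP_CONTEXTS transcribes the context letters given for
groups (1)-(7), while KEYWORD_GROUP is a deliberately partial, hypothetical
table holding just some of the keywords named above. The names are mine.

```python
# Allowed contexts per group, transcribed from groups (1)-(7) above.
ALL_CONTEXTS = frozenset("abcde")
GROUP_CONTEXTS = {
    1: frozenset("a"),    # loop directives
    2: frozenset("b"),    # macro directives
    3: frozenset("abc"),  # segmentation directives
    4: frozenset("d"),    # in-segment directives
    5: frozenset("e"),    # member directives
    6: frozenset("de"),   # type and alignment directives
    7: ALL_CONTEXTS,      # general directives
}

# Partial, illustrative keyword table; a real one would list every directive.
KEYWORD_GROUP = {
    ".BREAK": 1, ".CONTINUE": 1,
    "GOTO": 2, "EXITM": 2,
    ".CODE": 3, "SEGMENT": 3,
    "REP": 4, "LOCK": 4, "INVOKE": 4, ".STARTUP": 4, ".EXIT": 4, "LABEL": 4,
    "STRUC": 5, "STRUCT": 5, "UNION": 5,
    "DB": 6, "DW": 6, "EVEN": 6, "ORG": 6, "ALIGN": 6,
}


def allowed_contexts(keyword):
    """Return the set of context letters in which the directive may appear.

    Keywords missing from the table fall into group (7); with a partial
    table that default is only an approximation.
    """
    group = KEYWORD_GROUP.get(keyword.upper(), 7)
    return GROUP_CONTEXTS[group]
```

With the table in this form, it is easy to see which groups share a context
set and to ask whether a given restriction is earning its keep.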

