a compembler for x86 that looks nearly portable

rickh@capaccess.org (Rick Hohensee)
7 Jan 2002 01:13:59 -0500

          From comp.compilers


From: rickh@capaccess.org (Rick Hohensee)
Newsgroups: comp.compilers
Date: 7 Jan 2002 01:13:59 -0500
Organization: http://groups.google.com/
Keywords: 8086, assembler
Posted-Date: 07 Jan 2002 01:13:59 EST

osimplay, formerly shasm, is an x86 macro-assembler,
"mid-level-language", or "compembler". It is implemented entirely in
GNU Bash 2 without dependence on any external utils. Coverage is
roughly 386, real and pmode, no FPU, with Linux syscalls. osimplay has
simple analogues of a nice set of C and Forth features, and some
unique features such as the "xray" jump-table construct, without
creating any syntactic seam between high-level and low-level. There is
no asm("") or CODE/ENDCODE. osimplay can now build working examples of
small Linux ELF executables, and a bootsector, and the sources are
included. osimplay is thus at beta development level. It's reasonably
usable, and the bugs that arise may now be small enough that they don't
always require the author to fix them, although I would love to know about
them. This version of osimplay is public domain. Programmers and
would-be programmers who enjoy having their assumptions challenged
should find osimplay amusing.


ftp://ftp.gwdg.de/pub/cLIeNUX/interim/osimplay.tgz
rickh@capaccess.org Rick Hohensee, sole author


long blurb......................................


asmacs begat shasm begat osimpa begat osimplay, and I'm saying




                                      osimplay is now beta.




asmacs
  was just a bunch of m4 macros for Gas that simply transliterated
Intel opcode and register names to names I consider massively clearer
and/or more convenient. Intel MOVx is = in osimplay, and LMSW is
loadmachinestatusword. = is about 25% of most code, and I believe
there's one occurrence of LMSW in Linux, and I think that's there out
of nostalgia. Main register names in osimplay are A, B, C, D, SP, BP,
SI and DI. I found asmacs very helpful, and this simple renaming
remains the big win in osimplay. High-level languages have frozen the
evolution of assemblers, and some catch-up is about 35 years overdue.


shasm
  got rid of most of the need for sized register names like A - AX - AL
with "byte" and "cell" keywords. The cell concept also hides some
fundamental machine information elegantly, and thus was seen well before
shasm (by 1970 or so) in Forth and BCPL, and is very helpful with
the fact that a 386 is two different size machines, 16 bit rmode and
32 bit pmode. The concept may be "forward-compatible" to IA64 also,
but I don't know that architecture. shasm also allows source/dest or
dest/source (AT&T or Intel) syntaxes by expanding the usual ","
arguments-delimiter to "to", "from" or "with". shasm got Slashdotted
before it could really produce much working 386 code, but it did
produce some shortly thereafter. shasm and its existing subsequent
versions are 100% GNU Bash 2 shell scripts. That's right, just a
recent sh. No dd, sed, etc. "Installing", running, and reading some
operator-specific osimplay help on Linux/Bash is...


                tar xzvf osimplay.tgz
                cd osimplay_
                . osimplay
                = h
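
The two-way operand order described above can be sketched in plain Bash.
This is an illustration only, not osimplay source: the function names
"copy" and "regnum" are made up for the example, output is hex text rather
than a binary, and the "with" delimiter is left out because this post does
not say what it means. The register numbers and the 89 /r encoding of a
32-bit register-to-register MOV are standard x86.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not osimplay's implementation.

# Map a register name to its 3-bit x86 register number.
regnum() {
  case "$1" in
    A) echo 0 ;; C) echo 1 ;; D) echo 2 ;; B) echo 3 ;;
    SP) echo 4 ;; BP) echo 5 ;; SI) echo 6 ;; DI) echo 7 ;;
    *) echo "unknown register: $1" >&2; return 1 ;;
  esac
}

# copy X to Y    : Y receives X (source first, AT&T order)
# copy Y from X  : Y receives X (destination first, Intel order)
# Prints the encoding of a 32-bit register-to-register MOV as hex text.
copy() {
  local src dst s d
  case "$2" in
    to)   src=$1; dst=$3 ;;
    from) dst=$1; src=$3 ;;
    *) echo "expected 'to' or 'from', got: $2" >&2; return 1 ;;
  esac
  s=$(regnum "$src") || return 1
  d=$(regnum "$dst") || return 1
  # Opcode 0x89 is MOV r/m32, r32; ModRM = 11 sss ddd (mod=3, reg=src, rm=dst).
  printf '89 %02X\n' $(( 0xC0 | (s << 3) | d ))
}

copy A to B     # prints: 89 C3
copy B from A   # prints: 89 C3
```

Both calls name the same operation, so both give the same two bytes. Only
the delimiter word decides which operand is the destination.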


osimpa
  was shasm+enthusiasm. osimpa added various rustic imitations of C and
Forth constructs to shasm, and a couple features I suspect are unique,
without losing seamless access to assembly. A seam is typified by the
asm("") seam between C and Gas in the GNU toolchain. osimpa features
include: "allot", data "clump"s, "print", "text", "Linux" (syscalls),
"entrance" procedures, "heap" (like .bss), "ELF" (executables only) and so
on. In the course of adding all that featurism, shasm real mode support
was broken, but writing small Linux utilities became almost convenient.


Deliberately avoided, to remain an assembler: data types, structured flow
control abstractions like DO/WHILE/FOR/ELSE, and of course there are no
Obstacle-Oriented Programming techniqueMethodMechanism()s. Although I
don't do IF/ELSE/ENDIF and so on, osimpa "when" conditional branches are
pretty nice for what they are, and osimpa has real execution arrays (jump
tables, not heavily tested).
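
For readers who haven't met the term, an execution array is a table of
routines indexed by a selector, as opposed to a chain of compares and
branches. The Bash below is only an analogy at the shell level, with
made-up handler names. It is not osimpa syntax and emits no machine code.

```bash
#!/usr/bin/env bash
# Hypothetical handlers; the names are illustrative only.
op_halt()  { echo "halt"; }
op_load()  { echo "load"; }
op_store() { echo "store"; }

# The "execution array": slot N holds the routine to run for selector N.
table=(op_halt op_load op_store)

dispatch() {
  local n=$1
  # Bounds-check the selector before indexing, as a jump table must.
  if (( n < 0 || n >= ${#table[@]} )); then
    echo "bad selector: $n" >&2
    return 1
  fi
  "${table[n]}"
}

dispatch 1   # prints: load
```

The cost of a dispatch is one index and one call, however many entries the
table holds.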


osimplay
  means writing operating systems is simply childsplay. That is hype, and
is thus deliberately outrageous, but there's a sliver of truth to it. It
should make playing with OS design easier. osimplay can build anything
from a Linux console text editor (a fair wad of the beginnings of one is
included) to a mode-changing bootsector (also included, working). In
other words, real mode is fixed, pmode is almost convenient, and thus
osimplay probably does merit the term "beta".
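
With both modes working, the "cell" idea from shasm earns its keep: the
same source word means 16 bits in rmode and 32 bits in pmode. A minimal
sketch of that, assuming a hypothetical MODE variable and helper names
(emit_le, byte, cell) that are not taken from osimplay:

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not osimplay's implementation.
# MODE is an assumed setting: 16 for real mode, 32 for protected mode.
MODE=32

# Print VALUE as little-endian hex bytes, WIDTH bits wide.
emit_le() {
  local value=$1 width=$2 out="" i
  for (( i = 0; i < width; i += 8 )); do
    out+=$(printf '%02X ' $(( (value >> i) & 0xFF )))
  done
  echo "${out% }"   # drop the trailing space
}

# A byte is always 8 bits; a cell is whatever the current mode says.
byte() { emit_le "$1" 8; }
cell() { emit_le "$1" "$MODE"; }

cell 0x1234   # prints: 34 12 00 00
MODE=16
cell 0x1234   # prints: 34 12
byte 0x7F     # prints: 7F
```

The caller never writes a width. Changing MODE is the only edit needed to
retarget the same data from one machine size to the other.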


Result.
  Even high-level languages as low-level as C or Forth work from some
abstraction back to the machine. osimplay is pure bottom-up, being an
attempt at a Forth for one-stack machines. There are two areas where I
believe this has been worth the effort.


Systems programming suffers at the machine/abstraction seam, and there is
no such seam in osimplay. That seam is normally considered the cost of
portability, but I believe that cost can be greatly reduced in an
assembler-like model closer to the machine than C, and besides, there's
plenty of 386s out there.


I also suspect that osimplay is relatively easy to learn, particularly to
self-teach. No pointers (C), no stack-dancing (Forth), no REPxx (x86),
fairly interactive ...


An area where it hasn't been worth the effort is in runtime performance. C
is impressive, even on x86, which isn't a PDP-11. Even if I can beat Gcc,
it's not usually by much, but certain areas (switch/case, recursion, very
finely factored code...) still bear a closer look. Conversely, it's not so
hard to get close to C in assembly in most cases either. Optimized Gcc is
good, but unoptimized Gcc can be pretty, uh, amusing.


Beyond,
  osimplay looks pretty CPU-independent and, I believe, could be
completely portable (across commodity desktop CPUs) with a few more
tricks. The great genius of C is good portability with excellent
performance. Everything else about C is minor, including some mistakes.
The same is achievable much more simply, even via a shell script. One
lesson of Forth is that simplicity is robust.


I can't find the quote on Google, but I believe Rob Pike once told me in
9fans that UNIX naming tradition is horrid. Whether Mr. Plan 9 said so or
not, it is. Linux people are repulsed and enraged by my fits of
neologistic frenzy. Forth people obsess over names. There is excellent
reason for the latter. Bad names don't matter to machines, but frequently
cause humans to write dysfunctional, often totally self-extraneous code,
and this effect is self-compounding, and I believe people don't appreciate
how bad the situation is. To put it positively, I believe renaming is
currently a huge opportunity in computing, starting with assembly, which
is the point at which names start to matter. So go get osimplay before I
decide the name is wrong again :o) It's a script, so feel free to decide
the names are all wrong :o)


beyond beyond,
  C claims portability by only modeling the execution engine of the CPU in
the core of the language. Forth also. It would be nice if more operating
system mechanism were part of a standard portable language. I personally
don't know of such a language with systems-grade performance, and if it
exists I doubt it's very general. A compembler can help investigate that,
even one written in a unix sh. osimplay is now a distinct language
independent of implementation. Not too distinct though; most of it
shouldn't be too alien to good programmers, other than the basic fact that
in the current implementation your assembler source is a shell script.


ftp://ftp.gwdg.de/pub/cLIeNUX/interim/osimplay.tgz


and browse the cLIeNUX dirs above that :o)


That version of osimplay is public domain.


Rick Hohensee
rickh@capaccess.org
http://linux01.gwdg.de/~rhohen

