Desciption of a new BC implementation

markh@csd4.csd.uwm.edu (Mark)
Tue, 6 Jul 1993 04:20:43 GMT

          From comp.compilers

Related articles
Desciption of a new BC implementation markh@csd4.csd.uwm.edu (1993-07-06)
| List of all articles for this month |

Newsgroups: comp.compilers
From: markh@csd4.csd.uwm.edu (Mark)
Keywords: arithmetic, tools
Organization: Computing Services Division, University of Wisconsin - Milwaukee
Date: Tue, 6 Jul 1993 04:20:43 GMT

      This is a preliminary description of a new implementation of the BC
language currently undergoing testing and soon to be released into the
public domain.


      The standard BC language is a stack-based typeless calculator language
that works only with arbitrary precision numbers and one-dimensional
arrays of numbers and a very limited subset of C's syntax. The revised
language, which I call GC (which does NOT stand for GNU BC), is a superset
of BC that includes a type system for (among other things) arbitrary
precision numbers and complex numbers. With the type-building operations
added, you will also be able to build up types involving pointers and
arrays of arbitrary dimension, with virtually the same semantics as in C,
but yet still compatible with BC (in particular, BC's idiosyncratic
handling of name spaces).


      But this is sort of an aside. What I really want to describe is how
control flow is handled. The GC language is significantly larger than BC
in that it includes virtually all the C control statements, including the
goto. The method is largely unique. Basically I stumbled onto it very
early on while trying to implement another method.


      The basis of the interpreter is the "continuation string", denoted like
so:


                                                                  (x E y z)


The letter x is the string's label (or address). The letters y and z are
labels denoting other continuation strings, and E is a pointer to the
byte-code sequence to be executed.


      The way the interpreter works is as follows:


                        (1) The sequence E is executed, usually this is an
                                expression to be evaluated. This sequence can involve
                                function calls, which will recursively start up another level
                                of execution. But there's no jumps or returns or any other
                                kind of break in execution in the string E itself, besides
                                this.
                        (2) After the end of the byte-sequence E is reached, a
                                "continuation" is made based on the labels y and z as follows:


                                      if y == z, execution continues with the string labeled y
                                      if y != z, the topmost item is popped off the stack
                                            if it is 0, execution continues with string z
                                            if it is not 0, execution continues with string y
with the following "return" conditions:


                        (3) If y or z is the null pointer, the current execution
sequence is
                                ended, and control resumes one level back (this is basically
                                the "return" statement). At the top level, the interpreter
                                goes idle and waits for more commands or declarations.
                        (4) If E is the null pointer, the interpreter halts and
                                returns to the system (this is the "halt" statement).


      Using this method, you can translate arbitrary C-like statements,
including even the goto. For instance, you can form the following
translation rules:


                      while (E) S <---> ... c c) (c E x b) (x S c c) (b ...


Inside this while-loop, breaks and continues are translated as follows:


                      break; <---> ... b b) (y ... where y is a new label
                      continue; <---> ... c c) (z ... where z is a new label


      Labeled statements are translated as follows:


                      L: S <----> ... L L) (L S ...


and goto's referring to this label, as follows:


                      goto L; <----> .. L L) (x S ... where x is a new label


      A function definition is translated like in the following example:


            define f(...) { <----> (1
                  E E
                  while (A) S 2 2) (2 A 3 4) (3 S 2 2) (4
                  if (B) T B 5 6) (5 T 6 6) (6
                  return C C 0 0)
            }


which, when taken together forms the following set:


                                            (1 E 2 2)
                                            (2 A 3 4)
                                            (3 S 2 2)
                                            (4 B 5 6)
                                            (5 T 6 6)
                                            (6 C 0 0)


When the function is called, execution starts at continuation 1. One
thing to note about this, is that all the continuations are completely
decoupled from one another. They can be reordered arbitrarily without
affecting the underlying control flow at all (except that continuation 1
must remain 1).


      The significance of all this is that now it's possible to set up a
purely expression based interpreter, involving no control flow statements
at all -- because control flow is all handled by the continuations. So
the actual interpreter language has relatively few operators (basically
just the numeric, stack, pointer, and array operators). This separates
out control from evaluation.


      Other parts of this implementation: the parser (a bottom-up
hand-written parser), the byte-code language, the arbitrary precision
numeric algorithms, and the formal semantics, all use equally novel
methods and may be described in future notes here and in the future
release.
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.