Re: Threaded Interpretive Languages

"Julian V. Noble" <jvn@fermi.clas.virginia.edu>
Tue, 21 Sep 1993 16:58:59 GMT

          From comp.compilers

Related articles
Threaded Interpretive Languages a_tucker@paul.spu.edu (Andrew Tucker) (1993-09-14)
Re: Threaded Interpretive Languages jvn@fermi.clas.virginia.edu (Julian V. Noble) (1993-09-21)
Re: Threaded Interpretive Languages cliffc@rice.edu (1993-09-23)
Re: Threaded Interpretive Languages N.Chapman@cs.ucl.ac.uk (Nigel Chapman) (1993-09-24)
Re: Threaded Interpretive Languages dsiegel@panix.com (1993-09-26)
Re: Threaded Interpretive Languages pop@dcs.gla.ac.uk (pop) (1993-09-28)
Re: Threaded Interpretive Languages cliffc@rice.edu (1993-09-28)
Re: Threaded Interpretive Languages mikc@gnu.ai.mit.edu (1993-09-29)
| List of all articles for this month |
Newsgroups: comp.compilers,comp.lang.forth
From: "Julian V. Noble" <jvn@fermi.clas.virginia.edu>
Keywords: code
Organization: University of Virginia
References: 93-09-059
Date: Tue, 21 Sep 1993 16:58:59 GMT



Andrew Tucker writes:
> In a book I bought recently, the author implements a Forth-like script
> language in what he calls a "threaded interpretive language". What are the
> benefits/drawbacks of TILs compared to more traditional language
> implementations are? Also, are TILs usually used for RPN, stack based
> languages like Forth? What other languages are TIL based?


The book was probably "Write Your Own Programming Language Using C++"
by Norman Smith (Wordware, 1991).


> [Threaded code is most often used for a stack virtual machine, but the first
> two threaded code systems I ran into were Fortran compilers for the PDP-11.
> -John]


The prototypical TIL is FORTH. There is a newsgroup comp.lang.forth on
which you can read the often convoluted expressions of dedicated FORTH
programmers. A nice article on the history of FORTH appeared in BYTE
Magazine September 1980 (written by its creator, Dr. Charles Moore).


In many ways FORTH is like Lisp (or perhaps Lisp is like FORTH? :-), ex-
cept it uses postfix ("reverse Polish") rather than prefix notation.


In addition, the programmer has direct access to the main stack (parameter
stack) of the cpu. Stack access + RPN greatly reduces the overhead of
subroutine calls, hence encourage the fine-grained decomposition of
programs into small functional units that do only one thing. For this
reason FORTH programs tend to be vastly shorter and to use much less
memory than equivalent programs in other languages. Typical automated
measures of "program complexity" applied to FORTH code yield very low
indices even for programs I would regard as complex.


FORTH is incrementally compiled. The compilation mechanism is part of the
language and its components accessible to the programmer. This latter
feature distinguishes FORTH from all other languages I know. (Maybe Lisp
does it too? Don't know enough about it to say.) As a result, the user can
try something out as soon as it is defined. For example (a trivial one, to
be sure!) suppose we define a new FORTH subroutine (everything in FORTH is
a subroutine, called a "word", just as everything in C is a function):


                : *+ * + ; ( a b c -- b*c+a)


The colon ***is*** the compiler. It is a FORTH word that makes a new
dictionary entry with the name *+ . (There is no artificial restriction on
which ASCII characters can be used in names, so we can choose self-
documenting ones.) The word : then switches the state to COMPILE from
INTERPRET, so that subroutines following in the input stream -- in this
case, * and + -- are looked up in the dictionary and their "execution
tokens" written down in the "parameter field" of the new definition *+ .
Finally, the word ; that terminates the definition is looked up, but when
it was defined a bit was set in its name marking it IMMEDIATE so rather
than being written down, it is executed (even though the system is
compiling). Its action is to install some terminating code for doing the
appropriate stack bookkeeping, and then to switch back to INTERPRET mode.


The parenthesized stuff -- a left paren by itself is a word that says
"disregard all text up to a right paren -- is a "stack comment" that is
often all the documentation a word needs, especially one with a
telegraphic name like *+ . It says the integers a b c are taken from the
stack and the result, b*c+a, left in their place.


When a word is executed (by typing in its name), its execution tokens are
simply fed sequentially to the execution mechanism. So in a sense *+ is a
named script that says "do everything following me".


Now having typed in the definition (sans stack comment),


                : *+ * + ; ok


let us test the result ("ok" says all went well and it compiled):


                3 4 5 *+ . 23 ok


(The dot or period is a word that says "emit the number on the top of the
stack, consuming it in the process". FORTH words usually consume their
arguments.) We see the word *+ did precisely what it was supposed to,
hence it is debugged and can be marked "tested" in the source code.


Another point -- when the interpreter encounters a number in the input
stream, it converts the number to internal form and pushes it on the
stack. That's why we could test the word *+ (which expects 3 integers on
the stack) as we did.


So one of the major advantages of a TIL like FORTH is the ease of
developing and debugging code. None of the cumbersome accessories of the
more mainstream languages -- programming editors, interactive debuggers,
MAKE -- are needed in FORTH. This means FORTH can be written and debugged
quickly on small machines.


Another major advantage of FORTH is access to the compilation mechanism.
The components of : (the standard compiler) can be used to create special-
ized mini compilers for specific tasks. For example, I have been using for
several years a simple mini compiler FSM: that turns a state-transition
table into a finite state machine. Recently I read an article (before
submission to a journal) on mini compilers that produce optimized machine
code for a target cpu, then download it for testing (this kind of thing
enormously speeds embedded systems development and real-time programming).


Since I use FORTH primarily for scientific number-crunching (in prefer-
ence to FORTRAN which I all but abandoned some 8 years ago) I have written
a FORmula TRANslator (i.e. another mini compiler) that converts formulas
into FORTH on the fly. It permits mixed real*4, real*8, complex*8 and
complex*16 expressions, has a complete function library, accepts
user-defined functions -- in short, does nearly everything a commercial
FORTRAN would do -- in about 400 lines of FORTH code that adds < 8 Kbytes
of compiled code to my system!


The above and related advantages are the chief reasons why FORTH users
would rather fight than switch.




What are the disadvantages? Systems folk dislike FORTH on multi user
systems, since it is virtually impossible to protect against a clever
FORTH programmer screwing up the system. Thus FORTH has tended --with
notable exceptions-- to remain on single-user environments.


High level FORTH develops much more quickly, but generally runs up to 10x
slower than equivalent C (albeit the code is also 10x shorter).


[But, as Winston Churchill said to a lady who complained he was drunk
"Yes, Madam, and you are ugly. But in the morning I shall be sober." That
is, one can speed up the FORTH very easily without notably increasing its
code image, by using the built-in assembler. All FORTHs worthy of the name
include an assembler that will allow definitions of words directly in
machine language. The resulting words look and feel to the user exactly
like words defined with : ... ; so there is no linkage step or separate
assembly. Such words can execute at the speed of the bare silicon. For
example, in my matrix package, I define the innermost loop (of three
nested loops) in assembler. The package therefore executes as fast as
possible, for large matrices (since the inner loop determines the
coefficient of N^3 in the run-time). Since the inner loop is simple, the
amount of assembler to program is minimal. And devel- opment was a snap
since I first did the inner loop in high level, tested the package, and
once everything was ok, proceeded to define the code version of the inner
loop and check that it worked as the hi-level version did. Well, I guess
this isn't a disadvantage after all... :-) ]


It is widely repeated by the ignorant that FORTH is a "write-only"
language. This is nonsense. It may have been true in the days when FORTH
used 1K screens (1K disk blocks) to hold code, but in truth one can write
unreadable, un-maintainable code in the language of one's choice, simply
by avoiding good practices. Modern FORTHs are at home with operating
systems, use files, and support good documenting practices. Most
disagreements in the FORTH community today are over small issues of style.


The major -- and rapidly disappearing -- disadvantage of FORTH relative
to, let us say, C, is the lack of standard libraries for common tasks
(FORTRAN, e.g. has LINPACK, CERNLIB, you can buy a standard set of printer
drivers for C, etc. etc.). Also, early FORTHs did not provide facilities
for linking with .obj code libraries --say compiled from other languages.
Good commercial FORTHs today will include such facilities as a matter of
course; and the absence of generic libraries of FORTH will be rectified
soon also, now that an ANSI standard exists and is on the verge of
acceptance.


So I personally think it is time for people who have been treating FORTH
like a maiden aunt who has encountered a bad smell, to reevaluate their
positions on the subject.




Anyone who would like to give FORTH a whirl can get a free diskette with 2
public domain FORTHs for the PC and some useful tutorial stuff simply by
asking for it via e-mail and including a regular (domestic US only)
mailing address.


--jvn
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.