Re: Using FORTH as target machine?

"BGB / cr88192" <cr88192@hotmail.com>
Sat, 25 Jul 2009 13:23:56 -0700

          From comp.compilers


From: "BGB / cr88192" <cr88192@hotmail.com>
Newsgroups: comp.compilers
Date: Sat, 25 Jul 2009 13:23:56 -0700
Organization: albasani.net
References: 09-07-080
Keywords: forth
Posted-Date: 26 Jul 2009 17:43:32 EDT

<mailings@jmksf.com> wrote in message news:09-07-080@comp.compilers...
> I've got some questions about things relating to the topics of compiler
> backends and target languages dealing with the Forth programming language.
>
> While crawling the web, I was unable to find out a compiler that uses
> Forth as its target language. But in my opinion, compiling a
> higher-level language into Forth code is a great deal between using a
> standardized lower-level and widely spread programming language and its
> platform-independency. There are even CPUs which are capable of
> executing Forth.
>
> So, does anyone know about a compiler-project or similar software that
> uses Forth as its destination? And if not - would it be wrong to compile
> code into sequences of Forth definitions and words? Why?


FORTH proper is generally not used.


however, loosely FORTH-like backends are, actually, fairly common...


for example, I use a backend which was, to some extent, inspired by
PostScript...


likewise, both the Java VM and the .NET VM use RPN-based bytecode formats.




> Maybe I'm just looking too "foolish Forthy" into this topic. At least,
> it's a simple, stack-based virtual machine which is needed to execute a
> program in a particular (maybe self-defined) lower level language a
> compiler compiles to.
>
> Is my question only a different look on a well-known and widely used
> philosophy of program execution, or is it quite legitimate?


there are varying levels of "FORTH-ness" in various RPN-based backends...


for example, JBC (Java ByteCode) and MSIL/CIL (the bytecode used in .NET)
are both based around RPN.




granted though, these variants typically make "amendments" to allow efficient
compilation: for example, it is common to use direct labels and jumps, and
to restrict the way the stack is used.


for example:
in JBC, both the source and destination of a jump are required to have the
same stack layout.
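
the rule above can be sketched as a small checker. this is only an
illustration over a made-up mini-bytecode (the opcode names and the EFFECT
table are assumptions, not real JBC), and it checks stack depth only, whereas
a real verifier also checks the types of the stack items:

```python
# Hypothetical mini-bytecode: each instruction is (opcode, operand).
# EFFECT maps opcode -> (items popped, items pushed). Illustrative only.
EFFECT = {
    "push": (0, 1),
    "add": (2, 1),
    "pop": (1, 0),
    "jmp": (0, 0),    # unconditional jump to a label
    "jz": (1, 0),     # pops a condition, jumps if it is zero
    "label": (0, 0),
    "ret": (1, 0),    # pops the return value, ends the path
}


def check_stack_depths(code):
    """Return {label: depth on entry}; raise ValueError if two control
    paths reach the same instruction with different stack depths."""
    labels = {arg: i for i, (op, arg) in enumerate(code) if op == "label"}
    depth_at = {}      # instruction index -> stack depth on entry
    work = [(0, 0)]    # (instruction index, stack depth on entry)
    while work:
        pc, depth = work.pop()
        if pc >= len(code):
            continue
        if pc in depth_at:
            if depth_at[pc] != depth:
                raise ValueError(
                    f"stack depth mismatch at instruction {pc}: "
                    f"{depth_at[pc]} vs {depth}"
                )
            continue
        depth_at[pc] = depth
        op, arg = code[pc]
        pops, pushes = EFFECT[op]
        if depth < pops:
            raise ValueError(f"stack underflow at instruction {pc}")
        depth = depth - pops + pushes
        if op == "jmp":
            work.append((labels[arg], depth))
        elif op == "jz":
            work.append((labels[arg], depth))   # branch taken
            work.append((pc + 1, depth))        # fall through
        elif op != "ret":
            work.append((pc + 1, depth))
    return {name: depth_at[i] for name, i in labels.items() if i in depth_at}
```

a program where both arms of a branch leave one item on the stack before the
join label passes the check, while one where an arm leaves two items is
rejected with a ValueError.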


AFAIK, MSIL does not allow a jump with items still on the stack (everything
needs to be forced out to variables). (I may be wrong on this point).




my IL, OTOH, while still not allowing items to remain on the stack, makes
use of a "union" feature to allow merging stack-items (from several control
paths) together into a single target stack item (basically, it is analogous
to the phi operation in SSA).
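
to make the phi analogy concrete, here is a rough sketch of merging the
virtual stacks arriving from two control paths, slot by slot. it is not the
IL described above: merge_stacks, make_fresh, and the tuple layout are all
invented for illustration, and only two predecessors are handled:

```python
import itertools


def make_fresh(prefix="m"):
    """Return a function that hands out new names: m1, m2, ... (hypothetical helper)."""
    counter = itertools.count(1)
    return lambda: f"{prefix}{next(counter)}"


def merge_stacks(stack_a, stack_b, fresh):
    """Merge two virtual stacks (lists of value names) at a join point.

    Slots holding the same name on both paths pass through unchanged;
    slots that differ get a new name defined by a phi-like entry
    (new_name, "phi", name_from_a, name_from_b).
    """
    if len(stack_a) != len(stack_b):
        raise ValueError("paths reach the join with different stack depths")
    merged, phis = [], []
    for a, b in zip(stack_a, stack_b):
        if a == b:
            merged.append(a)
        else:
            name = fresh()
            phis.append((name, "phi", a, b))
            merged.append(name)
    return merged, phis
```

for instance, merging ["x", "t1"] with ["x", "t2"] keeps "x" as-is and
introduces one merged name for the second slot.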


however, I may eventually begin either to phase out this approach (using a
temporary variable instead), or I "could" implement the JBC approach, but
this is less likely. I have worked out "better" ways to approach JBC, as
discovered in a recent effort to compile JBC to C, where most likely it
would be first compiled into TAC (AKA: 3-Address form) or SSA form... (from
the design, SSA would likely be "easy", but I actually use plain TAC, as
this maps better to C).




note that full unrestricted RPN (such as is found in plain Forth or
PostScript) would be difficult to work with or optimize, and would place
notable restrictions on the way the code may be compiled or run... (in
particular, it would likely require the use of multiple stacks, as well as a
custom calling convention).


restricting the ways the stack is used, however, allows efficient
compilation (but at the cost of disallowing certain kinds of
constructions).




other comments:
RPN is a very convenient representation for upper-end compiler output, and
is also a relatively powerful, simple, and general-purpose representation.


however, it is not the ideal representation for (directly) producing
compiled code (at least, with current processors and calling conventions),
which demands a different model.


as a simple example, go and look at the SysV / AMD64 calling convention, and
an RPN-based codegen, and see if you can spot the problem...




a more abstract form of this model is SSA, however, explicit use of SSA is
something I have not been able to do effectively thus far. however, SSA'isms
have steadily been working their way into the rest of my compiler, such
that, although the input itself remains as RPN, RPN plays a diminishing role
in most of the actual code-generation process (basically, the RPN aspect
becomes increasingly virtualized).


so, I have yet to be able to make a "leap", but the transitions happen a
step at a time...




as I see it, however, RPN will likely remain a convenient representation for
producing upper-end compiler output, which may keep it as being a reasonable
representation, even despite SSA-form being used for the actual code
generation.


IME, RPN can typically be produced by using a straightforward process to
unwind the ASTs, whereas directly producing SSA from ASTs would seem to
require "a little more work"... (and, also "some algo" which I am not
presently able to imagine...).
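
the "unwinding" of an AST into RPN amounts to a post-order walk: emit the
operands first, then the operator. a minimal sketch, assuming the AST is
given as nested tuples of the form (operator, operand, ...) with plain
strings or numbers as leaves (that encoding is an assumption made here):

```python
def to_rpn(node):
    """Flatten a nested-tuple AST into a list of RPN tokens (post-order walk)."""
    if not isinstance(node, tuple):
        return [node]              # leaf: a constant or a variable name
    op, *args = node
    out = []
    for arg in args:
        out.extend(to_rpn(arg))    # operands first, left to right
    out.append(op)                 # operator last
    return out
```

with this encoding, the expression a + b * c is the tree
("+", "a", ("*", "b", "c")), and the walk yields the tokens a b c * +.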


hell, maybe RPN has bent my mind somehow that I can't really see "the SSA
way of doing things", I don't know...




similarly, RPN is likely to be a much better general representation for
things like portable bytecode, ... than would be something like SSA (where
somehow, I suspect the exact SSA-form representation of a program is likely
to vary somewhat with the specific design of the codegen, whereas the RPN
form can likely remain far more generic...).


or, in other words, for a portable IL, a JBC or MSIL-like representation may
well be better than, say, an LLVM-like representation...


as well, it seems to be not "that" difficult to unwind RPN into SSA-form...
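
for straight-line code, one common way to do this unwinding is to run the RPN
against a "virtual" stack that holds names instead of values. each operator
then defines a fresh temporary exactly once, so the resulting three-address
code is trivially in SSA form (joins between control paths would still need
phi-style merging, which this sketch ignores). the function name, the
operator set, and the tuple layout below are assumptions for illustration:

```python
BINARY_OPS = {"+", "-", "*", "/"}


def rpn_to_tac(tokens):
    """Convert a list of RPN tokens into three-address code.

    Returns (code, result), where code is a list of
    (dest, op, lhs, rhs) tuples and result names the final value.
    """
    stack = []     # virtual stack of names, not runtime values
    code = []
    count = 0
    for tok in tokens:
        if tok in BINARY_OPS:
            rhs = stack.pop()
            lhs = stack.pop()
            count += 1
            dest = f"t{count}"          # each temporary is assigned once
            code.append((dest, tok, lhs, rhs))
            stack.append(dest)
        else:
            stack.append(tok)           # operand: variable name or constant
    if len(stack) != 1:
        raise ValueError("expression leaves the wrong number of stack items")
    return code, stack[0]
```

feeding it the tokens a b c * + produces t1 = b * c followed by t2 = a + t1,
with t2 as the result.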




but, then again, maybe other people know some things I don't...




>
> Thanks for all replies in advance!


