From: "N. D. Culver" <ndc@alum.mit.edu>
Newsgroups: comp.compilers
Date: 24 Mar 1998 22:37:55 -0500
Organization: Atlantic Biomedical Engineering
References: 98-03-212
Keywords: design
I think that the way to do an extensible compiler/language is as
follows:
ASSUMPTION: a complete AST is retained after the parse and is
processed by:
0. A multi-pass front end that munches on the AST described in 1.
below and emits something to a code-generating back end.
1. Invent a set of AST nodes and branches that encompass all (or
most?) of the semantics embodied in the computer languages we know.
This would be a nice research project for a PhD candidate. How many
computer language semantic categories have been produced in the last
50 years? I would guess there would be fewer than 200 main node types.
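To make step 1 concrete, here is a minimal sketch in Python of what a
canonical node vocabulary might look like. The kind names below are my
own illustrative guesses, not a real inventory; the point is only that
the node set is fixed and language-independent:

```python
# Sketch of a "canonical" AST node vocabulary. The kinds listed are
# hypothetical examples of language-independent semantic categories;
# a real inventory would need many more (the guess above is under 200).
from dataclasses import dataclass, field

CANONICAL_KINDS = {
    "Module", "FuncDef", "VarDecl", "Assign", "If", "Loop",
    "Call", "Return", "BinOp", "Literal", "Name",
}

@dataclass
class Node:
    kind: str                                   # one of CANONICAL_KINDS
    attrs: dict = field(default_factory=dict)   # node-specific attributes
    children: list = field(default_factory=list)

    def __post_init__(self):
        if self.kind not in CANONICAL_KINDS:
            raise ValueError(f"unknown canonical kind: {self.kind}")

# e.g. `x = 1 + 2` expressed in the canonical vocabulary:
tree = Node("Assign", children=[
    Node("Name", {"id": "x"}),
    Node("BinOp", {"op": "+"}, [Node("Literal", {"value": 1}),
                                Node("Literal", {"value": 2})]),
])
```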
2. Use a parser table generator that produces
tables that include encoded actions that emit
intermediate codes; Lrgen50 by Paul Mann comes
to mind.
3. Write a parser engine that saves up the intermediate codes in a
buffer and then proceeds to process the codes when the original source
is exhausted. The grammar for the codes would have been added onto the
grammar for the language so that the parser tables encode both
languages. It is trivial to separate the two by simply using
language-illegal characters to identify the intermediate codes.
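One way to picture steps 2 and 3 is the sketch below. It is my own
minimal illustration, not LRGen's actual output format: semantic
actions emit codes into a buffer, each tagged with a character that
cannot occur in the source language, and a second pass runs over the
buffer once the source is exhausted. Code productions may themselves
emit further codes, as step 5 suggests:

```python
# Intermediate codes are distinguished from source text by a prefix
# character that is illegal in the language ('\x01' is an arbitrary
# choice here). The engine buffers codes during the first pass and
# reprocesses the buffer when the source is exhausted.
CODE_MARK = "\x01"

def parse(tokens, actions):
    """First pass: run the encoded actions, buffering emitted codes."""
    buffer = []
    for tok in tokens:
        emitted = actions.get(tok)   # action attached to this token, if any
        if emitted is not None:
            buffer.append(CODE_MARK + emitted)
    return buffer

def process_codes(buffer, code_action):
    """Second pass: parse the buffered codes with the same engine.
    A code production may emit a further code (returned string) or
    finish (returns None)."""
    results = []
    work = list(buffer)
    while work:
        code = work.pop(0)
        assert code.startswith(CODE_MARK)   # trivially separable from source
        out = code_action(code[1:])
        if out is not None:
            work.append(CODE_MARK + out)    # codes emitting codes
        else:
            results.append(code[1:])
    return results
```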
4. Invent a set of codes that can tell the parser engine to
restructure and rename the AST nodes so that they can be adjusted from
the natural form that the language grammar imposes into the
'canonical' form that the front end in 0. can deal with.
5. After you get the grammar for the language working
you would add code emission actions to the productions
of the language. You would then write the grammar
for the codes (which could have productions that emit
codes ...) so that eventually the AST would be bent
into the 0. form. I suspect that most of the type
system could be built at this time.
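Steps 4 and 5 amount to a small tree-rewriting instruction set. A
minimal sketch in Python, assuming a dict-based AST; the RENAME and
LIFT code names are my own inventions for illustration, not an actual
proposed code set:

```python
# Sketch of step 4: a tiny instruction set that bends the
# grammar-natural AST toward canonical form. RENAME relabels node
# kinds; LIFT splices out a single-child wrapper node that the
# grammar imposed but the canonical form does not want.
def walk(node):
    """Yield every node in the tree, parent before children."""
    yield node
    for child in node["children"]:
        yield from walk(child)

def apply_codes(root, codes):
    for code, *args in codes:
        if code == "RENAME":             # rename a node kind in place
            old, new = args
            for n in walk(root):
                if n["kind"] == old:
                    n["kind"] = new
        elif code == "LIFT":             # splice out a wrapper node,
            (kind,) = args               # promoting its only child
            for n in walk(root):
                n["children"] = [
                    c["children"][0]
                    if c["kind"] == kind and len(c["children"]) == 1
                    else c
                    for c in n["children"]
                ]
    return root
```

With a grammar-natural tree like `stmt_list(expr_wrapper(num))`, the
codes `RENAME num Literal` and `LIFT expr_wrapper` would yield
`stmt_list(Literal)`.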
6. Bingo, you have a very versatile system that
embodies ALL of the variability in one grammar
module so it should be reasonably extensible by
the learned or by the not so learned if they are
given some helpful graphical tools.
NOTE: The grammar that I am describing is not of the yacc or pccts or
JavaCC kind which embodies C code or 'object oriented' callouts in the
middle of the EBNF. IMHO that format is much too messy and hard to
read if you wish to accomplish step 5 above.
Norman Culver
[IMP72 did something like that 25 years ago. It used an Earley parser
so it could tolerate ambiguous syntax and let you add syntax on the
fly. Semantics were either expressed as macros or as calls to the
internal semantic routines. It worked, but had two big problems: one
was the "every program in a different language" problem, the other was
that it was very hard to do any optimization if you couldn't make any
assumptions about the source language. I suspect the optimization
problem would be more tractable now, but the syntax problem wasn't a
technical one. -John]