Subject: ANDF - some information
From: "CCF::PEELING"@HERMES.MOD.UK
Newsgroups: comp.compilers
Keywords: UNCOL
Organization: Compilers Central
Date: Thu, 18 Nov 1993 10:48:00 GMT
ANDF has come up occasionally on Usenet, along with recurring questions
about its capabilities and about the availability of documentation and
software. I can answer some of these questions and offer a source for
documentation.
I am the TDF project manager at DRA (Defence Research Agency) at Malvern
in the UK. We were selected by the OSF's RFT (Request for Technology)
process as the technology supplier for ANDF. Our technology is called TDF
(TenDRA Distribution Format - where TenDRA is the trademark we apply to
all our software that implements ANDF).
There are three common comments about ANDF:
1) ANDF will fail for the same reasons as UNCOL and pcode did.
2) Run-time performance of applications compiled through ANDF will be
unacceptably slow (because of lowest common denominator code).
3) Who needs an ANDF anyway? For example, CD-ROM solves the distribution
problem.
I will address these issues in order:
1) Of course DRA was aware of the UNCOL and pcode experience. TDF was
designed to avoid the pitfalls inherent in these approaches. For full
details I would refer the reader to an excellent paper by Dr Stavros
Macrakis at OSF ("From UNCOL to ANDF"). For access to OSF's FTP contact
Sandi Caldwell (well@osf.org). You can join OSF's emailing list by
emailing Rich Ford at OSF on andf-tech-request@osf.org . A very brief
synopsis of this paper is that firstly there are many more recent
experiences with multi-lingual ILs than UNCOL and pcode, and that compiler
technology and computer capabilities have advanced very considerably since
UNCOL and pcode were designed. Secondly, the TDF specification has many
novel aspects to suit it for its task: it is a tree structured IL based on
linguistic (not machine) abstractions - the design aim was to retain all
the information needed for code optimisation techniques but to discard
syntactic sugar that would aid reverse engineering (Stavros Macrakis has
also written a paper about reverse engineering ANDF). This approach is key
to the ability to write state-of-the-art code generators.
2) Performance issues:
The approach adopted in TDF ensures that the information needed for code
optimisations is not lost so that the theoretical "penalty" of having an
ANDF is largely avoided. The performance of compilers is also an economic
issue: how much effort is the compiler implementor willing to fund, i.e.
how far into the region of diminishing returns can the compiler
implementor afford to go? Speaking solely for DRA's TenDRA installers the
economic benefits of ANDF far outweigh any (small) theoretical
disadvantages. We have examined the outputs of most high quality C
compilers available for all the main CISCs and RISCs and have built up a
large knowledge-base of optimisation techniques (plus a few home-brewed
ones). We have also invested heavily in a framework for TDF to TDF
optimisations (both universal and machine-class specific) so that as much
as 70% of the code of an installer is shared with at least one other
installer. Quoting a couple of our latest figures:
On Version 14.1 of SVR4.2 on a KAMCO 486/50, compared against the native
ccs 2.0 invoked as cc -O -Xa (to get the best optimisations), our SPEC
figure (using 1.2B - we have only just bought the latest version of SPEC)
is 8% better.
On Ultrix 4.2A using a DECSystem 5100 against cc -O2 our figure is 3%
better (we cannot get all of SPEC through cc -O3, but substituting the O2
figure for the test that crashes and comparing against our compiler using
its multi-file optimisations gives us a figure 1% worse).
Quoting performance figures is *very* suspect (I did not mention our SPARC
figures because we have not compared with the latest gcc or the unbundled
SUN compiler). The figures vary greatly depending on whether you have the
latest versions of the comparison compilers, on exactly which version of
the processor chips you are using (e.g. upgrading from Motorola 68030 to
68040 made a big difference), and on factors such as the cache size of the
machine you are using.
When OSF reruns our SPEC figures they are seldom the same (sometimes
better, sometimes worse). Perhaps more interesting is that OSF have run
two real applications (Informix's Wingz and the Oracle RDBMS) through our
system and compared performance with the products as shipped and can
detect no significant degradation going through ANDF.
I should stress that the TenDRA installers are a proprietary
implementation of a fully open standard. We are marketing the TenDRA
technology because keeping state-of-the-art performance for all the main
architectures (PowerPC, PA and Alpha implementations are underway) is
horribly expensive! Other organisations have had no problems adapting
other (non DRA) code generator technology for use with ANDF - OSF has
built a bridge to gcc, and an Italian company called Etnoteam had no
trouble building an Alpha installer using their in-house compiler
technology. To help other organisations write ANDF tools independently of
DRA and to avoid unnecessary and potentially unhelpful duplication, we
have put a substantial amount of software in the Public Domain: this
includes TDF readers and writers which are automatically generated from a
machine understandable representation of the TDF specification, the TDF
linker (which links token declarations to token definitions), tnc (an
ASCII to TDF and TDF to ASCII assembler / pretty printer), pl-tdf (a high
level assembler for producing hand-crafted token definitions) and tspec (a
tool for turning API definitions into architecture-neutral headers).
All our performance figures are for C. There are no complete compilers for
other languages to quote figures for but implementations are underway for
Fortran (77 and 90), C++, Ada 9x and Dylan; also MicroFocus have helped us
evaluate Cobol issues. A number of minor tweaks have been added to ANDF to
ensure that these other languages can have run-time performance comparable
to native compilers - progress on Fortran, C++ and Ada is sufficient to
give us considerable confidence that the TenDRA installers can be modified
to handle them without loss of performance (Fortran, for example, will
require additional optimisations to be added to the TenDRA installers).
By way of an aside - TDF looks to be ideal for use in the implementation
of portable compilers / compiled-code simulators for niche languages, e.g.
simulation languages such as VHDL or Verilog in ECAD or Chill and SDL in
telecoms, system simulation languages (e.g. CACI's Modsim), niche
implementation languages (e.g. in telecoms and ECAD), legacy programming
languages (PL1, RTL2, ....), etc.
3) Distribution: who needs it?
The first thing to say is that ANDF offers considerable advantages over
CD-ROM. A CD-ROM does not help with yet-to-be-released systems. Multiple
copies may be desired on CD-ROM for different variants or implementations
of the same architecture (e.g. 32 & 64 bit versions, differing instruction
scheduling rules, or to avoid emulation traps for complex instructions on
low end implementations). Then there is the possibility of different
object file formats (e.g. for UNIX and NT). CD-ROMs are not of infinite
size, so who falls off the end? The logistics of putting a multi-architecture
CD-ROM together will be pretty horrible. A fuller discussion can be found
in a paper by Stavros Macrakis called "Distributing software for multiple
platforms: Virtual binary vs. multiple binary".
The tools we have built for ANDF have benefits even if an ISV still
intends to ship non-ANDF versions of their software (on a CD-ROM or
whatever). The non-distribution benefits of using ANDF tools will in the
short term be more visible than ANDF's use for distribution. Distribution
remains an important but longer term objective. The two immediate uses of
ANDF that we believe will precede software distribution are (1) for API
definition and checking, and (2) as an aid to writing portable applications.
Firstly, to design ANDF we had to solve an important API issue. System C
headers are implementations of API specifications: they contain a complete
program description of an API, and so fix the details of those parts which
the API specification leaves opaque (e.g. the type FILE in ANSI C, a
pointer to which is passed to fprintf, is an opaque data structure and can
be whatever data structure an implementation chooses). For software
distribution, standard C system headers
cannot be used because of the implementation details contained within
them. The solution adopted in TDF is to define architecture-neutral header
files using an interface definition language contained within pragmas.
These architecture neutral headers faithfully reflect the written API
specifications. Because the architecture neutral headers contain no
implementation-specific information, DRA's ANSI C producer checks that
application programs do not stray outside the written API specification.
Parts of the API which are not fully defined in the API specification are
passed by the ANSI C producer into the TDF as "tokens". Tokens are
placeholders in the TDF tree that can be substituted later (usually on the
target machine) by an appropriate token definition (TDF sub-tree). In the
API case token definitions are supplied on each target which supply the
implementation dependent program details (e.g. the exact type of FILE).
Since we did not wish to create token definitions by hand, they are automatically
created from a comparison of the architecture neutral headers and the
system headers for the machine in question. As part of this process any
discrepancies between the system headers and the API specification (as
described by the architecture neutral headers) are reported. TenDRA thus
performs static conformance checks of both applications and system headers
against API specifications. This gives API definitions "teeth" in a way
they have not had before, a property of ANDF that is attracting a lot of
interest among standards-defining organisations.
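The token mechanism described above can be pictured with a small C
sketch. This is only an illustration under stated assumptions: the node
kinds, the struct layouts, the functions substitute and eval, and the
token name "word_size" used in the usage note are all hypothetical and
are not taken from the TDF specification, whose constructs are far
richer. It shows a tree containing a named placeholder, and a later pass
that replaces the placeholder with a sub-tree supplied by the target.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node kinds; not the real TDF constructs. */
enum node_kind { NODE_INT, NODE_ADD, NODE_TOKEN };

struct node {
    enum node_kind kind;
    int value;                 /* NODE_INT: the constant */
    const char *token_name;    /* NODE_TOKEN: name of the placeholder */
    struct node *left, *right; /* NODE_ADD: the two operands */
};

/* A token definition as supplied on a target: a name bound to a sub-tree. */
struct token_def {
    const char *name;
    struct node *body;
};

/* Replace each NODE_TOKEN that has a matching definition with that
 * definition's sub-tree. Returns the (possibly new) root. Tokens with
 * no matching definition are left in place; definition bodies are not
 * themselves rescanned in this simplified sketch. */
struct node *substitute(struct node *n, const struct token_def *defs,
                        size_t ndefs)
{
    if (n == NULL)
        return NULL;
    if (n->kind == NODE_TOKEN) {
        for (size_t i = 0; i < ndefs; i++)
            if (strcmp(n->token_name, defs[i].name) == 0)
                return defs[i].body;
        return n;
    }
    if (n->kind == NODE_ADD) {
        n->left = substitute(n->left, defs, ndefs);
        n->right = substitute(n->right, defs, ndefs);
    }
    return n;
}

/* Evaluate a tree in which every token has already been substituted. */
int eval(const struct node *n)
{
    assert(n != NULL && n->kind != NODE_TOKEN);
    if (n->kind == NODE_INT)
        return n->value;
    return eval(n->left) + eval(n->right);
}
```

For instance, a tree for "word_size + 1" with a target definition binding
word_size to the constant 4 evaluates to 5 after substitution. The point
of the sketch is the ordering: the tree is built without knowing the
definition, and the definition is supplied afterwards.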
The second benefit of ANDF that is independent of distribution is as an
aid to designing and implementing portable applications. The API defining
features of ANDF can be used to replace conditional compilation with an
application-specific API. The advantage of defining an API is that the
portability requirements of an application are cleanly defined - reducing
porting effort and allowing a communication of porting requirements
between ISVs and System Vendors. In addition the ANDF scenario makes it
very natural for our ANSI C producer to compile in the absence of the
target machine's architectural properties. It was therefore
straightforward to build powerful portability checks into the ANSI C
producer. As a result ISVs can derive considerable benefits from using
ANDF tools, even if for the time being products are still distributed in a
non-ANDF format. In doing so ISVs will find that many of their testing
procedures no longer discover errors, which builds confidence that
software distribution in ANDF is viable.
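The idea of replacing conditional compilation with an application-specific
API can be sketched in plain C. This is an illustration only and does not
use TDF tokens or tspec syntax; the names app_path_separator and
app_join_path are hypothetical. Instead of scattering #ifdef tests through
the application, the portable code calls one small interface, and each
target supplies its own definition of that interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The application's own, target-independent interface (hypothetical).
 * Returns the character that separates directory names on the target. */
char app_path_separator(void);

/* Portable application code: it uses only the interface above and
 * contains no conditional compilation.
 * Joins dir and file into out, which has room for cap bytes.
 * Returns 0 on success, -1 if the result would not fit. */
int app_join_path(char *out, size_t cap, const char *dir, const char *file)
{
    size_t dlen = strlen(dir);
    size_t flen = strlen(file);

    assert(out != NULL);
    if (dlen + 1 + flen + 1 > cap)   /* dir + separator + file + NUL */
        return -1;
    memcpy(out, dir, dlen);
    out[dlen] = app_path_separator();
    memcpy(out + dlen + 1, file, flen + 1);  /* copies the NUL too */
    return 0;
}

/* One target's definition of the interface. In a real project this
 * would live in a separate, per-target source file. */
char app_path_separator(void)
{
    return '/';
}
```

With the definition shown, joining "usr" and "lib" yields "usr/lib". The
design choice is that the list of functions in the interface is itself the
statement of the application's porting requirements.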
I am pleased to say that ANDF is now very well documented. Much of this is
available from OSF or directly from me (all DRA documentation is available
free of charge); I enclose an order form at the end of this article. I
hope that those Usenet correspondents interested in ANDF will take the
opportunity to check out the claims I have made here.
Software is currently available to members of OSF or from DRA for
evaluation by potential licensees. We had been reluctant to make an
unrestricted release of the software to the research community until the
software was of near product quality reliability and the documentation was
in good shape. Both of these requirements have now been achieved and DRA
is looking to fund an external organisation to provide support for a
release of DRA's software to the research community. The terms of this
release would be for any non-commercial use and it will either be
distributed free or for a nominal charge to cover distribution costs. Any
organisation interested in bidding to provide such support should contact
me directly.
I hope the information in this article has been helpful. If anyone has any
further questions I will try to answer them.
Nic Peeling
TenDRA Project Manager
DRA Malvern
email: peeling@hermes.mod.uk (internet)
peeling@dra.hmg.gb (alternate internet address)
-------------------------------------------------------------------------------
DOCUMENTATION ORDER FORM
Documents available:
1. TDF Specification (62 pages) - describes the specification chosen
to form the basis of OSF's ANDF technology.
2. Guide to the TDF Specification (50 pages) - a detailed technical
guide for implementors.
3. TDF Facts and Figures (8 pages) - details the performance of
current implementations.
4. TDF and Portability (39 pages) - describes the relationship between
TDF, portability and APIs. Contains worked examples.
5. X11, An Example in API Specification (21 pages) - a detailed worked
example of API specification.
6. The C to TDF Producer (67 pages) - a detailed technical guide.
7. Frequently Asked Questions about ANDF (11 pages)
Please provide the following information:
name:
organisation:
email address:
snail mail address:
documents wanted (using numbers 1 - 7):
email versions required?:
documents are available in compressed PostScript and can be sent using
either btoa or uuencode, please specify required mechanism:
snail mail documents required? (for economy we can only send one
copy):
do you wish to be added to our mailing list for occasional additional
information?:
If you are willing to provide the following information we would find
it useful to put in our records:
your particular interest in ANDF:
phone number:
FAX number: