Newsgroups: comp.compilers
From: syiek@tartan.com
Organization: Compilers Central
Date: Mon, 4 Jan 1993 22:43:42 GMT
Keywords: DSP, optimize, comment
References: 92-12-094 92-12-097
Response to Johan Van Praet and Jeff Enderwick
(discussion of DSP compiler quality).
Tartan Inc. is a supplier of Ada development systems for real-time
embedded systems. We produced the first Ada compiler for a DSP (the
TMS320C3x in '90), and currently also have TMS320C31 and TMS320C40
compilers.
As you may already know, the use of Ada is mandated by the U.S.
Department of Defense (DOD) for all defense-related programs. Use of
assembly is usually limited to 5% or less of a total application.
Most real-time DSP applications require extremely tight code.
Traditionally this could only be achieved through careful assembly
language coding. Thus it is not unusual for DSP users to reject
high-level language compilers (sometimes before even trying them)
because of this previous experience.
Fortunately, the DOD mandate makes it very hard to reject an Ada compiler
without first trying it out. As a result, the Tartan Ada DSP compilers
are often benchmarked against other compilers and hand coded assembly.
The results have been surprising (but not to us!): Tartan Ada consistently
outperforms other DSP compilers and is usually only slightly slower than
hand coded assembly. Indeed there are cases where the compiler has
produced code that runs faster or is smaller than hand coded assembly.
There are published accounts of several of these benchmarking efforts.
Two are:
P.K. Lawlis and T.W. Elam, "Ada Outperforms Assembly: A Case Study,"
Proceedings of TRI-Ada '92,
Orlando, Florida, 1992.
Ralph E. Crafts, "Ada vs. C for the DSP - Advantage Ada",
Ada Strategies, Vol 5. No. 4, April 1991.
The Lawlis paper discusses an incident where the compiler was used to
produce a version of an algorithm that was the same size as hand coded
assembly, but ran twice as fast.
There are many technical reasons why the compiler performs as well as it
does. I refer you to the following papers:
D.A. Syiek "Challenging Assembly Code Quality,"
Proceedings of the International Conference on DSP Applications
and Technology,
Berlin, Germany, 1991.
D.A. Syiek and D. Burton
"Optimizing Ada Code for the TI SMJ320C30 Digital Signal Processor",
Proceedings of the International Conference on DSP Applications
and Technology,
Brussels, Belgium, 1990.
======================================================================
Some specific responses to Johan Van Praet's message:
>What causes this factor of 5 on the code size?
It is speed that usually matters for my customers. However, the Lawlis
paper describes how the Ada compiler was also used to produce a version
that was half the size of the hand-coded assembly. That is a factor of
0.5!
>The instruction set of a DSP processor does not lend itself
>to conventional compiling techniques ?
All machines are unique to some degree. A good compiler technology will
accommodate this uniqueness across a wide variety of machines. We had few
real problems finding ways to generate good TMS320C3x/C4x code. We expect
few problems when (and if) we tackle other DSP architectures in the
future.
>the High Level Languages are not useful for DSP?
>not enough parallelism in C (too difficult to extract the parallelism)?
>too many possible constructs in C ? (a subset of C is better)
>non-procedural languages as e.g. Silage are better ?
1. There are many who believe that C is NOT an example of a modern high-level
language. I will pretend that you use "C" as a meta-variable for a
high-level language (HLL).
2. You can kill performance in ANY language with bad coding style.
Conversely, you can tune code easily in a high-level language (though
not always in the same way for all languages, compilers, and targets).
3. Parallelism at what level?
Are you building a multiprocessor and want your algorithm magically
distributed across the system?
Or do you have asynchronous functional units in your CPU?
Is the instruction pipeline visible at the user level?
Are there multiple operations per instruction?
4. The Tartan TMS320C3x/C4x Ada compiler is able to "fold" loops (make
software pipelines) and generate parallel instructions. The parallel
instructions are also used in many pre-canned sequences and
library functions invoked automatically where applicable. Finally, the
parallel instructions may be created in unusual places through an
algorithm we call "threads" that is too combinatorially complex to
duplicate by hand.
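To make the loop-folding point concrete, here is a minimal sketch in Ada
(illustrative only, not Tartan output) of the kind of loop the technique
applies to: a dot product, where the multiply of one iteration can be
overlapped with the accumulate of the previous one.

    type Sample_Vector is array (Natural range <>) of Float;

    -- Assumes X and Y share the same index range.
    function Dot (X, Y : Sample_Vector) return Float is
       Sum : Float := 0.0;
    begin
       for I in X'Range loop
          -- A folded loop can pair this iteration's multiply with the
          -- previous iteration's add in a single parallel instruction.
          Sum := Sum + X (I) * Y (I);
       end loop;
       return Sum;
    end Dot;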
>no possibility of using all the provided tricks for the DSP
>processors ?
Almost all good compilers contain language extensions, compiler built-ins,
or libraries that allow you to touch the hardware features no matter what
the input language. For example, the Tartan compiler contains constructs
for using the circular and bit-reversed addressing features of the
TMS320C3x/C4x.
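The exact spelling of those constructs is not shown here; as a hedged sketch
of the idea, consider a delay line indexed modulo its length. Written
naively the wrap costs an operation per sample, which is exactly what a
circular-addressing construct lets the compiler hand to the hardware.

    Taps : constant := 64;
    type Delay_Line is array (0 .. Taps - 1) of Float;

    procedure Push (Line : in out Delay_Line;
                    Head : in out Natural;
                    S    : in Float) is
    begin
       Line (Head) := S;
       -- The explicit wrap below is what the C3x/C4x circular-addressing
       -- hardware performs for free once the buffer is declared circular.
       Head := (Head + 1) mod Taps;
    end Push;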
> no global ordering and scheduling of the generated code ?
At the machine code level, we schedule code to avoid pipeline delays both
for delayed branching and for all other pipeline locks. At a higher
level, scheduled tasking and interrupt handling usually deal with the
remainder of the dynamic scheduling issues. Automatic multiprocessor
scheduling is usually not a requirement of embedded systems. Tasks are
statically divided up amongst the resources.
>no or not enough use of the low-overhead-loop facility of a
>processor as the "DO" for Motorola and the "RPT" for Texas
>Instruments processors ?
The Tartan TMS320C3x/C4x compilers automatically use the RPTS and RPTB
instructions where appropriate. In fact, there are about a dozen
specialized looping constructs the compiler chooses from when building
iteration loops, depending on the nesting depth, iteration count,
resources available, and so on.
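As a rough illustration (a sketch, not generated code), the loop shape
matters: a body of a single operation with a trip count known at compile
time is the natural candidate for the single-instruction repeat, while
longer bodies or nested loops need the block-repeat form.

    type Block is array (0 .. 255) of Float;

    procedure Scale (V : in out Block; K : in Float) is
    begin
       for I in V'Range loop
          -- One multiply per iteration, 256 iterations known statically:
          -- the shape a repeat-single construct is made for.
          V (I) := V (I) * K;
       end loop;
    end Scale;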
>no use of special addressing ? (e.g. in circular buffers)
As already stated, the Tartan compiler contains constructs for doing
circular addressing and bit-reversed addressing.
>no use of special block data moves ?
The Tartan compiler uses fast block move sequences whenever appropriate.
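In Ada the usual source of such moves is whole-array or slice assignment;
a minimal sketch:

    type Buffer is array (0 .. 1023) of Float;

    procedure Copy_Front (Src : in Buffer; Dest : in out Buffer) is
    begin
       -- Slice assignment: a natural candidate for a fast block-move
       -- sequence rather than an element-by-element loop.
       Dest (0 .. 127) := Src (0 .. 127);
    end Copy_Front;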
>I also know of three code generation approaches that would generate more
>optimal code :
Now you know of a fourth. Furthermore, it is mature and off-the-shelf, and
it supports a standardized high-level language.
>What is their quality on real life examples ?
Tartan has a large customer base - it seems like just about every company
in the defense industry owns a Tartan compiler. Our DSP compilers have
been used to build many delivered systems. The feedback we get from our
customers is VERY positive.
So what does this mean for the commercial (non-DOD) world?
1. You COULD start using Ada. If you have a large application, there
is much evidence that the high cost of entry will pay for itself
several times over. However, much of the commercial world:
a. Builds small systems on rather primitive fixed-point DSP processors
for which good compilers do not exist.
b. Refuses to consider Ada and will not even study the cost-benefit.
2. Tartan recently formed a commercial division in order to leverage our
existing technology into that market. There are already two products
designed to boost C performance: FasTar (high speed trig function
library) and FloTar (double precision floats for the C3x/C4x). Look
for more ambitious products in the coming year.
======================================================================
Specific responses to Enderwick's message:
I agree with much of what you say. The difference between your
perspective and mine is that my customers are generally creating
lower-production-volume systems using the newer floating-point chips.
These systems are large (often measured in 100K line increments) and have
such long life expectancies that 100% assembly code is totally impractical
(even if the DOD would allow it).
In a large system, the 90-10 rule tends to hold up pretty well (90% of the
time is spent in 10% of the code). A small amount of time spent tuning
the key 10% of the Ada code usually gets the algorithm to perform in the
required time. The compiler contains assembly code insertion capability
(with symbolic reference to Ada variables) as well as assembly code
interface capability. These can be (and are) used as a last resort and
usually on less than 5% of the total code.
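For the interface route, standard Ada 83 already provides the hook; the
sketch below is hypothetical (the routine, its parameter profile, and the
convention name are made up for illustration), and the machine code
insertion mechanism with symbolic reference to Ada variables is
implementation-specific and not shown.

    with System;
    package DSP_Kernels is
       -- Hypothetical hand-coded inner loop, assembled separately and
       -- called from Ada through the standard interface pragma.
       procedure Fir_Kernel (Coeffs : in System.Address;
                             State  : in System.Address;
                             Taps   : in Integer);
       pragma INTERFACE (Assembler, Fir_Kernel);
    end DSP_Kernels;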
The register selection problem you mention shows up on the C3x/C4x as
well. Since loop folding and other parallelization techniques are done
AFTER the code is generated, registers must often be re-assigned to get
the right ones. The compiler does not always come up with the ideal
solution (but we are working on it!). However, since most loops employing
parallel instructions are 5 or 6 instructions long, it's not really a big
deal to use a machine code insert.
Your comment about the referenced schemes being "library goop-together"
matches my limited understanding of them as well. However, as you also
note, this is a powerful approach. Tartan uses it in a limited way by
providing a number of library routines, often with high-speed interfacing
supported directly from the compiler's code generator.
For example, complex numbers are supported in this way, as are mixed
complex/real operations and complex vectors. It is likely that we will
expand on this as time goes by.
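As a sketch of what that library-level support looks like at the source
level (the type and operator here are illustrative, not the actual
package), a complex multiply written once is exactly the sort of routine a
code generator can open-code with the parallel multiply/add hardware.

    type Complex is record
       Re, Im : Float;
    end record;

    function "*" (A, B : Complex) return Complex is
    begin
       -- Four multiplies and two adds: good material for parallel
       -- multiply/add instructions.
       return (Re => A.Re * B.Re - A.Im * B.Im,
               Im => A.Re * B.Im + A.Im * B.Re);
    end "*";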
David A. Syiek
Tartan Inc., Monroeville, PA 15146
(412) 856-3600, FAX:856-3636
syiek@tartan.com
[Does Tartan have DSP compilers for other languages or just Ada? I can
imagine that there'd be a market for Fortran, considering all the numeric
Fortran code there is, or maybe C++ for people who want a cool language.
-John]
--