Separating algorithms from implementations (long)

"Toby Sharp" <TSharp@Serif.com>
27 Aug 2000 22:25:36 -0400

          From comp.compilers

Related articles
Seprerating algorithms from implementations (long) TSharp@Serif.com (Toby Sharp) (2000-08-27)
Re: Seprarating algorithms from implementations (long) kusuma@eecs.berkeley.edu (Julius Kusuma) (2000-09-02)
Re: Seprerating algorithms from implementations (long) felix@anu.ie (felix) (2000-09-02)
Re: Separating algorithms from implementations (long) westin@graphics.cornell.edu (2000-09-07)
Re: Separating algorithms from implementations (long) mal@bewoner.dma.be (Lieven Marchand) (2000-09-07)
Re: Seprerating algorithms from implementations (long) rhutch7@kdsi.net (Rob Hutchinson) (2000-09-08)
Re: Separating algorithms from implementations (long) noelw@dai.ed.ac.uk (Noel Welsh) (2000-09-08)
[9 later articles]
From: "Toby Sharp" <TSharp@Serif.com>
Newsgroups: comp.graphics.algorithms,comp.compilers,comp.dsp
Date: 27 Aug 2000 22:25:36 -0400
Organization: Compilers Central
Keywords: optimize, performance, question

For some time I have been reading books on computer graphics, studying
algorithms and idly contemplating how interesting it would be to have
enough time to write my own graphics engine / pipeline / applications.


Of course, if I had the time, it would have to be done properly. The
design stage is paramount for a robust, "future-proof" application,
and object-oriented design patterns naturally come in handy within a
C++ architecture. But in this setting, certain small, time-critical
portions of code may account for 99% of the CPU time: polygon drawing,
vector calculations, and so on.


These time-critical sections need to be heavily optimised for
speed. After exploiting all a given algorithm's redundancy, critical
code can be hand-tuned in assembly language to squeeze out every drop
of performance. There's a very good reason for hand-crafting
time-critical code in ASM: the author knows far more about how the
algorithm works than does a compiler, and this extra knowledge allows
the author to do far more optimisation than just scheduling the
instructions efficiently.


So, a few days later, what was a few lines of C code is now many lines
of ASM code, but it runs in half the time and everyone's happy. But
then the inevitable happens. A new CPU becomes available, and all the
rules have changed. Has anyone ever switched from floating-point to
fixed-point and back to floating-point again, depending on which is
fastest at the time? Instruction times and latencies are subject to
change.


For example, multiplication is at least as computationally expensive
as addition, so (x + x) should be preferable to (x * 2). But what
about (x * 3)? Whether 'tis best to use one multiplication here or two
additions depends entirely on the target CPU, and therefore is subject
to frequent review, along with many, many other issues. Suddenly, your
hand-crafted code is no longer optimal, so you must re-write it if you
still want it to be as fast as possible. This isn't what life should
be about.
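
To make the trade-off concrete, here is a small sketch of my own (not
from the original post) showing the two shapes being chosen between for
(x * 3); which one wins is purely a property of the target CPU and has
to be re-measured whenever the CPU changes:

    #include <cstdint>

    // One multiplication...
    static inline std::uint32_t times3_mul(std::uint32_t x) { return x * 3u; }

    // ...or two additions. Neither is "the" right answer; the faster of
    // the two depends on the instruction timings of the current CPU.
    static inline std::uint32_t times3_add(std::uint32_t x) { return x + x + x; }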


At this point, most people would probably suggest leaving the code in
C and re-compiling when new compiler versions are released to reflect
new target CPUs. But this is not really satisfactory for critical
performance. C will use fixed-point or floating-point depending on
what you told it to use. It doesn't know about acceptable errors, nor
does it know anything about the nature or range of input data. For
example, it will never use alpha instead of sin(alpha) when alpha is
small; and rightly so, for no-one said it should. And as for
exploiting parallelism? Well, Intel's C/C++ compiler claims automatic
vectorisation using their SIMD extensions, but personally (having
tried their compiler) I certainly wouldn't trust my inner loops with
it.
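
As a concrete illustration of the sin(alpha) point, here is a hedged
sketch (mine, not the post's) of the kind of substitution only the
programmer can authorise, because only the programmer knows the input
range and the acceptable error:

    #include <cmath>

    // Assumption the compiler cannot make on its own: |alpha| < 0.01 radians,
    // and an absolute error of about 1.7e-7 is acceptable to this caller.
    // Under those assumptions sin(alpha) ~= alpha (the error is |alpha|^3 / 6).
    static inline double sin_small_angle(double alpha) {
        return alpha;               // valid only under the stated assumptions
    }

    // What a general-purpose compiler must otherwise emit:
    static inline double sin_general(double alpha) {
        return std::sin(alpha);
    }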


We might be able to solve this problem if there were some way to
describe algorithms apart from their implementations. Such a
description would need to be sufficiently concrete to allow an
implementation to be derived, but needs to be sufficiently flexible to
allow many different implementations. These could then be tested for
speed, or various trade-offs could be chosen on the basis of knowledge
of a target CPU. There would need to be a way to state assumptions on
the range and properties of input data. Knowledge of these properties
is what allows hand-tuned code to be more efficient. Acceptable
errors could be defined where appropriate. Algorithms could be
described in ways which exploit parallelism, where appropriate, since
this is a strong direction for forthcoming CPUs.


Then a very, very clever compiler could spend a lot of time looking at
the algorithm description, assembling instructions into
implementations, testing sample data, and making changes as
appropriate. If successful, the compiled code should rival that of the
hand-tuned implementation. However, the algorithm doesn't change with
the CPU, so neither does its description. It can be recompiled later
to generate a new implementation that benefits from the features of a
new CPU.
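
Purely as a sketch of that "testing sample data" step with today's
tools (the candidate routines below are the hypothetical x * 3 variants
from the earlier sketch, and the sample data is made up), the selection
stage might look something like this:

    #include <chrono>
    #include <cstdint>
    #include <vector>

    // Two candidate implementations of the same algorithm.
    static std::uint32_t times3_mul(std::uint32_t x) { return x * 3u; }
    static std::uint32_t times3_add(std::uint32_t x) { return x + x + x; }

    // Time one candidate over representative sample data, in nanoseconds.
    template <typename F>
    static long long time_candidate(F f,
                                    const std::vector<std::uint32_t>& samples) {
        volatile std::uint32_t sink = 0;    // stop the loop being optimised away
        auto start = std::chrono::steady_clock::now();
        for (std::uint32_t x : samples) sink = f(x);
        auto stop = std::chrono::steady_clock::now();
        (void)sink;
        return std::chrono::duration_cast<std::chrono::nanoseconds>(
                   stop - start).count();
    }

    int main() {
        std::vector<std::uint32_t> samples(1000000, 7u);
        // The "very clever compiler" would do this at build time for every
        // candidate it generates; re-running the selection after a CPU change
        // replaces the manual re-tuning step.
        long long t_mul = time_candidate(times3_mul, samples);
        long long t_add = time_candidate(times3_add, samples);
        return t_mul <= t_add ? 0 : 1;      // stand-in for "emit that one"
    }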


The crux of the idea is to have a new language which describes
algorithms separately from implementations. The language also provides
a way to give extra information about supplied input and required
output, so that highly optimised code can be created. A compiler goes
into great detail to find the best implementation for the supplied
algorithm information.


So, has anyone done it? Suggested it? Any thoughts on whether it is
theoretically possible? Or practically feasible? All comments are
welcome at this stage.
--
--------------------------------------
Toby Sharp, B.Sc. (Hons), G.I.M.A.
Senior Software Developer
Serif (Europe) Ltd
[There are certainly higher level languages than C. I would guess that
people have addressed this kind of question in Lisp. -John]


