From: Kim Whitelaw <kim@jrs.com>
Newsgroups: comp.compilers,comp.arch
Date: 8 Apr 1996 23:22:57 -0400
Organization: JRS Research Labs
References: 96-04-013
Keywords: architecture
Scott A. Berg wrote:
>
> The recent LONG thread in comp.compilers on the relative merits of
> using a HLL (FORTRAN, C, etc.) versus assembly language got me
> wondering -
No one has replied to this message, so I'll put in my two cents :).
> Is it possible to create a computer where the HLL gets compiled into
> processor microcode, fully optimized, with some amazing increase in
> speed?
Yes, not only is it possible, but it has been done. Work has been
going on in this area since the early 1980s. Look at some old
Microprogramming conference proceedings for some of the papers and
references to other papers. The VLIW concept is a derivative of this
early work.
> Stream of consciousness analysis -
>
> 1) Compiler technology has now advanced to the point where complex
> optimizations such as "these consecutive lines of code don't share any
> variables so they can be done in parallel" are possible. In fact,
> this has long been done in some highly parallel processor
> architectures, but they never took it down to the microcode level.
These optimizations have been applied at the microcode level. The
optimization was called microcode compaction or microcode scheduling.
The microoperations are modeled as using resources (data paths and
microword fields). Two microoperations can be scheduled in parallel
as long as the data dependencies are not violated and there are no
resource conflicts. Initially this optimization was applied only to
straight-line code segments (no branches), but it was quickly
generalized to multiple basic blocks and loops. The most effective
global optimization strategy was invented by Fisher and is called
Trace Scheduling (it was his and his students' work that popularized
the VLIW concept). There are also important optimizations that can be
applied to loops, such as loop pipelining (software pipelining).
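
To give a concrete flavor of the compaction step, here is a minimal
sketch in C of a greedy list scheduler: it places each microoperation
in the earliest microword that respects its data dependences and has
no microword-field conflict. This is only an illustration (the ops,
bitmasks, and names are made up), not the algorithm of any particular
tool; real compaction also has to deal with branches, timing, and
encoding constraints.

  /* Sketch of greedy microcode compaction for one basic block.
   * Each microoperation carries a bitmask of the microword fields
   * (resources) it uses; two microops can share a microword only if
   * their masks don't overlap and no data dependence is violated. */
  #include <stdio.h>

  #define NOPS 6

  typedef struct {
      unsigned fields;   /* microword fields / data paths this op uses */
      unsigned reads;    /* bitmask of registers read */
      unsigned writes;   /* bitmask of registers written */
  } MicroOp;

  /* true if op b must be placed after op a (flow, anti, or output dep) */
  static int depends(const MicroOp *a, const MicroOp *b)
  {
      return (a->writes & b->reads)      /* flow dependence   */
          || (a->reads  & b->writes)     /* anti dependence   */
          || (a->writes & b->writes);    /* output dependence */
  }

  int main(void)
  {
      /* made-up microops: field mask, registers read, registers written */
      MicroOp ops[NOPS] = {
          {0x1, 0x03, 0x04}, {0x2, 0x08, 0x10}, {0x4, 0x04, 0x20},
          {0x1, 0x10, 0x40}, {0x2, 0x20, 0x80}, {0x4, 0x40, 0x100},
      };
      int word_of[NOPS];

      for (int i = 0; i < NOPS; i++) {
          /* earliest microword allowed by the data dependences */
          int w = 0;
          for (int j = 0; j < i; j++)
              if (depends(&ops[j], &ops[i]) && word_of[j] + 1 > w)
                  w = word_of[j] + 1;
          /* advance until there is no microword-field (resource) conflict */
          for (;; w++) {
              unsigned used = 0;
              for (int j = 0; j < i; j++)
                  if (word_of[j] == w)
                      used |= ops[j].fields;
              if (!(used & ops[i].fields))
                  break;
          }
          word_of[i] = w;
          printf("microop %d -> microword %d\n", i, w);
      }
      return 0;
  }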
> 2) Microcode programs are usually (always?) embedded in a PROM area of
> the processor, but that area could probably be switched to RAM and
> reloaded with code as needed. ( as a side comment, does *anybody* use
> hardwired logic in their processors anymore, or is it all microcode?)
Many of the popular computer architectures (e.g., the VAX-11/780) were
implemented in microcode. The VAX-11/780 provided a writable control
store that allowed a very experienced user to define his own
"instructions" by writing his own microcode. Many proprietary DSP
chips were originally microprogrammed, since this was the easiest way
to obtain the desired performance for key algorithms.
> 3) VLIW architectures allow for "lots" of parallelism, but is it *really*
> exploited?
It's difficult to exploit the full parallelism of a VLIW architecture
without resorting to some algorithm-remapping tricks; there has been
some work done in this area, but I don't know if it has been generally
effective.
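
One simple example of the kind of remapping I mean (illustrative C
only, not code from any of our compilers): a straightforward
accumulation loop is a single serial chain of adds, but splitting it
into several partial sums gives the scheduler independent chains with
which to fill the wide instruction word. Note that this reassociates
the floating-point adds, so a compiler cannot always do it on its own.

  /* Breaking a serial reduction into independent partial sums so a
   * wide (VLIW) machine has several independent chains to schedule. */
  #include <stdio.h>

  #define N 1024

  double sum_serial(const double *x)
  {
      double s = 0.0;
      for (int i = 0; i < N; i++)
          s += x[i];                     /* every add depends on the last */
      return s;
  }

  double sum_split(const double *x)
  {
      double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
      for (int i = 0; i < N; i += 4) {   /* four independent add chains */
          s0 += x[i];
          s1 += x[i + 1];
          s2 += x[i + 2];
          s3 += x[i + 3];
      }
      return (s0 + s1) + (s2 + s3);
  }

  int main(void)
  {
      double x[N];
      for (int i = 0; i < N; i++) x[i] = 1.0;
      printf("%g %g\n", sum_serial(x), sum_split(x));
      return 0;
  }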
(: --Self promotion alert! -- :)
At JRS, we have developed quite a few Ada to microcode compilers. The
target processors were typically horizontally microprogrammable DSPs.
We also generated compilers for several SIMD processors, a MIMD
processor, and most recently for a Systolic Cellular Array Processor.
For this processor we achieved a throughput of 2.6 gigaflops for a
20-tap FIR filter (80% of peak processor performance). The compiler is
table-driven, so it can easily be retargeted to various architectures.
Built-in functions and operators that map directly to microcode can
easily be added to take advantage of special hardware (e.g., FFT
bit-reversal address generation) without any performance penalty.
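
For scale: counting one multiply and one add per tap, a 20-tap FIR
filter costs about 40 flops per output sample, so 2.6 gigaflops works
out to roughly 65 million output samples per second. The kernel itself
is just the usual convolution; a generic C version (for illustration
only -- the code we actually compile is Ada) looks like this:

  /* Generic 20-tap FIR filter kernel, shown only for scale.
   * About 40 flops per output sample: 20 multiplies plus 20 adds. */
  #include <stdio.h>

  #define TAPS 20
  #define N    256

  void fir(const float *x, const float *h, float *y, int n)
  {
      for (int i = 0; i + TAPS <= n; i++) {
          float acc = 0.0f;
          for (int k = 0; k < TAPS; k++)
              acc += h[k] * x[i + k];    /* 1 multiply + 1 add per tap */
          y[i] = acc;
      }
  }

  int main(void)
  {
      float x[N], h[TAPS], y[N];
      for (int i = 0; i < N; i++)    x[i] = 1.0f;
      for (int k = 0; k < TAPS; k++) h[k] = 1.0f / TAPS;
      fir(x, h, y, N);
      printf("y[0] = %g\n", y[0]);       /* prints 1 for this input */
      return 0;
  }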
Kim Whitelaw
kim@jrs.com
JRS Research Labs