Re: Compile HLL to microcode on VLIW - possible?



From: narad@nudibranch.asd.sgi.com (Chuck Narad)
Newsgroups: comp.compilers,comp.arch
Date: 13 Apr 1996 23:02:58 -0400
Organization: Silicon Graphics, Inc. Mountain View, CA
References: 96-04-013 96-04-059 96-04-068
Keywords: architecture

>> Is it possible to create a computer where the HLL gets compiled into
>> processor microcode, fully optimized, with some amazing increase in
>> speed?


>Not only is it possible, but we did that [back when]


preston@tera.com writes:
> All these answers I've been reading are missing the point. The
> original poster wants an amazing increase in speed. It's not going to
> happen these days. Microcode was generally made obsolete by the
> invention of the separate instruction cache.


I'm not sure I buy that connection. The point of microcode was the
ability to fit the desired high-level instructions to the underlying
hardware, which in the case of VLIW had lots'o'units to schedule.
These days the approach is to provide hardware scheduling of these
units (superscalar) while keeping the instruction set constant.
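
To make that concrete, here's a toy C sketch of the idea (the unit
mix, field names, and encodings are invented for illustration; this
is not any real machine's microword format). Each wide control word
carries one operation per functional unit, so the compiler or
microcoder does the unit scheduling that a superscalar does in
hardware:

#include <stdio.h>

enum alu_op { ALU_NOP, ALU_ADD, ALU_SUB };
enum fpu_op { FPU_NOP, FPU_MUL };
enum mem_op { MEM_NOP, MEM_LOAD, MEM_STORE };

struct microword {            /* one issue slot per unit, per clock */
    enum alu_op alu;          /* integer/address unit               */
    enum fpu_op fpu;          /* floating-point unit                */
    enum mem_op mem;          /* memory/prefetch unit               */
};

int main(void)
{
    /* two clocks of a made-up custom instruction: every unit is told
       what to do each cycle, and empty slots are explicit NOPs */
    struct microword ucode[] = {
        { ALU_ADD, FPU_MUL, MEM_LOAD  },  /* clock 0: all units busy */
        { ALU_NOP, FPU_MUL, MEM_STORE },  /* clock 1: ALU slot idle  */
    };

    for (unsigned i = 0; i < sizeof ucode / sizeof ucode[0]; i++)
        printf("clock %u: alu=%d fpu=%d mem=%d\n",
               i, ucode[i].alu, ucode[i].fpu, ucode[i].mem);
    return 0;
}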


> For general-purpose machines, compiling into microcode was always a
> problem because of the need to handle context switches. For
> special-purpose machines (e.g., the Culler), people could do a nice
> job for certain tight loops.


hmmm... could you explain why you think the Culler-7 was a
special-purpose machine? Our model was intended to be as
general-purpose as any other unix box on the street at that time,
while handling scientific code better. The fact that independent data
flows could be mapped concurrently onto the multiple busses,
registers, and vector memories (either at the ISA level or with
custom instructions) allowed the machine to (in some sense) vectorize
independent blocks of non-vector code, and to take advantage of data
scheduling (prefetching) in ways that other machines of the time
simply could not.


We had some amount of microstore dedicated to the standard
instruction set, and some allocated for custom instructions. The
Culler-7 actually had two microstores; the first was the instruction
decode (XDECODE) memory, which provided the microword for the first
clock of the instruction plus a jump point into the memory that held
the rest of the instruction (if the instruction took more than one
micro-cycle). The OS could load everything after the first cycle
wherever it wanted to in the second memory, so at context-switch time
only one microword (in the XDECODE memory) needed to be carried with
the process per custom instruction (of which there were generally
only one or a few).
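
Purely for illustration, here's a rough C sketch of that two-level
dispatch (the names, sizes, and encodings are made up, not the
Culler-7's actual layout): XDECODE supplies the first microword plus
a jump point, the tail lives wherever the OS put it in the second
store, and the only per-process microcode state is the XDECODE entry
for each custom opcode.

#include <stdio.h>

typedef unsigned int uword;            /* stand-in for a microword   */

struct xdecode_entry {
    uword first;                       /* microword for clock 0      */
    int   next;                        /* index into ustore, -1=none */
};

static struct xdecode_entry xdecode[256];  /* one entry per opcode   */
static uword ustore[4096];                 /* OS-placed tail store   */

static void run_instruction(unsigned char opcode)
{
    struct xdecode_entry e = xdecode[opcode];
    printf("issue %#x\n", e.first);        /* first clock, XDECODE   */
    for (int i = e.next; i >= 0 && ustore[i] != 0; i++)
        printf("issue %#x\n", ustore[i]);  /* remaining clocks       */
}

int main(void)
{
    /* the OS drops a custom instruction's tail wherever it likes... */
    ustore[100] = 0xBEEF;
    ustore[101] = 0xCAFE;                  /* 0 ends the sequence    */

    /* ...and the per-process state is just this one XDECODE entry   */
    xdecode[0xF0].first = 0xF00D;
    xdecode[0xF0].next  = 100;

    run_instruction(0xF0);
    return 0;
}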


By the way, the machine also had a separate instruction memory
(PMEM), which was virtual and thus became a software-managed
instruction cache. Through some extreme cleverness the PMEM was able
to provide an instruction to each of the A and X units every clock (A
was the side dedicated to scalar computation, address computation,
and data scheduling, while X was the side that provided the FP units,
vector memories, etc.). PMEM, like most of the machine, carried state
for two contexts simultaneously so that processing and staging could
happen concurrently (staging/de-staging was hidden in the background
while another context was running).
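
If it helps, here's a toy C sketch of the double-buffered-context
idea (nothing below is real Culler-7 code; it only shows the next
context being staged into the idle bank while the current one runs,
so the switch itself is just a flip):

#include <stdio.h>

static const char *bank[2];          /* which process each bank holds */

static void stage(int b, const char *proc)
{
    bank[b] = proc;                  /* copy-in happens off-line      */
    printf("  staging %s into bank %d\n", proc, b);
}

int main(void)
{
    int running = 0;
    stage(running, "proc-A");

    for (int step = 0; step < 3; step++) {
        int idle = 1 - running;
        stage(idle, (step % 2) ? "proc-A" : "proc-B"); /* overlaps run */
        printf("running %s out of bank %d\n", bank[running], running);
        running = idle;              /* "context switch" = flip banks */
    }
    return 0;
}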


I'm afraid I'm not grasping the connection between microcode and
I-caches that you're suggesting (at least in this context); could you
expand on that?


> But those loops are just the ones that respond so well to
> instruction caches. In cases where an instruction cache isn't
> adequate, the compiler isn't going to be able to help much anyway.
>
> For amazing increases in speed, you need to worry about the data,
> not the instructions.


Well, to reiterate, the key to performance in the Culler-7 (and in
any VLIW or superscalar architecture) is partly in keeping the data
scheduled and partly in keeping as many functional units occupied as
possible. It seems that you are concerned with squashing vertical
bubbles, while I'm talking about eliminating horizontal bubbles in
unit scheduling.
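
A back-of-envelope way to see the difference (the numbers below are
invented): a vertical bubble costs a whole issue clock, while a
horizontal bubble costs one unit slot inside a clock that does issue.

#include <stdio.h>

int main(void)
{
    const int units  = 4;      /* issue slots per wide word           */
    const int clocks = 10;     /* clocks in the window we count       */
    const int vertical_bubbles   = 3;  /* whole clocks with no issue  */
    const int horizontal_bubbles = 9;  /* idle slots in issued words  */

    int issued = clocks - vertical_bubbles;
    int busy   = issued * units - horizontal_bubbles;

    printf("%d of %d slots busy (%.1f%%)\n",
           busy, clocks * units, 100.0 * busy / (clocks * units));
    return 0;
}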


cheers,
chuck/


--------------------------------------------------------------------
| Chuck Narad -- diver/adventurer/engineer |

