From: chase@centerline.com (David Chase)
Newsgroups: comp.compilers
Date: 11 Mar 1996 17:48:17 -0500
Organization: CenterLine Software
References: 96-03-068
Keywords: optimize, arithmetic

max@gac.edu (Max Hailperin) wrote:
> An even bigger problem, in my experience is that C/C++ are built on
> the strange notion that when you operate on two n-bit numbers, you
> get an n-bit result, even when multiplying. This doesn't make a
> whole lot of mathematical sense, and moreover the processor
> architects have in my experience always gotten it right
hbaker@netcom.com (Henry Baker) writes:
> I agree with you here. I've been fighting this one for years. It's
> utterly amazing that a language like C that has been used for so many
> 'embedded' type programs has never standardized a way to use
> 'efficient' multiple precision arithmetic.
Seems to me it would make more sense to just say what you mean, in the
language (such as it is) and see how the compilers deal with it. For
instance, it's not hard to say "64-bit product of two 32-bit numbers",
assuming your compiler has ways to say 32 bits and 64 bits:
    unsigned long long mul_32_by_32_giving_64
        (unsigned long x, unsigned long y) {
      return (unsigned long long) x *
             (unsigned long long) y;
    }
Fed to a modern compiler, this turns into:
.global mul_32_by_32_giving_64
mul_32_by_32_giving_64:
/* 0x0000 */ save %sp,-96,%sp
/* 0x0004 3 */ or %g0,%i0,%o0
/* 0x0008 */ call .umul,2 ! Result = %o0
/* 0x000c */ or %g0,%i1,%o1
/* 0x0010 */ or %g0,%o0,%i1
/* 0x0014 */ ret
/* 0x0018 */ restore %g0,%o1,%o0
There we see it using the portable-to-machines-lacking-hardware-
multiply 32x32-giving-64 library routine. Let's tell it to use
the hardware multiply (-xcg92):
.global mul_32_by_32_giving_64
mul_32_by_32_giving_64:
/* 000000 3 */ umul %o0,%o1,%o1
/* 0x0004 */ retl
/* 0x0008 */ rd %y,%o0
So I think this problem is not such a big problem.
That was SparcCompilers release 3.0.1.
And, giving equal time to free software, let's see how gcc 2.7.0 does.
Using "-mv8" ("machine" with all the "version 8" instructions) we get:
.global mul_32_by_32_giving_64
.type mul_32_by_32_giving_64,#function
mul_32_by_32_giving_64:
umul %o0,%o1,%o1
rd %y,%o0
retl
nop
Looks like they figured this one out, too. (Oh, but there's an extra
nop. Tsk, tsk, tsk.) For the other (no hardware multiply) case they
call some other routine that looks like it performs a 64x64 into 64
multiplication. So there's still work to be done, but not much, and
clearly compiler writers are aware of this idiom (and for all I know,
the 64x64 into 64 software routine could contain a check for words
full of zeros).
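
To make the "check for words full of zeros" idea concrete, here is a
minimal sketch of what such a 64x64 into 64 software routine might look
like. It is not gcc's actual library routine, which I have not
inspected; the function name is hypothetical, and the uint32_t and
uint64_t types assume a C99 <stdint.h>.

```c
#include <stdint.h>

/* Hypothetical sketch: 64x64 -> 64 multiply built from 32-bit halves,
   with a shortcut when both high words are zero. */
static uint64_t mul_64_by_64_giving_64(uint64_t x, uint64_t y)
{
    uint32_t x_lo = (uint32_t)x, x_hi = (uint32_t)(x >> 32);
    uint32_t y_lo = (uint32_t)y, y_hi = (uint32_t)(y >> 32);

    /* The one 32x32 -> 64 multiply that is always needed. */
    uint64_t result = (uint64_t)x_lo * y_lo;

    /* Words full of zeros: both operands fit in 32 bits, so done. */
    if ((x_hi | y_hi) == 0)
        return result;

    /* Cross terms only affect the high word, so their low 32 bits
       are enough; x_hi * y_hi falls entirely off the top. */
    uint32_t cross = x_lo * y_hi + x_hi * y_lo;
    return result + ((uint64_t)cross << 32);
}
```

With the shortcut, the widened 32-bit case costs one multiply and one
test instead of three multiplies.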
Of course, this does depend on compiler support for "long long", but
that shouldn't be a problem, given the wide availability of gcc (has
that been ported to NT yet?)
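
If the compiler has no 64-bit type at all, the same product can still
be written portably from 16-bit halves. The following is a minimal
sketch under that assumption; the struct and function names are
hypothetical, uint32_t assumes a C99 <stdint.h>, and a compiler is much
less likely to recognize this form and reduce it to a single hardware
multiply.

```c
#include <stdint.h>

/* Hypothetical result type: a 64-bit value as two 32-bit words. */
typedef struct { uint32_t hi, lo; } u64_pair;

/* Hypothetical sketch: 32x32 -> 64 multiply using only 32-bit
   arithmetic, by summing four 16x16 -> 32 partial products. */
static u64_pair mul_32_by_32_split(uint32_t x, uint32_t y)
{
    uint32_t x_lo = x & 0xFFFFu, x_hi = x >> 16;
    uint32_t y_lo = y & 0xFFFFu, y_hi = y >> 16;

    uint32_t ll = x_lo * y_lo;   /* weight 2^0  */
    uint32_t lh = x_lo * y_hi;   /* weight 2^16 */
    uint32_t hl = x_hi * y_lo;   /* weight 2^16 */
    uint32_t hh = x_hi * y_hi;   /* weight 2^32 */

    /* Middle column: three values below 2^16, so no overflow. */
    uint32_t mid = (ll >> 16) + (lh & 0xFFFFu) + (hl & 0xFFFFu);

    u64_pair r;
    r.lo = (mid << 16) | (ll & 0xFFFFu);
    r.hi = hh + (lh >> 16) + (hl >> 16) + (mid >> 16);
    return r;
}
```

For example, multiplying 0xFFFFFFFF by itself gives hi = 0xFFFFFFFE and
lo = 0x00000001.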
The carry bit is a bit more of a pain. Everyone I know who's thought
about this would love it if a compiler could automatically optimize
multi-word additions into a tight loop with an "addxcc" (a Sparc
instruction meaning: add, including the incoming carry bit, and set the
carry bit out as appropriate) in the middle, but that's not something easily
snagged with a simple expression pattern -- if you think about it, the
compiler has to figure out that a particular "variable" will only
contain the values zero and one, and that it would be very convenient
to register allocate that into the carry bit. That, and there's this
problem (on some machines) with also needing the carry bit to figure
out whether to go around the loop again or not.
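
For reference, this is roughly what the source-level loop looks like
when written in portable C. It is a minimal sketch, not code from any
particular library: the function name is hypothetical, and it assumes
C99 (<stdint.h> and a declaration inside the for statement). The
"carry" variable is the one a compiler would have to prove is always
zero or one before it could live in the hardware carry bit.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: sum = a + b over n 32-bit words, least
   significant word first. Returns the final carry out (0 or 1). */
static uint32_t add_words(uint32_t *sum, const uint32_t *a,
                          const uint32_t *b, size_t n)
{
    uint32_t carry = 0;                /* only ever 0 or 1 */
    for (size_t i = 0; i < n; i++) {
        uint32_t t  = a[i] + carry;    /* may wrap around to 0 */
        uint32_t c1 = t < carry;       /* 1 if that addition wrapped */
        uint32_t s  = t + b[i];
        uint32_t c2 = s < b[i];        /* 1 if this addition wrapped */
        sum[i] = s;
        carry = c1 | c2;               /* both cannot be 1 at once */
    }
    return carry;
}
```

Each iteration spends two compares and an "or" recovering what the
hardware already produced as a side effect of the add, which is the
overhead a single add-with-carry instruction would remove.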
Speaking for myself,
David Chase