11 Mar 1996 17:48:17 -0500

Related articles |
---|

Re: C code .vs. Assembly code for Microcontrollers/DSPs ? hbaker@netcom.com (1996-03-08) |

Re: C code .vs. Assembly code for Microcont chase@centerline.com (1996-03-11) |

From: | chase@centerline.com (David Chase) |

Newsgroups: | comp.compilers |

Date: | 11 Mar 1996 17:48:17 -0500 |

Organization: | CenterLine Software |

References: | 96-03-068 |

Keywords: | optimize, arithmetic |

max@gac.edu (Max Hailperin) wrote:

*> An even bigger problem, in my experience is that C/C++ are built on*

*> the strange notion that when you operate on two n-bit numbers, you*

*> get an n-bit result, even when multiplying. This doesn't make a*

*> whole lot of mathematical sense, and moreover the processor*

*> architects have in my experience always gotten it right*

hbaker@netcom.com (Henry Baker) writes:

*> I agree with you here. I've been fighting this one for years. It's*

*> utterly amazing that a language like C that has been used for so many*

*> 'embedded' type programs has never standardized a way to use*

*> 'efficient' multiple precision arithmetic.*

Seems to me it would make more sense to just say what you mean, in the

language (such as it is) and see how the compilers deal with it. For

instance, it's not hard to say "64-bit result of two 32-bit numbers"

assuming your compiler has ways to say 32-bits and 64-bits:

    unsigned long long mul_32_by_32_giving_64
        (unsigned long x, unsigned long y) {
        return (unsigned long long) x *
               (unsigned long long) y;
    }

Fed to a modern compiler, this turns into:

            .global mul_32_by_32_giving_64
    mul_32_by_32_giving_64:
    /* 0x0000   */ save %sp,-96,%sp
    /* 0x0004 3 */ or %g0,%i0,%o0
    /* 0x0008   */ call .umul,2 ! Result = %o0
    /* 0x000c   */ or %g0,%i1,%o1
    /* 0x0010   */ or %g0,%o0,%i1
    /* 0x0014   */ ret
    /* 0x0018   */ restore %g0,%o1,%o0

There we see it calling the 32x32-giving-64 library routine, which is

portable to machines lacking a hardware multiply. Let's tell it to use

the hardware multiply (-xcg92):

            .global mul_32_by_32_giving_64
    mul_32_by_32_giving_64:
    /* 000000 3 */ umul %o0,%o1,%o1
    /* 0x0004   */ retl
    /* 0x0008   */ rd %y,%o0

So I think this problem is not such a big problem.

That was SparcCompilers release 3.0.1.

And, giving equal time to free software, let's see how gcc 2.7.0 does.

Using "-mv8" ("machine" with all the "version 8" instructions) we get:

            .global mul_32_by_32_giving_64
            .type mul_32_by_32_giving_64,#function
    mul_32_by_32_giving_64:
            umul %o0,%o1,%o1
            rd %y,%o0
            retl
            nop

Looks like they figured this one out, too. (Oh, but there's an extra

nop. Tsk, tsk, tsk.) For the other (no hardware multiply) case they

call some other routine that looks like it performs a 64x64 into 64

multiplication. So there's still work to be done, but not much, and

clearly compiler writers are aware of this idiom (and for all I know,

the 64x64 into 64 software routine could contain a check for words

full of zeros).
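
To make the "check for words full of zeros" idea concrete, here is a minimal sketch of a software 64x64-into-64 multiply built from 32-bit halves, with a fast path when both high words are zero. This is not gcc's actual library routine; the name soft_mul64 is hypothetical, and the fixed-width types from <stdint.h> assume a C99 or later compiler.

```c
#include <stdint.h>

/* Hypothetical helper, NOT the routine gcc actually calls.
   Computes the low 64 bits of a * b from 32-bit pieces. */
uint64_t soft_mul64(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    /* The low-by-low product is always needed; it is exactly the
       32x32-giving-64 multiply discussed above. */
    uint64_t result = (uint64_t)a_lo * b_lo;

    /* Fast path: both high words are zero, so the cross terms vanish. */
    if ((a_hi | b_hi) == 0)
        return result;

    /* The cross terms land in the high word only; bits above 63 are
       discarded, so 32-bit wraparound arithmetic is sufficient here. */
    uint32_t cross = a_hi * b_lo + a_lo * b_hi;
    return result + ((uint64_t)cross << 32);
}
```

With a check like this, the cost of calling the general routine for two zero-extended 32-bit operands would be one test and one multiply, which is why the missing special case may matter less than it first appears.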

Of course, this does depend on compiler support for "long long", but

that shouldn't be a problem, given the wide availability of gcc (has

that been ported to NT yet?)
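
Where a compiler provides the C99 header <stdint.h>, the same idiom can be written without relying on "long" being 32 bits or on "long long" being spelled that way. A minimal sketch, assuming such a compiler; the name mul32x32_64 is hypothetical:

```c
#include <stdint.h>

/* Hypothetical name; same "say what you mean" idiom with
   exact-width types (assumes a C99 or later compiler). */
uint64_t mul32x32_64(uint32_t x, uint32_t y)
{
    /* Widen one operand first so the multiply is done in 64 bits;
       the other operand is converted to match. */
    return (uint64_t)x * y;
}
```

The cast is what matters: writing x * y on two 32-bit operands and widening afterwards would already have thrown the high half away.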

The carry bit is a bit more of a pain. Everyone I know who's thought

about this would love it if a compiler could automatically optimize

multi-word additions into a tight loop with an "addxcc" (the Sparc

instruction that adds with the carry bit as input and sets the carry

bit as output) in the middle, but that's not something easily

snagged with a simple expression pattern -- if you think about it, the

compiler has to figure out that a particular "variable" will only

contain the values zero and one, and that it would be very convenient

to register allocate that into the carry bit. That, and there's this

problem (on some machines) with also needing the carry bit to figure

out whether to go around the loop again or not.
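
For reference, this is roughly the source a compiler would have to recognize. It is a minimal sketch of a multi-word addition in portable C, with the carry held in an ordinary variable; the name add_words and the least-significant-word-first layout are assumptions made for illustration, as is the use of <stdint.h>.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum = a + b over n 32-bit words, least
   significant word first. Returns the carry out of the top word. */
uint32_t add_words(uint32_t *sum, const uint32_t *a,
                   const uint32_t *b, size_t n)
{
    uint32_t carry = 0;                /* only ever holds 0 or 1 */
    for (size_t i = 0; i < n; i++) {
        uint32_t t  = a[i] + carry;    /* may wrap around to 0 */
        uint32_t c1 = t < carry;       /* 1 if adding the carry wrapped */
        uint32_t s  = t + b[i];
        uint32_t c2 = s < t;           /* 1 if adding b[i] wrapped */
        sum[i] = s;
        carry = c1 | c2;               /* c1 and c2 are never both 1 */
    }
    return carry;
}
```

Nothing in the text of the loop says that carry is a one-bit quantity or that the two comparisons are really one hardware flag; the compiler has to prove both before the body can collapse into a single add-with-carry. The loop test "i < n" is also the sort of comparison that, on machines where compares write the same condition codes, competes with the carry from one iteration to the next.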

Speaking for myself,

David Chase
