Related articles |
---|
Block reording. Edward.Rosten@gmail.com (2007-03-08) |
Re: Block reordering to minimize length of jump instructions bill@qswtools.com (Bill Cox) (2007-03-10) |
Re: Block reording. Edward.Rosten@gmail.com (2007-03-10) |
Re: Block reordering. nupul.kukreja@gmail.com (nupul) (2007-03-14) |
From: | Edward.Rosten@gmail.com |
Newsgroups: | comp.compilers |
Date: | 8 Mar 2007 19:52:58 -0500 |
Organization: | Compilers Central |
Keywords: | optimize, question |
Posted-Date: | 08 Mar 2007 19:52:58 EST |
I'm new to compiler optimization, so can anyone tell me where to look
about this?
I have some code structured roughly like this:
for(int i=0; i < N; i++)
{
if( cond1a)
if(cond 2a)
goto stuff;
else if(cond2b)
continue;
else
continue
else if(bond 1b)
if(cond3a)
continue;
else if(cond3b)
goto stuff;
else
continue;
else
continue;
stuff:
//do something here
}
Condition condXa and condXb work in the same data point, the if-else
tree above is only an example, it's actually a general tree, and the
continue and gotos are distributed round the leaves (though an else
condition can't have a goto, just a continue).
The code isn't C, it's only ever present as a tree. The problem comes
when generating the machine code for this (x86 machine code, as it
happens). The code looks like repeated blocks of:
mov XX, %eax //Fetch some data
cmp %ebx, %eax //Conition Xa
jg ##### //goto ###
cmp %ecx, %eax //Condition Xb
jg ##### //got ###
// <--- else block follows directly
The code is simple to generate if I only use long distance jumps for
je, since then every block is the same size and every jump is more or
less equal in cost. However, a jump to an arbitrary point is 6 bytes
of machine code, where as a jump to a nearby point can be sone in
only 2 bytes. Naturally, I'd rather use as many short jumps as
possible, but this requires some strategy to reorder the blocks and
switching from a 6 byte jump to a 2 byte has the potential to affect
the distances of a large number of the other jumps. And the shrinking
of the code size may allow other short jumps to be used, etc...
The final point it that the optimization needs to be reasonable
efficient since the compiled code currently takes about 10ms to
execute. There's usually around 1000 blocks, but it can be up to
10,000.
Can anyone gove me some pointers about this?
Thanks
-Ed
Return to the
comp.compilers page.
Search the
comp.compilers archives again.