Generating optimal code from a DAG  email@example.com (Michael Weitzel) (2005-04-11)
Re: Generating optimal code from a DAG  firstname.lastname@example.org (TOUATI Sid) (2005-04-16)
Re: Generating optimal code from a DAG  email@example.com (2005-04-16)
Date: 16 Apr 2005 11:13:01 -0400
Posted-Date: 16 Apr 2005 11:13:01 EDT
Michael Weitzel wrote:
> I am currently working on a code generator for the Intel x86 FPU
> (x>=3) that generates code for single expressions, which can be very
> large and highly redundant (in terms of common subexpressions). My
> code generator creates a DAG from the expression tree to eliminate
> common subexpressions. The DAG is split into trees, and code is
> generated for the common subtrees (using the algorithm of Bruno and
> Lassagne to obtain optimal stack-machine code). After that, a
> peephole optimizer replaces (only) pairs of opcodes with single
> Intel opcodes.
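Building the DAG from the expression tree, as described above, is commonly done by hash-consing (value numbering): before creating a node, look it up by operator and child ids, and reuse the existing node if one matches. A minimal Python sketch follows; the tree encoding and the names `Node` and `DagBuilder` are hypothetical and not taken from the poster's generator.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Union

# Hypothetical tree encoding: a leaf is a variable name, an interior node is
# an (op, left, right) tuple.
Tree = Union[str, tuple]

@dataclass(frozen=True)
class Node:
    op: str             # operator, or "var" for a leaf
    args: tuple = ()    # ids of child nodes in the DAG
    name: str = ""      # variable name (leaves only)

class DagBuilder:
    def __init__(self) -> None:
        self.nodes: list[Node] = []        # node id -> node
        self.ids: dict[Node, int] = {}     # structurally equal node -> its id

    def intern(self, node: Node) -> int:
        # Reuse the id of an identical node; this is where CSE happens.
        if node not in self.ids:
            self.ids[node] = len(self.nodes)
            self.nodes.append(node)
        return self.ids[node]

    def build(self, t: Tree) -> int:
        if isinstance(t, str):
            return self.intern(Node("var", name=t))
        op, left, right = t
        return self.intern(Node(op, (self.build(left), self.build(right))))
```

For `(a*b) + (a*b)*c` the builder creates six nodes instead of eight, and both uses of `a*b` point at the same id. This sketch treats `a*b` and `b*a` as different; canonicalizing the operand order of commutative operators would catch more sharing.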
I don't understand why you split the DAG again. You can translate the
DAG directly into a RISC-like intermediate language in one DFS pass. You
can even replace pairs of opcodes (multiply-add?) in the same pass.
Recall that DAGs don't naturally translate into stack-machine code.
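The one-pass translation suggested above can be sketched as a post-order DFS that memoizes each node's result, so a shared subexpression is emitted once and afterwards referred to by the temporary that holds it. The DAG encoding, the function name `emit_three_address` and the three-address output format are assumptions for illustration; fusing pairs such as multiply-add is left out.

```python
from __future__ import annotations

# Hypothetical DAG encoding: node id -> ("var", name) or (op, left_id, right_id).
def emit_three_address(dag: dict[int, tuple], root: int) -> list[str]:
    name_of: dict[int, str] = {}   # node id -> variable or temporary holding its value
    code: list[str] = []

    def visit(n: int) -> str:
        if n in name_of:                    # shared node: already computed, reuse it
            return name_of[n]
        node = dag[n]
        if node[0] == "var":
            name_of[n] = node[1]            # leaves are used directly
        else:
            op, left, right = node
            a = visit(left)                 # post-order: operands first
            b = visit(right)
            t = f"t{len(code)}"             # fresh temporary
            code.append(f"{t} = {a} {op} {b}")
            name_of[n] = t
        return name_of[n]

    visit(root)
    return code
```

For the DAG of `(a*b) + (a*b)*c` this yields `t0 = a * b`, `t1 = t0 * c`, `t2 = t0 + t1`. For very large expressions, the recursion would have to be replaced by an explicit stack to stay within Python's recursion limit.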
> I understand that the generated code is not optimal:
> - peephole optimization frees up to one level of the stack that could
> be used to generate slightly better code for the sub-trees
> - results of sub-tree computations could be stored in the FPU's stack
> (the Intel FPU can access every level of the stack at any time);
> instead I always save them to local variables (in the function's
> stack frame).
You can use register allocation of the FP registers to avoid storing
some values. The "stack" of the x87 is a nightmare for a compiler
writer. Fortunately, you can renumber these registers with the FP
exchange instruction (FXCH); this instruction is very cheap and can be
executed in parallel with other FP instructions, even on old Pentiums.
I think this may not be true on the 387 (and the 486?).
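To make the FXCH idea concrete, here is a toy model that tracks which value occupies each x87 slot and emits `fxch st(i)` to move a needed operand to `st(0)` instead of spilling it to memory. The class `X87Stack` and its methods are hypothetical; the only hardware facts it relies on are the eight-register stack, `fld` pushing, `fxch st(i)` swapping `st(0)` with `st(i)`, and the Intel-syntax form `fadd st(0), st(i)`.

```python
from __future__ import annotations

class X87Stack:
    """Toy model of the 8-register x87 stack; slots[0] is st(0)."""

    def __init__(self) -> None:
        self.slots: list[str] = []   # value names, top of stack first
        self.code: list[str] = []    # emitted Intel-syntax instructions

    def load(self, var: str) -> None:
        # fld pushes a memory operand; every other value moves down one slot.
        assert len(self.slots) < 8, "x87 stack overflow"
        self.code.append(f"fld {var}")
        self.slots.insert(0, var)

    def bring_to_top(self, value: str) -> None:
        # fxch st(i) swaps st(0) and st(i): a cheap renaming instead of a spill.
        i = self.slots.index(value)
        if i != 0:
            self.code.append(f"fxch st({i})")
            self.slots[0], self.slots[i] = self.slots[i], self.slots[0]

    def combine(self, mnemonic: str, other: str, result: str) -> None:
        # st(0) <- st(0) op st(i); the old st(0) value is overwritten.
        i = self.slots.index(other)
        self.code.append(f"{mnemonic} st(0), st({i})")
        self.slots[0] = result
```

Loading `x`, `y`, `z` and then computing `x+y` produces `fld x`, `fld y`, `fld z`, `fxch st(2)`, `fadd st(0), st(1)`, leaving `y` and `z` live in registers. A real allocator would also decide which values to keep and when to spill, which this sketch does not attempt.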
On a very old processor you might prefer to save and restore the values
of the common subexpressions, or keep these values on the stack, and
use stack-machine code.
> - the FPU stack is not used to capacity when switching from one
> sub-tree to the next, and the computation of a sub-tree always starts
> on an empty stack.
I always thought that if you avoid the last pop for a common
subexpression, the next computation starts with a stack that already
contains the value of this expression. This value can be reused later.
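One way to read "avoid the last pop" is to leave a copy of every common subexpression on the x87 stack and duplicate it with `fld st(i)` at each later use, instead of storing it to memory. A minimal sketch, assuming a hypothetical DAG encoding and a hypothetical function `x87_stack_code`; operators are restricted to `fadd`/`fmul` so operand order does not matter, and Intel syntax is assumed (`faddp st(1), st(0)` computes `st(1) + st(0)` and pops).

```python
from __future__ import annotations
from collections import Counter

# Hypothetical DAG encoding: id -> ("var", name) or (op, left_id, right_id),
# where op is "fadd" or "fmul" (commutative, so operand order is not an issue).
def x87_stack_code(dag: dict[int, tuple], root: int) -> list[str]:
    uses = Counter()
    for node in dag.values():
        if node[0] != "var":
            uses.update(node[1:])            # count how often each node is an operand
    stack: list[int] = []                    # model of the x87 stack, st(0) first
    code: list[str] = []

    def push(n: int, insn: str) -> None:
        assert len(stack) < 8, "x87 stack overflow"
        code.append(insn)
        stack.insert(0, n)

    def gen(n: int) -> None:
        if n in stack:                       # a copy is still on the stack: duplicate it
            push(n, f"fld st({stack.index(n)})")
            return
        node = dag[n]
        if node[0] == "var":
            push(n, f"fld {node[1]}")
        else:
            op, left, right = node
            gen(left)
            gen(right)
            code.append(f"{op}p st(1), st(0)")   # st(1) <- st(1) op st(0), then pop
            stack.pop(0)
            stack[0] = n
        if uses[n] > 1:
            push(n, "fld st(0)")             # skip the "last pop": leave a copy behind

    gen(root)
    while len(stack) > 1:                    # discard leftover copies under the result
        code.append("fstp st(1)")
        stack.pop(1)
    return code
```

This version always keeps an extra copy and only discards leftovers at the end, so it can emit redundant `fld st(0)` instructions and waste slots; a better generator would track remaining uses and consume the kept copy on its last use, and would fall back to memory when the eight registers run out.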
> - visiting the DAG nodes in topological order leaves some freedom to
> delay or prefer code generation for nodes on the same topological level.
> In the texts I read, these problems (and probably others I forgot) are
> usually summarized (or "slayed"?) in a sentence like "Code generation
> from a DAG is NP-complete." ;-)
It's a common mistake. Optimal DAG scheduling cannot be NP-complete,
because it is an optimization problem, not a decision problem. Most
texts say that scheduling is NP-hard.
> I haven't seen any proof or exact description of the problem yet. What
> makes code generation from a DAG NP-complete? I have some ideas -- but
> how does an optimal algorithm work?
Well, an optimal algorithm for an NP-hard problem must, for now (unless
P = NP is proved), test all the possibilities, e.g. all the possible
reorderings of instructions that don't change the result of the
computation, and choose the fastest one.
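As an illustration of "test all the possibilities", the sketch below enumerates every topological order of a small block's dependence DAG and keeps the fastest under a toy machine model: one instruction issued per cycle, in order, each with a fixed latency. The block format, the latencies, the cost model and the names `schedule_length` and `best_order` are hypothetical; the point is only that the search space grows factorially with the block size.

```python
from __future__ import annotations

# Hypothetical basic block: name -> (latency in cycles, list of predecessor names).
Block = dict[str, tuple[int, list[str]]]

def schedule_length(block: Block, order: list[str]) -> int:
    """Cycles for a toy in-order, single-issue machine to finish `order`."""
    finish: dict[str, int] = {}
    cycle = 0
    for name in order:
        latency, preds = block[name]
        start = max([cycle] + [finish[p] for p in preds])   # wait for operands
        finish[name] = start + latency
        cycle = start + 1                                    # one issue per cycle
    return max(finish.values())

def best_order(block: Block) -> tuple[int, list[str]]:
    """Try every topological order (exponential!) and keep the fastest."""
    best: tuple[int, list[str]] | None = None

    def extend(order: list[str], done: set[str]) -> None:
        nonlocal best
        if len(order) == len(block):
            cost = schedule_length(block, order)
            if best is None or cost < best[0]:
                best = (cost, list(order))
            return
        for name, (_, preds) in block.items():
            if name not in done and all(p in done for p in preds):
                order.append(name)
                done.add(name)
                extend(order, done)
                order.pop()
                done.remove(name)

    extend([], set())
    assert best is not None
    return best
```

With a 3-cycle load `a`, a dependent `b`, and an independent `c`, the search finds that issuing `c` between `a` and `b` hides part of the load latency: 4 cycles instead of 5.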
In almost any compiler text you can find a couple of pages about
scheduling a basic block consisting of a DAG of data-dependent
instructions.
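The usual heuristic in those texts is list scheduling: step through the cycles and, at each one, issue the ready instruction with the highest priority, commonly the length of the longest latency-weighted path from it to the end of the block. A sketch under a toy model (hypothetical block format and latencies, single-issue machine, hypothetical function name `list_schedule`); it runs in polynomial time but is not guaranteed to be optimal.

```python
from __future__ import annotations
from functools import lru_cache

# Hypothetical basic block: name -> (latency in cycles, list of predecessor names).
Block = dict[str, tuple[int, list[str]]]

def list_schedule(block: Block) -> list[str]:
    """Greedy list scheduling by critical-path priority (a common heuristic)."""
    succs: dict[str, list[str]] = {n: [] for n in block}
    for n, (_, preds) in block.items():
        for p in preds:
            succs[p].append(n)

    @lru_cache(maxsize=None)
    def priority(n: str) -> int:
        # Longest latency-weighted path from n to the end of the block.
        return block[n][0] + max((priority(s) for s in succs[n]), default=0)

    finish: dict[str, int] = {}
    order: list[str] = []
    cycle = 0
    while len(order) < len(block):
        # Ready: not yet issued, and every predecessor's result is available.
        ready = [n for n, (_, preds) in block.items()
                 if n not in finish and all(finish.get(p, 10**9) <= cycle for p in preds)]
        if ready:
            n = max(ready, key=priority)   # ties go to the first in block order
            finish[n] = cycle + block[n][0]
            order.append(n)
        cycle += 1
    return order
```

On a block with a 3-cycle load `a`, a dependent `b` and an independent `c`, it issues `a`, then fills the gap with `c`, then `b`.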
> Are there better algorithms than splitting a DAG into trees? Perhaps I
> can afford to use some NP-complete procedure on parts of a DAG.
Optimal scheduling has been done for small basic blocks. You usually
don't try it for very large blocks (hundreds or thousands of
instructions).