From: George Neuner <firstname.lastname@example.org>
Date: Fri, 04 Apr 2008 18:23:56 -0400
References: 08-03-105 08-03-109 08-04-003 08-04-007 08-04-013
Posted-Date: 06 Apr 2008 01:17:39 EDT
On 3 Apr 2008 21:22:40 GMT, Tim Frink <email@example.com> wrote:
Please attribute other posters so we can all follow the conversation.
e.g., Max Hailperin wrote:
>> (1) For each block, make a yes/no decision on whether to move it to
>> the faster memory. The address of each block within its memory (the
>> slow memory or the fast memory) is equal to the sum of the size of the
>> preceding blocks that were assigned to the same memory. That is, the
>> order of the blocks within each memory remains the same as in the
>> original layout and no padding bytes are introduced.
>Sure, the order remains the same. However, the block addresses might
>change due to additional jump instructions that I have to insert to
>preserve correct program flow. Let's say I have this contiguous
>allocation of blocks in memory:
> A B C D
>and B is the implicit successor, i.e. it's on the fall-through edge.
>Further, it should be assumed that all blocks start at an 8-byte
>alignment boundary so that no penalties are incurred during fetching.
>Moving B to faster memory, I must add an additional jump instruction
>at the end of block A to direct control flow to B. This additional
>instruction increases the addresses of the following blocks
>C and D ...
As long as the code blocks are in RAM and have a single entry and
exit, you can patch in the jumps dynamically as you relocate the code.
Taking your example, blocks A B C & D and relocating B -> B', you can
overwrite the beginning of B with a jump to B' and append to B' a jump
to C. No address changes are necessary in the unmoved blocks. If you
need to reuse the fast RAM code space, you restore the overwritten
words of B from B' before swapping in new code.
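The patch-and-restore scheme can be sketched abstractly. The model below is illustrative only: memory is a Python list, and a "jump" is a placeholder tuple rather than a real branch opcode; `relocate` and `restore` are invented names, and real code would write ISA-specific instruction words instead.

```python
# Toy model of patch-in-place relocation: move block B to fast RAM,
# overwrite B's entry with a jump to the copy B', and append to B' a
# jump back to B's fall-through successor.  Unmoved blocks keep their
# addresses, so nothing else needs to be relinked.

JUMP_SIZE = 1  # one "word" per jump in this toy model

def relocate(mem_slow, block_start, block_len, mem_fast):
    """Copy a block into fast memory and patch the jumps.
    Returns the overwritten words needed to undo the patch later."""
    block = mem_slow[block_start:block_start + block_len]
    new_start = len(mem_fast)
    mem_fast.extend(block)                             # copy B -> B'
    mem_fast.append(("JMP", block_start + block_len))  # B' jumps back to C
    saved = mem_slow[block_start:block_start + JUMP_SIZE]
    mem_slow[block_start] = ("JMP-FAST", new_start)    # entry of B -> B'
    return saved

def restore(mem_slow, block_start, saved):
    """Undo the patch before reusing the fast-RAM space for new code."""
    mem_slow[block_start:block_start + len(saved)] = saved
```

With blocks A, B, C, D laid out contiguously, relocating B leaves the addresses of C and D untouched, and restoring the saved words returns the slow memory to its original state.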
It's been a while since I've done any DSP programming and I don't know
what chip you're using, but if your compiler can generate position
independent code, it will simplify your task greatly.
>> For this problem, dynamic programming is indeed a useful technique.
>> Consider working sequentially through the list of blocks, deciding for
>> each one whether to move it to fast memory or not. The optimal decision
>> on whether to move block k does not depend on the specific choices you
>> made regarding each of blocks 1 through k-1. Instead, it just depends
>> on the total size in bytes of the blocks in the range from 1 through k-1
>> that were selected for fast memory. This total size is what determines
>> all three relevant facts: (1) how many bytes of the fast memory's
>> capacity are still available for blocks k and onward, (2) what the
>> address of block k would be if left in the slow memory, (3) what the
>> address of block k would be if moved to the fast memory.
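Max's scheme can be sketched concretely. Everything numeric below is made up for illustration: the block sizes, the fast-memory capacity, and the `cost` timing model (which simply charges a flat penalty for a non-8-byte-aligned start) stand in for real per-block measurements. The key point it demonstrates is that the DP state is just (block index, bytes already moved to fast memory), since that pair determines both candidate addresses for the next block.

```python
from functools import lru_cache

SIZES = (24, 13, 40, 8)   # blocks A B C D, sizes in bytes (invented)
FAST_CAP = 48             # fast memory capacity in bytes (invented)

def cost(k, addr, in_fast):
    """Stand-in timing model: fast memory is cheaper to execute from,
    and a block not starting on an 8-byte boundary pays a fetch penalty."""
    base = 10 if in_fast else 30
    penalty = 5 if addr % 8 else 0
    return base + penalty

@lru_cache(maxsize=None)
def best(k, fast_used):
    """Cheapest total cost for blocks k..n-1, given that fast_used bytes
    of the earlier blocks were moved to fast memory.  Because order is
    preserved in each memory, fast_used alone fixes both addresses:
    fast address = fast_used, slow address = (bytes so far) - fast_used."""
    if k == len(SIZES):
        return 0
    slow_used = sum(SIZES[:k]) - fast_used
    stay = cost(k, slow_used, False) + best(k + 1, fast_used)
    move = float("inf")
    if fast_used + SIZES[k] <= FAST_CAP:
        move = cost(k, fast_used, True) + best(k + 1, fast_used + SIZES[k])
    return min(stay, move)
```

The extra jump bytes you worry about below could be folded into the same state: when block k is placed in a different memory than block k-1's fall-through expects, add the jump's size to the appropriate running total before computing the addresses of later blocks.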
>Ok, this makes sense if I decide "in advance" what blocks will be
>moved. I always assumed a given memory layout where I try to
>successively move blocks without a "total" view of the entire set of
>blocks.
>However, I still see the problem with the additional jump instructions
>that I have to add. I'm unfamiliar with dynamic programming (yes, I
>will study some literature on this soon :-)), so maybe you can answer
>my question. Can I express in the formulation of the knapsack problem
>the fact that moving block N+1 to faster memory will lead to
>an increase in the code size of block N (by the added jump)?
>And I'm still not very sure how to integrate my misalignment problem
>into the dynamic programming formulation. Let's say I have four blocks
>A B C D. I could in advance compute the execution time of each block
>when it was placed in slow memory or in fast memory. The most
>efficient way would be to put either the entire text section of the
>code into the slow or the fast memory to get the two reference values
>for that. However, these values are only true if considered as one
>set. When I decide to move block B to faster memory, I can't guarantee
>for block D (which was decided to stay in slow memory) that its run
>time is what was determined in advance, since it might incur a fetch
>misalignment penalty slowing down its execution beyond the value that
>was determined when B was still between A and C and D was not
>misaligned. Can such a situation also be modeled with dynamic
>programming?
Yes. However, the best execution time will naturally be all code in
fast RAM. If your chip can do RAM -> RAM DMA (and you don't otherwise
need the DMA channel), you can arrange a close-to-ideal situation as
long as the fast RAM can hold each individual code block.
Place two small routines in fast RAM - one to DMA a specified code
block into fast RAM and the other to start execution of the new block
when the DMA is complete. Each code block should call the relocation
routine as early as possible to stage the next block for execution and
end with a call to the restart routine.
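A toy model of those two resident routines, with the DMA and the branch-to-block both faked: `make_overlay`, the event log, and the block names are all invented for illustration, and a real DSP would start its DMA controller and jump through a function pointer instead.

```python
def make_overlay():
    """Build the two helpers that would live permanently in fast RAM:
    one stages the next code block (stand-in for kicking off a DMA),
    the other restarts execution at whatever block was staged."""
    state = {"staged": None, "log": []}

    def stage(name):
        state["log"].append(("dma", name))   # begin RAM -> RAM DMA of `name`
        state["staged"] = name

    def restart():
        state["log"].append(("run", state["staged"]))  # jump to staged block
        return state["staged"]

    return stage, restart, state
```

Each block then calls `stage` for its successor as early as possible and finishes by calling `restart`, so the copy proceeds while the current block is still running.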
If you double buffer your code or if the current block is longer than
the subsequent block, you may be able to time the DMA operation so
that it overlaps execution of the current block and thus not lose any
time waiting for the DMA operation to complete.
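The timing of that overlap can be worked out with a simplified model: assume each block starts the DMA of its successor at its own entry, so the CPU stalls only when the transfer outlasts the block's own execution. `chain_time` and its sample numbers are illustrative, not measurements.

```python
def chain_time(exec_times, dma_times):
    """Total time to run a chain of blocks with overlapped staging.
    exec_times[i] is block i's execution time; dma_times[i] is the time
    to DMA block i into fast RAM.  Per step the CPU pays the longer of
    (current block's execution, next block's DMA)."""
    total = dma_times[0]  # the first block must be staged before anything runs
    for i, ex in enumerate(exec_times):
        nxt_dma = dma_times[i + 1] if i + 1 < len(exec_times) else 0
        total += max(ex, nxt_dma)
    return total
```

For example, with execution times 100 and 50 and DMA times 10 and 80, the second transfer hides entirely behind the first block's execution, giving 10 + 100 + 50 = 160 instead of the unoverlapped 240; shrink the first block to 30 and the CPU stalls for the difference.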