Sun sparc behavior

Andre.Marien@cs.kuleuven.ac.be (Andre Marien)
Tue, 22 Dec 1992 11:00:50 GMT

          From comp.compilers


Newsgroups: comp.compilers,comp.arch
From: Andre.Marien@cs.kuleuven.ac.be (Andre Marien)
Organization: Dept. Computerwetenschappen K.U.Leuven
Date: Tue, 22 Dec 1992 11:00:50 GMT
Keywords: sparc, architecture

While looking for optimization ideas, we ran into some oddities that we
cannot explain. We have no solid background in architecture, but would
like an explanation for the observed behavior. Our test frame should have
all data and instructions in the cache: we compare a loop of NN nops with
the same loop in which N of the nops are replaced by the instructions
being measured.
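
A minimal sketch of the differential measurement described above. It is
not the actual test frame: the function names, the timings, the iteration
count and the 20 MHz clock are all assumptions made up for illustration,
and a nop is assumed to cost one cycle.

```python
def extra_cycles_per_iteration(t_baseline_s, t_test_s, iterations, clock_hz):
    """Cycles added per loop iteration by the replaced instructions.

    t_baseline_s: run time of the loop that contains only nops
    t_test_s:     run time of the same loop with some nops replaced
    """
    if iterations <= 0 or clock_hz <= 0:
        raise ValueError("iterations and clock_hz must be positive")
    return (t_test_s - t_baseline_s) * clock_hz / iterations


def instruction_cycles(extra_cycles, replaced_nops, nop_cycles=1):
    """Total cost of the replacing instructions.

    The replaced nops no longer run, so their cost (assumed to be
    nop_cycles each) is added back to the measured difference.
    """
    return extra_cycles + replaced_nops * nop_cycles


# Hypothetical inputs: 5 million iterations on a 20 MHz clock, with the
# test loop running one second longer than the all-nop loop.
extra = extra_cycles_per_iteration(2.0, 3.0, 5_000_000, 20_000_000)
store_cost = instruction_cycles(extra, replaced_nops=1)
print(extra, store_cost)
```

With these invented inputs, replacing one nop adds 4 cycles per iteration,
which corresponds to a 5-cycle instruction.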


First, any store takes 5 cycles on a sparc 1 and 7 cycles on a sparc 2
server. Is this number related to the pipeline depth? Why do stores take
at least 5 cycles, even if the surrounding code is just no-ops?


Our guesses:
1) The pipeline is emptied for some reason.
2) The cache can deliver only one datum at a time, so it cannot deliver
      both an instruction and data.
      As a read takes 2 cycles on all sparcs tested, it would not seem
      unreasonable to us for a store to take 3 cycles. But 5??


Second, two consecutive stores add 2 cycles, while two consecutive loads
add none. This seems to imply that, despite the fixed cost, the write
buffer(s) are not available in time. However, three stores in a row incur
these 2 extra cycles only once!? On the sparc 2 server, it looks as if the
distance between two stores must be odd to avoid those 2 cycles. We think
it may have to do with the pipeline organization, but have no clue.


Third, on the sparc 2 server, we notice that a load + store takes 7
cycles, not 9, regardless of whether we store to the location of the load
or not. The same is true for the combination store + load. Putting some
no-ops between the load/store or store/load increases the cycle count by
exactly the number of no-ops. How can an independent load make a store go
faster??
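
The arithmetic behind the third observation can be spelled out as below.
The cycle counts are the ones reported in this post for the sparc 2
server; the constant and function names are hypothetical, and the simple
additive model is only the expectation being questioned, not a statement
about how the hardware works.

```python
LOAD_CYCLES = 2           # measured cost of a load on all sparcs tested
STORE_CYCLES = 7          # measured cost of a lone store, sparc 2 server
OBSERVED_PAIR_CYCLES = 7  # measured cost of load + store (or store + load)


def expected_pair_cycles(load_cycles, store_cycles):
    """Naive expectation: the two costs simply add up."""
    return load_cycles + store_cycles


def observed_with_nops(pair_cycles, nops, nop_cycles=1):
    """Observed cost when nops separate the pair: each nop adds one cycle."""
    return pair_cycles + nops * nop_cycles


expected = expected_pair_cycles(LOAD_CYCLES, STORE_CYCLES)
unexplained_saving = expected - OBSERVED_PAIR_CYCLES
print(expected, OBSERVED_PAIR_CYCLES, unexplained_saving)
```

The difference between the additive expectation and the measurement is the
2 cycles that the question is about.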


If any kind soul can give some feedback or references, we would be happy.
We know that what we need is not the architecture definition but some
documentation on the implementation. Even pointers to clues we obviously
missed would be very welcome.


Andre' Marien
bimandre@cs.kuleuven.ac.be
--

