Re: Inline block moves firstname.lastname@example.org (1991-11-15)
Keywords: architecture, assembler, optimize
Date: Fri, 15 Nov 91 13:14:45 -0500
Regarding the block copy article:
| The Data General MV8000 is a 16 bit pipelined minicomputer with a physical
| address space of 4 megabytes and a virtual address space of 4 gigabytes
| (newer DG machines, such as the DG 20000, can address up to 64 megabytes).
| Although it is a 16 bit machine, most instructions will work on 32 bit
| data. It has four 32 bit fixed-point accumulators and four 64 bit
| floating-point accumulators. The MV8000 supports 8 bit, 16 bit, and 32
| bit fixed-point arithmetic, as well as 64 bit floating-point arithmetic.
| The MV8000 has block move instructions similar to those of the Z-80 and
| the Intel chips. The code example is shown below.
Umm, the DG MV/8000 was always a 32 bit machine. It runs the 16 bit
Eclipse and Nova instructions as a matter of course, but it is a 32
bit machine. Also, shortly after the initial release, the memory was
bumped to 8M -- I seem to remember that there was some difficulty in
getting 1M DRAMs at the initial announcement...
| Timings for the DG memcpy, shown below, were for 8, 255, 256, 512, and
| 65536 bytes, each executed 1000 times. A time is given for both addresses
| being even, both odd, and one odd/one even. Note the much faster execution
| time for the 8 byte, both even example. This is due to the fact that a
| load doubleword/store doubleword combination was generated for that case.
| Also note the much longer times in the "by the book" row for 255 and 65536
| byte moves. These times were calculated using the DG Principles of
| Operations manual timing of 1.43 microseconds per byte for the wcmv
| instruction, which is referred to as a maximum time.
| The DG timings indicated an unfortunate decision by the DG C compiler
| authors. Note that the timings are fastest for the case of both addresses
| being even, then both odd, with the one even/one odd combination being
| slowest, below 256 bytes. At 256 bytes and above, the compiler generates
| a wblm (wide block move) for the even address case instead of the wcmv
| (wide character move) that it generates for all other cases. As a result,
| the even address moves above 255 bytes are slower than the odd address
| move. The odd/even moves are the slowest in all cases, however, and I have
| no plausible explanation as to why. My first assumption was that as the
| pointers are incremented thru memory, the both odd and both even cases
| will be addressed from an even boundary half the time, whereas the
| odd/even case will always be addressing from one odd numbered location.
| This explains why the odd case is faster than the odd/even case, but it
| does not explain why the even case is faster than the odd case.
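For reference, the "by the book" figures in the quoted text can be
reproduced with a few lines of C. This is only a sketch: it assumes the
row is nothing more than bytes x repetitions x 1.43 microseconds, and the
helper names by_the_book_seconds and print_by_the_book are made up for
illustration. On that assumption, 1000 moves of 255 bytes come to about
0.36 seconds and 1000 moves of 65536 bytes to about 93.7 seconds.

```c
#include <assert.h>
#include <stdio.h>

/* Maximum wcmv cost per byte, as quoted above from the DG Principles of
   Operations manual. */
#define WCMV_MAX_US_PER_BYTE 1.43

/* Hypothetical helper: worst-case seconds for `reps` moves of `nbytes`
   bytes each, assuming cost is strictly linear in the byte count. */
static double by_the_book_seconds(unsigned long nbytes, unsigned long reps)
{
    return (double)nbytes * (double)reps * WCMV_MAX_US_PER_BYTE / 1e6;
}

/* Print the estimate for each size used in the quoted timings. */
static void print_by_the_book(void)
{
    static const unsigned long sizes[] = { 8, 255, 256, 512, 65536 };
    size_t i;

    for (i = 0; i < sizeof sizes / sizeof sizes[0]; i++) {
        double s = by_the_book_seconds(sizes[i], 1000);
        assert(s > 0.0);
        printf("%6lu bytes x 1000 moves: %10.4f s\n", sizes[i], s);
    }
}
```

Since the manual calls 1.43 microseconds a maximum, these numbers are an
upper bound rather than a prediction.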
Later releases of the common compilers did use the WCMV instructions
instead of WBLM. The people doing the code generators were used to
the 16 bit line, where BLM was MUCH faster than CMV (and on the S-series
processors, CMV didn't exist). In fact, CMV may have been
slower than the naive code of LDB/STB. On the 32 bit machines, the
microcoders spent a lot of time (and microstore) optimizing the WCMV,
but paid scant attention to WBLM, which ran about as fast as doing a
load/store combination. As noted, the optimizations were primarily
targeted to the case where the strings are aligned on dword boundaries
(and where both the source and destination move forward).
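To make the selection logic concrete, here is a rough C sketch of the
choice as the article describes it. It is my reading of the quoted text,
not the actual DG code generator: the enum, the function choose_move, and
the prefer_wcmv flag (standing in for the later compiler releases) are all
hypothetical names, and "even" is taken to mean a byte address with the
low bit clear.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Strategies named in the article; the enum itself is hypothetical. */
enum move_kind {
    MOVE_DWORD_LOAD_STORE,  /* load doubleword / store doubleword */
    MOVE_WBLM,              /* wide block move */
    MOVE_WCMV               /* wide character move */
};

/* An address counts as "even" here when its low bit is clear. */
static int is_even(uintptr_t addr)
{
    return (addr & 1u) == 0;
}

/* Pick a strategy from the alignment of both addresses and the length.
   prefer_wcmv != 0 models the later releases that dropped WBLM. */
static enum move_kind choose_move(uintptr_t dst, uintptr_t src,
                                  size_t nbytes, int prefer_wcmv)
{
    int both_even = is_even(dst) && is_even(src);

    if (both_even && nbytes == 8)
        return MOVE_DWORD_LOAD_STORE;   /* the fast 8 byte case */
    if (both_even && nbytes >= 256 && !prefer_wcmv)
        return MOVE_WBLM;               /* the choice the article regrets */
    return MOVE_WCMV;                   /* everything else */
}

/* Usage examples written as checks. */
static void check_examples(void)
{
    assert(choose_move(0x1000, 0x2000, 8, 0) == MOVE_DWORD_LOAD_STORE);
    assert(choose_move(0x1000, 0x2000, 512, 0) == MOVE_WBLM);
    assert(choose_move(0x1000, 0x2000, 512, 1) == MOVE_WCMV);
    assert(choose_move(0x1001, 0x2000, 512, 0) == MOVE_WCMV);
}
```

With prefer_wcmv set, the even-address case at 256 bytes and above takes
the same WCMV path as every other case, which is the path the microcode
was tuned for.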
Michael Meissner email: email@example.com phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142