Re: Space optimizations of globals and statics

austin@cs.wisc.edu (Todd Austin)
Thu, 11 May 1995 02:59:24 GMT

          From comp.compilers

Related articles
Space optimizations of globals and statics PATTEEUW@etcv01.eld.ford.com (1995-05-03)
Re: Space optimizations of globals and statics austin@cs.wisc.edu (1995-05-11)
| List of all articles for this month |
Newsgroups: comp.compilers
From: austin@cs.wisc.edu (Todd Austin)
Keywords: storage, design
Organization: University of WI, Madison -- Computer Sciences Dept.
References: 95-05-038
Date: Thu, 11 May 1995 02:59:24 GMT

PATTEEUW@etcv01.eld.ford.com writes:
|> When building embedded applications, optimization of space (eliminating
|> unused memory between global and or static variables/aggregates by
|> reordering them) is obviously very important. One compiler vendor I have
|> spoken with told me that they don't do this because the source code for
|> certain "UNIX utilities" actually makes assumption about the order of
|> variables. (I hope this is an old practice and that no one programs in
|> this manner any more.)
|>
|> My question is, what do other compilers do (ie. do they reorder for space
|> optimization or do they leave the order alone) ?
|>
|> Jack Patteeuw


As part of my dissertation research I've developed a linker (KLINK) that is
capable of arbitrarily reordering static variables. It can also increase
the amount of static space by moving the stack locals of non-recursive
procedures into the global data section. The linker is implemented as a
post-pass phase in GNU GLD. Perhaps I can shed a little insight into this
optimization.


Most existing object file formats will not accomodate arbitrary reordering
of static variables. There are a number of reasons for this. First, there
is no complete record of the start and ending address of each variable, in
fact, many global section variables are not even mentioned in the symbol
table. To reduce the size of object files, any variable that is not
externally referenced, (e.g., local labels, static variables, literal)
have no symbolic information in the object file. The symbols that do
exist only show the starting address of a variable.


Second, to move a variable all references to it need to be fixed. At first
inspection, this does not seem like a problem because object files contain
relocation records which describe each of these references. However, to
again reduce symbol space requirements, relocations are typically expressed
as a reference to a location relative to the start of a segment symbol.
It's insufficient to infer the variable referenced by comparing the location
of the relocation reference address to the bounds of variables in the
referenced segment. This is because compilers often generate address
constants outside of the address bounds of the referenced variable (e.g.,
yacc output contains many references in the form &a[-1], a negative index
constant will often be combined with the start address of an array during
loop strength reduction).


Third, if the reordering mixes initialized and uninitialized variables, the
object file format may force initialization of some variables, which can
greatly increase the size of executables.


Solve all these problems (we did by judiciously hacking the compiler,
assembler, and linker), and building a data-relocating linker is relatively
simple. BTW, I have not yet encountered a C or FORTRAN code that assumed
any ordering of globals other than those in structure or COMMON sections.
Assembly, on the other hand, is much more problematic. The MIPS Ultrix libc
implementation of atod() contains an array that is labeled in the middle!


I've seen typically 2-15% decreases in the space requirements of programs
after using a bin packing algorithm to pack their data segments.
Incidentally, as part of my research I've packed variables with alignment
rounded up the the next power-of-two larger than the size of the variable
bounded by 16k -- even with this restriction I was able to pack the globals
of GCC in 1k less space than the default global data size.


A naive packing of variables in the data segment usually makes the program
run slower due to increased number of data cache misses. I've seen miss
rates increase by as much as a factor of six after packing the data segment.
I believe this is largely due to a reduction in spatial locality in the
packed data segment. Programmers tend to put logically related variables in
spatially local locations (e.g., same file, subsequent lines), and not
surprisingly, these variable exhibit a lot of spatial and temporal locality
when referenced.


A smart packing, however, has many benefits including reduced space
requirements.


Regards,
-Todd Austin


--
%% Todd Austin, austin@cs.wisc.edu, http://www.cs.wisc.edu/~austin/austin.html
%% Department of Computer Sciences, University of Wisconsin -- Madison
%% 1210 West Dayton, Madison, WI 53706
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.