Tools for Measuring Pointer Usage

uknet!knosof!derek@relay.eu.net (Derek M Jones)
Wed, 11 Nov 1992 19:30:30 GMT

          From comp.compilers

Related articles
Tools for Measuring Pointer Usage storm@binkley.cs.mcgill.ca (1992-11-05)
Tools for Measuring Pointer Usage uknet!knosof!derek@relay.eu.net (1992-11-11)
| List of all articles for this month |

Newsgroups: comp.compilers
From: uknet!knosof!derek@relay.eu.net (Derek M Jones)
Organization: Compilers Central
Date: Wed, 11 Nov 1992 19:30:30 GMT
Keywords: optimize, tools
References: 92-11-024

> We are about to undertake on a project to study pointer usage in
>computer programs, and were wondering if anybody has written a tool that
>will tell you what percentage of loads are, in fact, just getting addresses
>for pointer dereferences...


What you need is a copy of the Model Implementation C Checker. This is a
tool that checks C source for strict conformance to the C Standard. It
was designed to be 'provably' correct (what ever that means). We did lots
of stuff to show that the front end correctly implemented the C standard.
To make life simple the code generator targeted a tagged stack machine
(the stacks were tagged not memory). This also means that the code is
interpreted.


Who is interested in having source conform to standards you say. Well the
tool is ised by Perennial to check the C validation suite used by NIST and
by BSI on the Euoropean C validation suite. Anyway... This summer we did
some work to speed up our runtime system. So here are some dynamic
statistics (interpreting Perl doing something or other).


The base code generator does no optimisation (not even constant folding).
We were interested in pairs as a guide to doing simple peephole
optimisations.


The top 40 pairs of opcodes (41179525 total opcodes):
load_param_ptr ->load_indexed : 970499
zero_jump ->load_param_ptr : 748316
quick_scale_int ->add_pointer : 701146
zero_jump ->load_local_ptr : 683967
push ->call : 632980
load_local_ptr ->load_indirect : 619034
local_address ->safe_post_plus : 560604
less_than ->zero_jump : 553110
load_local_ptr ->load_local_ptr : 523952
load_local_ptr ->load_indexed : 507239
load_indirect ->store_local : 464335
load_local_int ->quick_scale_int : 457542
uncond_jump ->zero_jump : 455324
bit_and ->zero_jump : 450873
store_local ->load_param_ptr : 437924
store_local ->load_local_ptr : 431413
load_param_ptr ->store_indexed : 430964
load_param_ptr ->load_indirect : 430212
add_pointer ->load_indirect : 424272
zero_jump ->uncond_jump : 404367
return ->drop : 400896
load_indexed ->store_local : 397282
zero_jump ->load_const_zero : 341959
less_equal ->zero_jump : 337475
zero_jump ->load_local_int : 331245
uncond_jump ->local_address : 316436
equal_equal ->zero_jump : 316229
add_pointer ->store_indirect : 305281
log_not ->zero_jump : 301284
zero_jump ->local_address : 298487
load_const_byte ->bit_and : 297425
load_global_ptr ->load_indirect : 288925
load_param_ptr ->load_local_int : 284580
load_local_int ->load_local_int : 282957
load_param_ptr ->push : 281057
uncond_jump ->load_local_ptr : 278720
zero_jump ->load_global_ptr : 270147
uncond_jump ->switch : 257355
store_local ->uncond_jump : 253755
push ->load_local_ptr : 253062


The interepeter does pointer checking. This is quiet a large overhead.
So we also looked at pairs of address related op-codes with a view
to combining them.


Combinations of pointer opcodes
local_address ->mov : 92160
local_address ->local_address : 52782
local_address ->load_indirect : 158858
local_address ->store_indirect : 50993
local_address ->safe_pre_plus : 53109
local_address ->safe_post_plus : 560604
global_address ->inc_address : 90007
load_indirect ->global_address : 97673
load_indirect ->add_pointer : 168574
load_indirect ->load_indexed : 148494
param_address ->safe_pre_plus : 67962
param_address ->safe_post_plus : 71944
add_pointer ->local_address : 92160
add_pointer ->load_indirect : 424272
add_pointer ->store_indirect : 305281
add_pointer ->load_indexed : 184788
load_indexed ->add_pointer : 119204
load_indexed ->load_indexed : 56510
safe_post_plus ->load_indirect : 77753
safe_post_plus ->store_indirect : 102339


The following is simple op-code frequency (not broken down by datatype).


Opcode frequencies:
pre_plus 12182 pre_minus 827 post_plus 7263
post_minus 4171 plus_equal 119403 minus_equal 0
plus_ptr_equal 140862 sub_ptr_equal 8996 times_equal 97673
div_equal 0 rem_equal 0 lshift_equal 3
rshift_equal 423 and_equal 2064 or_equal 23626
xor_equal 15 plus 236858 add_pointer 1159187
minus 37753 sub_pointer 96899 times 135045
divide 71 remainder 0 left_shift 8752
right_shift 8200 bit_and 610544 bit_or 8928
bit_xor 0 negate 145911 not 15627
value_cast 372140 inc_address 276505 scale 142
equal_equal 424931 greater_equal 48161 greater_than 249625
less_equal 342402 less_than 667662 not_equal 380710
log_not 439921 call 633928 call_system 6487
call_indirect 0 uncond_jump 2065153 zero_jump 3671377
nzero_jump 148630 return 416241 mov 123098
switch 257355 load_local 65789 load_global 192506
load_constant 508577 null_pointer 146015 load_literal 11930
literal_address 9378 local_address 1405288 global_address 454838
function_address 2 load_indirect 2089911 load_ind_keep_addr 0
mkaddr 0 push 1312262 load_bit_field 0
store_local 1885912 store_global 338854 store_indirect 557051
store_bit_field 0 begin_block 6 end_block 6
drop 632177 drop_value 416641 duplicate_address 0
object_length 0 load_param 301216 store_param 50793
param_address 155607 nop 0 load_indexed 2075849
store_indexed 561178 load_const_zero 801523 load_const_one 619120
load_const_byte 750902 load_idx_global 22 store_idx_global 6378
safe_pre_plus 141392 safe_post_plus 634040 load_local_ptr 3231256
load_local_int 2414312 load_param_ptr 3320694 load_param_int 623903
load_global_ptr 780926 safe_pre_minus 26637 safe_post_minus 32813
quick_scale_int 758509 jump_zero_ptr 410987 jump_nzero_ptr 48574
short_load_local 0 short_load_param 0 short_load_global 0
addr_idx_global 0


Here the frequency is broken down by datatype (sch - signed char, etc).


Opcode/Type pairs:
                                        sch uch ssh ush sint uint slng ulng dbl ldbl ptr fptr agg void
pre_plus : 1 0 0 0 5933 0 6248 0 0 0 0 0 0 0
pre_minus : 0 0 0 0 0 0 827 0 0 0 0 0 0 0
post_plus : 0 0 0 185 4100 0 2978 0 0 0 0 0 0 0
post_minus : 0 0 5 0 4166 0 0 0 0 0 0 0 0 0
plus_equal : 0 0 1890 0 197360 35438 468 6 3644 0 0 0 0 0
plus_ptr_equal : 0 0 0 0 136495 1154 3213 0 0 0 0 0 0 0
sub_ptr_equal : 0 0 0 0 4579 0 4417 0 0 0 0 0 0 0
times_equal : 0 0 0 0 195346 0 0 0 0 0 0 0 0 0
lshift_equal : 0 0 0 0 0 0 0 3 0 0 0 0 0 0
rshift_equal : 0 0 0 0 0 423 0 0 0 0 0 0 0 0
and_equal : 1999 0 19 8 2102 0 0 0 0 0 0 0 0 0
or_equal :19121 0 382 551 27198 0 0 0 0 0 0 0 0 0
xor_equal : 0 0 15 0 15 0 0 0 0 0 0 0 0 0
plus : 0 0 0 0 136372 92811 7674 0 1 0 0 0 0 0
add_pointer : 0 0 0 0 1043746 93951 11874 9616 0 0 0 0 0 0
minus : 0 0 0 0 31675 6078 0 0 0 0 0 0 0 0
sub_pointer : 0 0 0 0 11476 0 0 0 0 0 85423 0 0 0
times : 0 0 0 0 101101 33871 0 73 0 0 0 0 0 0
divide : 0 0 0 0 67 0 4 0 0 0 0 0 0 0
left_shift : 0 0 0 0 8752 0 0 0 0 0 0 0 0 0
right_shift : 0 0 0 0 8200 0 0 0 0 0 0 0 0 0
bit_and : 0 0 0 0 607360 846 2338 0 0 0 0 0 0 0
bit_or : 0 0 0 0 8928 0 0 0 0 0 0 0 0 0
negate : 0 0 0 0 145911 0 0 0 0 0 0 0 0 0
not : 0 0 0 0 15627 0 0 0 0 0 0 0 0 0
scale : 0 0 0 0 0 0 0 142 0 0 0 0 0 0
equal_equal : 0 0 0 0 181881 24935 0 0 0 0 218115 0 0 0
greater_equal : 0 0 0 0 28081 0 6891 0 0 0 13189 0 0 0
greater_than : 0 0 0 0 192692 23603 30379 0 1822 0 1129 0 0 0
less_equal : 0 0 0 0 160285 86234 2 0 0 0 95881 0 0 0
less_than : 0 0 0 0 256399 78555 5408 9616 0 0 317684 0 0 0
not_equal : 0 0 0 0 310914 6360 23 0 7640 0 55771 2 0 0
log_not : 0 0 0 0 230098 2506 6334 0 0 0 200983 0 0 0
return :23299 0 0 0 112210 713 0 0 2373 0 136479 0 0 141167
switch : 0 0 0 0 257355 0 0 0 0 0 0 0 0 0
load_local : 3799 0 30706 0 0 11841 4 9619 9703 0 0 117 0 0
load_global : 7174 0 823 1050 163751 0 19708 0 0 0 0 0 0 0
load_constant : 0 0 415 293 189364 287860 20947 9698 0 0 0 0 0 0
load_literal : 0 0 0 0 0 0 0 0 11930 0 0 0 0 0
load_indirect :726838 169513 21400 253 109684 0 1785 0 0 0 1060438 0 0 0
push : 0 0 0 0 246991 176091 22250 0 10164 0 856500 236 30 0
store_local :67901 0 14266 0 895912 4011 2 9 9703 0 893991 117 0 0
store_global :12972 2 19 616 16552 0 1963 0 0 1 306727 2 0 0
store_indirect :153195 8004 2736 5 145684 0 1785 0 0 0 245642 0 0 0
drop_value :15584 0 0 349 114274 0 0 0 0 0 286434 0 0 0
load_param : 670 0 0 38 7297 276558 0 0 10164 0 6489 0 0 0
store_param : 1227 0 0 76 45889 78 0 0 0 0 3523 0 0 0
load_indexed :440831 108034 148769 256415 198070 374672 5886 0 18195 0 524975 2 0 0
store_indexed :229213 88947 1073 1911 9465 112723 5968 0 10753 15 101110 0 0 0
load_idx_global : 0 0 0 22 0 0 0 0 0 0 0 0 0 0
store_idx_global : 328 520 4855 270 8 10 19 2 5 0 361 0 0 0
safe_pre_plus : 0 0 0 0 103733 0 0 9616 0 0 28043 0 0 0
safe_post_plus : 0 0 0 0 217297 0 534 0 0 0 416209 0 0 0
safe_pre_minus : 0 0 0 0 20671 0 0 0 0 0 5966 0 0 0
safe_post_minus : 0 0 0 0 32657 0 0 0 0 0 156 0 0 0


Cast type pairs:
To/From sch uch ssh ush sint uint slng ulng dbl ptr
sch : 0 0 0 0 144287 0 20 0 0 0
uch : 0 0 0 0 8193 10 0 0 0 0
ssh : 22 0 0 0 5343 2 0 0 0 0
ush : 0 0 0 0 1930 0 0 0 0 0
sint : 21120 0 2306 559 0 21216 18787 0 3610 0
uint : 86234 44 0 253 60510 0 5210 73 0 0
slng : 0 0 0 0 34982 2 0 0 0 206
ulng : 0 0 0 0 3 0 73 0 0 0
dbl : 590 0 1 0 4389 0 0 3 0 0
ldbl : 0 0 0 0 16 0 0 0 0 0
fptr : 0 0 0 0 119 0 0 0 0 2


We noticed quiet a range of results. The more sophisticated programmers
(anybody who uses datatypes other than int) gave very different results.
Pointer usage depended on the type and style of application (lots of
access' to global data structures, storing results via addresses passed as
parameters vs global objects).


>From: dk <dk@farm.cs.kiev.ua>
>Are there any free ANSI C compiler implementation (source) which generates
>some explicit intermediate code for simple stack machine? (or can be
>easily tailored to do that).


It's not free. We sell source and executables.


>No code generation besides that is required, nor _any_ optimizations
>(besides may be simple constant pasting in expression trees).


We are the only compiler that I know of that does no optimisation what so
ever (we do fold constants to check various semantic conditions, but the
unfolded expressions are written out {so we actually have to do work to do
no optimisations}).


>the compiler is about to be used for embedding into applications (so size
>is much more important than speed).


The work done over the summer on optimisation reduced the code size quiet
a bit (tagged instructions are a bit memory hungry). The interpreter
currently runs approx 50 times slower than compiled code (depending on
pointer usage {pointer checking is expensive}).


derek
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.