Optimization opportunities unexploited: examples sought



Newsgroups: comp.compilers,comp.arch.arithmetic
From: pshuang@athena.mit.edu (Ping Huang)
Followup-To: poster
Keywords: optimize, question
Organization: Massachusetts Institute of Technology
Date: Mon, 8 May 1995 23:17:05 GMT

[Please reply to me personally via email, rather than posting a
follow-up article in this newsgroup, so as to minimize traffic. I will
summarize any informative responses in a few days.]


I am working on a thesis based on the idea that it is worthwhile to
feed profiler output (from actual runs of the program by the real
end-users using real data on real machines) back to the COMPILER and
not just to the human programmer... especially if the human programmer
used annotations in the source code to give the compiler hints as to
the kinds of more drastic and aggressive program transformations it
can attempt to perform.


I am seeking real-world examples where this would have been useful.
Ideally, I'd like to hear both the application context and actual
(short) snippets of code illustrating the point. (C/C++ code
preferred but not necessary... I'm asking for snippets of code for
concreteness.) Also, if you know of existing run-time systems,
libraries, compilers, etc., which already do this kind of automatic
adaptation in an interesting fashion, please let me know about them.


----------------


This may seem a bit vague, so I've given some (perhaps contrived)
examples below. I don't want to overly bias the kinds of examples I
might get back from people by the choice of these examples --- I'm
interested in hearing about all sorts of cases where, if the compiler
could just know how long the resulting program *ACTUALLY* takes to
run, it could make better decisions during a recompile without having
to involve a human programmer at that point in the program's lifecycle.
In other words, scenarios where the compiler could have generated
better-performing code if it could have conducted experiments with the
actual compiled code on the destination platform.


  * On large data sets, common for scientific and engineering
      simulations, for example, it's pretty important to size the working
      set of data being accessed at any given time to fit into main
      memory and the cache. The larger the "blocking size", however,
      the better the performance, at least until the working set gets
      *TOO* large. Profiling data would allow the compiler to
      dynamically experiment with the blocking size and data alignments
      to minimize cache misses and virtual memory thrashing for both
      high-end and low-end machines of a given binary-compatible
      architecture line. While it is possible to try to model such
      effects, the large variety of cache and memory organizations out
      there makes analysis difficult.
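
      To make this concrete, here is a minimal sketch in C; the names
      N and BLOCK and the toy transpose loop are made up purely for
      illustration, not taken from any real program. The point is that
      BLOCK is exactly the kind of constant a profile-fed recompile
      could retune for each machine's cache:

        /* Illustrative sketch only: a blocked array transpose where
         * BLOCK is the tile size a feedback-driven recompile could
         * retune.  N and BLOCK are placeholder values. */
        #include <stdio.h>

        #define N     1024
        #ifndef BLOCK
        #define BLOCK 64          /* candidate tile size */
        #endif

        static double a[N][N], b[N][N];

        int main(void)
        {
            int i, j, ii, jj;

            for (i = 0; i < N; i++)
                for (j = 0; j < N; j++)
                    a[i][j] = (double) (i * N + j);

            /* walk the array tile by tile so each tile stays
             * cache-resident */
            for (ii = 0; ii < N; ii += BLOCK)
                for (jj = 0; jj < N; jj += BLOCK)
                    for (i = ii; i < ii + BLOCK; i++)
                        for (j = jj; j < jj + BLOCK; j++)
                            b[j][i] = a[i][j];

            printf("b[1][0] = %g\n", b[1][0]);
            return 0;
        }

      Recompiling with, say, -DBLOCK=32 or -DBLOCK=128 and timing the
      result is the sort of experiment the compiler could run on its
      own once it has access to measured run times.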


  * Quicksort (implemented by qsort() in the C library) performs very
      well on average. However, other sorts sometimes perform better; in
      particular, Quicksort performs poorly on almost-sorted arrays (and
      even randomized Quicksort has higher overhead than other sorts for
      almost-sorted arrays). While in this particular instance it would
      not be difficult to write qsort() to check its input data to see if
      it's already almost sorted and to avoid Quicksort in that case,
      that check would itself not be computationally cheap; furthermore,
      in general, it would not be easy even to see how to write a test
      to decide which implementation would likely run faster, much less
      to write such a test which would be computationally cheap. A
      compiler could experiment with linking in such an alternative
      qsort() implementation, and notice that application A wins while
      application B loses.
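
      As a purely illustrative sketch (the names sort_ints(),
      insertion_sort(), and the SORT_IMPL_INSERTION switch are made
      up, not from any real library), the two candidate
      implementations might look like this, with the choice made at
      build time by the feedback-driven compiler rather than by a
      runtime check:

        /* Illustrative sketch only: two sort bodies the compiler
         * could swap between after timing real runs. */
        #include <stdio.h>
        #include <stdlib.h>

        static int cmp_int(const void *pa, const void *pb)
        {
            int a = *(const int *) pa, b = *(const int *) pb;
            return (a > b) - (a < b);
        }

        /* cheap on nearly sorted input, quadratic in the worst case */
        static void insertion_sort(int *v, size_t n)
        {
            size_t i, j;
            for (i = 1; i < n; i++) {
                int key = v[i];
                for (j = i; j > 0 && v[j - 1] > key; j--)
                    v[j] = v[j - 1];
                v[j] = key;
            }
        }

        void sort_ints(int *v, size_t n)
        {
        #ifdef SORT_IMPL_INSERTION   /* imaginary flag set by feedback */
            insertion_sort(v, n);
        #else
            qsort(v, n, sizeof v[0], cmp_int);
        #endif
        }

        int main(void)
        {
            int v[] = { 1, 2, 4, 3, 5, 6, 7, 8 };   /* almost sorted */
            size_t i, n = sizeof v / sizeof v[0];

            sort_ints(v, n);
            for (i = 0; i < n; i++)
                printf("%d ", v[i]);
            printf("\n");
            return 0;
        }

      Application A would be built with the qsort() body and
      application B with -DSORT_IMPL_INSERTION, according to which one
      the measured runs show to be faster.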


  * Performance of programs which use mathematical functions which are
      expensive to compute can often be improved by memoization, where
      the function remembers the results for arguments which it has
      recently been called with. However, memoization isn't free: it has
      space costs (past arguments and results need to be cached somewhere
      in memory) and time costs (every call to the function now checks if
      the result is already known... the more past calls are "memoized",
      the longer this check is likely to take). A compiler could, with
      programmer help, generate versions of functions which do or do not
      perform memoization, and use the appropriate one for applications
      which often call the function with the same arguments vs.
      applications which do not.
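
      A purely illustrative sketch of the two versions follows;
      expensive_fn(), memo_fn(), CACHE_SLOTS, and the USE_MEMO switch
      are invented names, and the hash is deliberately crude. The
      static tables are the space cost, and the lookup on every call
      is the time cost, that the measured runs would have to justify:

        /* Illustrative sketch only: a memoized wrapper around an
         * expensive function, selected by an imaginary USE_MEMO flag
         * that a feedback-driven recompile would set or clear. */
        #include <math.h>
        #include <stdio.h>

        #define CACHE_SLOTS 64   /* space cost the compiler must weigh */

        static double expensive_fn(double x)
        {
            double acc = 0.0;    /* stand-in for a costly computation */
            int i;
            for (i = 0; i < 1000; i++)
                acc += sin(x + i) * cos(x - i);
            return acc;
        }

        static double memo_fn(double x)
        {
            static double keys[CACHE_SLOTS], vals[CACHE_SLOTS];
            static int    used[CACHE_SLOTS];
            /* crude direct-mapped hash, fine for a sketch */
            unsigned slot = (unsigned) (fabs(x) * 1024.0) % CACHE_SLOTS;

            if (used[slot] && keys[slot] == x)
                return vals[slot];       /* hit: skip the expensive call */
            keys[slot] = x;
            vals[slot] = expensive_fn(x);
            used[slot] = 1;
            return vals[slot];
        }

        int main(void)
        {
            double r1, r2;
        #ifdef USE_MEMO                  /* imaginary flag set by feedback */
            r1 = memo_fn(0.5);
            r2 = memo_fn(0.5);           /* second call comes from the cache */
        #else
            r1 = expensive_fn(0.5);
            r2 = expensive_fn(0.5);
        #endif
            printf("%g %g\n", r1, r2);
            return 0;
        }

      Building with, e.g., cc -DUSE_MEMO prog.c -lm gives the memoized
      variant; whether it actually wins depends on how often the
      application repeats arguments, which is exactly what the profile
      data would reveal.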


  * Cross-platform variance of performance of algorithms. I can't
      come up with an example off the top of my head, but I'd really love
      to see something along this line.


  * Cross-user variance of how programs are used. A very vague example
      might be a novice who uses very simple database queries vs. someone
      who almost always constructs very complex precise queries.


----------------


[Repeat: please reply to me personally via email, rather than posting
a follow-up article in this newsgroup, so as to minimize traffic. I will
summarize any informative responses in a few days.]


--
Ping Huang <pshuang@mit.edu>, probably speaking for himself
--

