Death by pointers. (Was: order of argument evaluation in C++, etc.)

johnston@imec.be (Adrian Johnstone)
Wed, 30 Aug 1995 07:28:45 GMT

          From comp.compilers

Related articles
Order of argument evaluation in C++, etc. hbaker@netcom.com (1995-07-08)
Re: Order of argument evaluation in C++, etc. jan@neuroinformatik.ruhr-uni-bochum.de (1995-08-25)
Death by pointers. (Was: order of argument evaluation in C++, etc.) johnston@imec.be (1995-08-30)
Re: Death by pointers. (Was: order of argument evaluation in C++, etc. hbaker@netcom.com (1995-09-05)
Re: Death by pointers. jhallen@world.std.com (1995-09-05)
Re: Death by pointers. whitten@netcom.com (1995-09-05)
Re: Death by pointers. chase@centerline.com (1995-09-06)
Re: Death by pointers. (Was: order of argument gbaker@rp.CSIRO.AU (1995-09-10)
Re: Death by pointers. hbaker@netcom.com (1995-09-11)
[12 later articles]
| List of all articles for this month |

Newsgroups: comp.compilers
From: johnston@imec.be (Adrian Johnstone)
Keywords: C, optimize, comment
Organization: IMEC, Interuniversitair Micro Elektronica Centrum, Belgium
References: 95-07-068 95-08-195
Date: Wed, 30 Aug 1995 07:28:45 GMT

On the sub-issue of alias detection as a problem in argument evaluation:


: Unless I'm very much mistaken, alias identification is an insoluble
: problem, and flagging order dependencies in expressions runs straight into
: that wall.


: Well, I guess it's insoluble in general (no my area of expertise), but it is
: certainly possible and useful. It's clear that this is really what we're
: talking about.


In the C language, alias detection in the presecence of recursive data
structures (i.e. linked lists and up) is an NP-complete problem, which
certainly meets my definition of insoluble ;-). I'm not at home right
now so I can't find the reference but it's derived in someone's PhD
theis. Actually it's not at all surprising if you think about it. I
would rather suppose that this result is true in any pointer-using
language, including Pascal et al.


The thing that makes C a nightmare for optimisation is the
all-pervading nature of pointers along with the widespread use of void
pointers which can implicitly cast all over the place. At least if you
have a constrained pointer model like in Pascal (where pointers are
only allowed onto the heap) you can contain the alias detection
problem to the bits of code that are actually manipulating your linked
list, or whatever. In C, dataflow analysis is essentially impossible,
which for me is very scary. (Life threatening systems programmed with
pointers? Just say no.).


In the not so distant future, all common computers will use
superscalar processors of one form or another, and I predict that the
detailed dataflow analysis required to get the best out of these
machines will mandate a move towards languages which are easier to
analyse, that is with restricted use of pointers and possibly a
complete lack of side effects.


I believe there is a strong parallel here with the rise of structured
programming many years ago. People had to learn to discipline their use
of control flow. This aided humans in their understanding of programs,
and and as a bonus pure structured programs (of which there are few)
yield to simpler control flow analysis algorithms.


The time has surely now come when we need to apply the same process to
data access. After all, what is a pointer but a goto within a data
structure? Actually programming with side effect reduced languages is
not so hard, so why on earth are we making things so difficult for our
optimisers?


The answer, I suspect, lies in the shift from low level sequential
programming as the key to high throughput. When C first came along, it
had some of the attributes of an assembler (especially direct address
modification in all its glory) coupled to structured programming and
data constructs. That made it very attractive for highly-efficient
systems programming, and now it is `common knowledge' that C is the
most efficient high-level programming language around. BUT in the
world of heavily pipelined wacko architectures that need reordering of
your code, giving the programmer access to addresses suddenly doesn't
look like such a good idea because it can create no-go, or even worse
go-with-risk, areas for the optimiser. For these platforms, C code is
less likely to produce well optimised code.


And it's going to get worse. If you think delay slot utilisation is a
pain, try programming a VLIW. It's pretty clear to me that humans
can't cope with VLIW style machines at assembler level, so all one's
preconceptions from assembler programming go out of the window. You
_have_ to use sophisticated (and, please, safe) dataflow analysis to
get much out of these machines.


This is all well known stuff, but maybe not well enough known... The
recent SIGPlan HOPL paper on the history of C contains similar
comments about the pros and cons of pointer arithmetic from one who is
surely not anti-C.


The most pressing question, of course, is what the new, side-effect
reduced, language should be called. A joke circulating in the 1970's
used to have it that nobody knew what the most popular programming language
in the 1990's would look like, but it would be called FORTRAN :-) (a
reference to FORTRAN's history of picking up features from all over
the place). Actually, of course, they were wrong, but nevertheless I
think we can safely say that nobody knows what the programming
langauge of 2010 will look like, but it will be called something
beginning with C.


                                                              Adrian
[Cobol. -John]
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.