Re: Language used to write compilers

nmm1@cus.cam.ac.uk (Nick Maclaren)
31 Dec 2004 13:03:21 -0500

From comp.compilers

Related articles
[4 earlier articles]
Re: Language used to write compilers vbdis@aol.com (2004-12-30)
Re: Language used to write compilers dido@imperium.ph (Rafael 'Dido' Sevilla) (2004-12-30)
Re: Language used to write compilers nick.roberts@acm.org (Nick Roberts) (2004-12-30)
Re: Language used to write compilers samiam@moorecad.com (Scott Moore) (2004-12-31)
Re: Language used to write compilers Martin.Ward@durham.ac.uk (Martin Ward) (2004-12-31)
Re: Language used to write compilers nmm1@cus.cam.ac.uk (2004-12-31)
*Re: Language used to write compilers nmm1@cus.cam.ac.uk* (2004-12-31)**
Re: Language used to write compilers idbaxter@semdesigns.com (Ira Baxter) (2004-12-31)
Re: Language used to write compilers Peter_Flass@Yahoo.com (Peter Flass) (2005-01-01)
Re: Language used to write compilers nmm1@cus.cam.ac.uk (2005-01-03)
Re: Language used to write compilers napi@cs.indiana.edu (2005-01-03)
Re: Language used to write compilers vbdis@aol.com (2005-01-09)
Re: Language used to write compilers nmm1@cus.cam.ac.uk (2005-01-12)

| List of all articles for this month |

From:	nmm1@cus.cam.ac.uk (Nick Maclaren)
Newsgroups:	comp.compilers
Date:	31 Dec 2004 13:03:21 -0500
Organization:	University of Cambridge, England
References:	04-12-148 04-12-158 04-12-169
Keywords:	practice
Posted-Date:	31 Dec 2004 13:03:21 EST

Rafael 'Dido' Sevilla <dido@imperium.ph> wrote:
>On Thu, Dec 30, 2004 at 12:58:39AM -0500, Nick Maclaren wrote:
>> Hmm. Firstly, I would say "a little easier" - compilers are pretty
>> simple programs in terms of memory management.
>
>Until you consider that compilers generally do plenty of string
>handling, and that in C strings are second-class entities that you
>must manually allocate and deallocate whenever needed. There are also
>a lot of other complex dynamic data structures that must be maintained
>by any compiler of serious scope, such as syntax trees, the various
>graph structures used for code optimization, and so forth. I would
>hardly call programs that used so many dynamic data structures "simple
>programs in terms of memory management".

I would. To address your points in turn:

Compilers do very little string handling, because they typically just
read in strings, and either keep them or turn them into a token or
pointer. They do very little dynamic creation and deletion, and the
total "string space" is usually small. If you think that C is bad,
try Fortran 66 - and string handling was not the main problem when
writing compilers in that :-)

The handling of syntax trees during optimisation is probably the only
aspect of a compiler that stresses memory management, which is why
almost all memory problems with compilers occur during that phase (and
why a common solution is to disable optimisation). But even it is not
hard, compared with many other applications.

The reason is that trees are a simple data structure. There are a few
more complex data structures in compilers (e.g. when attempting to
share code or data), but generally things are pretty simple. In
particular, it is rare for scoping to be the nightmare that it can be
in some applications.

For example, consider dynamic remeshing on a distributed memory
parallel machine with, of course, something like an electromagnetic
model to solve over it all.

>> Secondly, and more importantly, ignoring those aspects leads to the
>> sort of abominable compilers that we are inflicted with today.
>
>These aspects are not ignored. Automatic garbage collection and
>memory management leaves these aspects in the hands of the machine,
>which can do a reasonable job, leaving the programmer to concentrate
>on the actual problem, not extra bookkeeping.

I have been trying to kill that myth for 30 years, but I shall go to
my grave defeated :-(

No, that is not so. This being comp.compilers, here is one very
simple and very common problem that has been a nightmare for users for
40 years and probably more:

        When a program goes haywire, such as being faced with input that
is exponentially hard to handle or because the compiler author failed
to delete a now-unneeded pointer in a recursion, the symptoms often
include memory leaks.

        With explicit memory allocation, the programmer usually knows how
much allocation is reasonable, and can put in a check and diagnostic.
With automatic schemes, there are rarely suitable controls, and the
program runs until it exhausts resources.

        There are only sometimes tools for the programmer to diagnose such
failures (see below), and almost never tools for the user of the
program (or his local expert, i.e. me) to do so. No, in real life,
those people CANNOT change the source of the compiler. We are then
typically faced with a resource exhaustion failure and no useful
diagnostics - and sometimes even a denial of service :-(

>Perhaps it is because these aspects are truly ignored, implemented
>under systems that don't support automating the process, that you
>have the abominable compilers you describe.

I know of no current operating systems that support control of the
virtual, physical and cache memory, TLB usage, I/O rate and so on. A
few do SOME of them, but not all - and you only have to have a
compiler run a system out of ONE critical resource to start causing
other users' jobs (or even the whole system) to fail.

Even for resources where the operating system does allow control, it
is very rare for this to be done competently either by the authors of
the automatic garbage collection system or by the authors of the
compiler. The reason is that such automation adds an extra level of
difficulty to diagnosis.

What is needed is that the compiler detects such problems, and relates
the resource exhaustion to the INPUT PROGRAM and what it being done to
it (including the options). Relating it to the source of the compiler
is only a half-solution - but we rarely get even that :-(

Please note that explicit resource allocation does not solve any of
the above. The ONLY difference is that it does not introduce a level
of obfuscation between the parts of the program causing the problem
and the parts that are in a position to do something about it.

Regards,
Nick Maclaren.

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: Language used to write compilers

nmm1@cus.cam.ac.uk (Nick Maclaren)31 Dec 2004 13:03:21 -0500

nmm1@cus.cam.ac.uk (Nick Maclaren)
31 Dec 2004 13:03:21 -0500