Re: Compiler writers will love this language

torbenm@diku.dk (Torben Ægidius Mogensen)
20 Jun 2003 00:10:40 -0400

          From comp.compilers


From: torbenm@diku.dk (Torben Ægidius Mogensen)
Newsgroups: comp.compilers
Date: 20 Jun 2003 00:10:40 -0400
Organization: Department of Computer Science, University of Copenhagen
References: 03-05-211 03-06-015 03-06-054
Keywords: optimize
Posted-Date: 20 Jun 2003 00:10:40 EDT

ericmuttta@email.com (Eric) writes:


> torbenm@diku.dk (Torben Ægidius Mogensen) wrote:


> > Don't mix unspecified evaluation order with unrestricted side-effects,
> > such that the result of a computation may depend on the (unknown)
> > evaluation order.
>
> I have often wondered how specifying evaluation order impacts
> implementation. The arguments I have heard for it are:
>
> - it allows consistency between different compilers hence allowing
> code portability between compilers
>
> - it allows one to take advantage of evaluation order in expressions
>
> The second one sounds dubious at first but, thinking about it, we
> always rely on our code executing from top to bottom and write the
> code so it's dependent on that execution order. So then, is it
> unreasonable to specify that evaluation order is left-to-right (or
> vice versa)? If so, what are the arguments for leaving the order
> undefined? (I have heard something related to efficiency and register
> allocation during procedure invocations, but no explanation was
> given.)


The advantage of leaving the order unspecified (when the semantics are
unaffected by the evaluation order) is that the compiler can optimize
the evaluation order. This can (as you say) allow the compiler to use
fewer registers or schedule instructions more efficiently.


On a modest scale, unspecified evaluation order can apply to small
local side-effect-free expressions, or it can extend so that data
dependence is the only thing that determines evaluation order. The
latter allows even function calls to be executed in arbitrary order,
interleaved, or in parallel. Haskell is an example of a language with
this feature.
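
For illustration only, here is a minimal C++ sketch (the functions f and
g and the variable counter are invented) where the result of an
expression depends on which operand the compiler evaluates first:

#include <iostream>

int counter = 0;

int f() { counter += 1; return counter; }   /* side effect: adds one   */
int g() { counter *= 2; return counter; }   /* side effect: doubles it */

int main() {
    /* The order in which f() and g() are evaluated below is not fixed
       by the language, so r may be 3 (f first) or 1 (g first).  A
       language that fixes the order, or forbids side effects in such
       operands, removes the ambiguity. */
    int r = f() + g();
    std::cout << r << "\n";
    return 0;
}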


> > If you have pointers with less-than comparison and arithmetic
> > (adding offsets to pointers or subtract pointers to find a
> > distance), make it illegal (or at least undefined) to
> > compare/subtract two pointers that are not derived on the same base
> > pointer (i.e, pointers into the same array or struct).
>
> I think illegal is more like it. If that were allowed, won't the result
> *always* point to somewhere that's statically unknown and hence
> impossible to prove as legally/safely addressable?


That is more or less the point. The C standard states that the
behaviour of the above is undefined, but neither the compiler nor
the runtime system flags occurrences of this. Hence, many C programs
rely on a specific compiler's behaviour in these cases. Ideally,
potential occurrences of the behaviour should be caught at
compile-time, but in practice some cases must be left for runtime
checking.
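
Purely as an illustration (the array and pointer names are invented),
a small C++ fragment showing the well-defined and the undefined cases
side by side; a typical compiler accepts all of it without complaint:

#include <cstddef>

int main() {
    int a[4] = {0, 1, 2, 3};
    int b[4] = {4, 5, 6, 7};

    int *p = a + 1;
    int *q = a + 3;

    /* Fine: p and q are derived from the same array object. */
    bool inside = p < q;              /* well-defined            */
    std::ptrdiff_t span = q - p;      /* well-defined, equals 2  */

    /* Not fine: a and b are distinct objects, so the comparison and
       the subtraction below are not defined by the standard, yet no
       compiler or runtime check will normally flag them. */
    bool which = a < b;
    std::ptrdiff_t gap = b - a;

    (void)inside; (void)span; (void)which; (void)gap;
    return 0;
}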


> http://www.dcs.ed.ac.uk/home/stg/NOTES/node31.html
>
> says that higher order functions are "Functions which take functions
> as an argument...". Is that similar to function pointers in C/C++
> where functions are called by address instead of by name? In the same
> vein, are closures related to this in any way?


Functions in C/C++ are all declared globally, so a function parameter
is only a code pointer. If you have locally declared functions (as in
Pascal), a function parameter also needs to include an environment
pointer. This pair is called a "thunk" or "closure". In Pascal,
functions can't be returned out of their dynamic scope, so thunks can
be stack allocated. In true functional languages, there is no limit
to where functional values can be passed, returned or stored, so
thunks sometimes need to be heap allocated. Many implementations
always heap-allocate thunks, so they don't have to do global analysis
to figure out if it is safe to stack-allocate.
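
To make the distinction concrete, a small C++ sketch (using C++11
lambdas for brevity; the names twice and adder are invented): a plain
function parameter is just a code pointer, while a closure also carries
its captured environment.

#include <functional>
#include <iostream>

/* A plain function: passing it around passes only a code pointer. */
int twice(int x) { return 2 * x; }

/* Returns a closure: code plus the captured environment (n).  The
   lambda object carries n by value; languages like SML or Haskell
   heap-allocate a similar code/environment pair automatically. */
std::function<int(int)> adder(int n) {
    return [n](int x) { return x + n; };
}

int main() {
    int (*fp)(int) = twice;        /* bare code pointer, no environment   */
    std::function<int(int)> add5 = adder(5);  /* survives adder's frame   */

    std::cout << fp(10) << "\n";   /* prints 20 */
    std::cout << add5(10) << "\n"; /* prints 15 */
    return 0;
}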


> > > * keep in mind that ____ is harder to implement than ____ but they do
> > > the same/similar thing
> >
> > OO with dynamic dispatch is harder to implement than polymorphism
> > combined with type-classes (as in Haskell), but the latter is just as
> > useful (maybe more).
>
> OOP terms seem to have loads of meanings nowadays. When I hear about
> "dynamic dispatch" in OOP, I think about how polymorphic invocations
> are implemented by using tables of function addresses (eg C++ virtual
> tables). It also makes me think about something I have heard being
> called "multi-methods". It involves function overloading with
> overload resolution occurring at run-time, based on the actual type of
> all arguments to the function (this feature strikes me as being harder
> to implement than static overload resolution, or at least harder to
> implement efficiently). Is any of what I have mentioned what you
> mean when you speak of "OO with dynamic dispatch"?


Dynamic dispatch is when the actual method called isn't known until
run-time, so it has to be done via a virtual table (or similar
mechanism). This is easy and fairly efficient for single inheritance,
but (without whole-program analysis) it isn't for multiple inheritance.
If the method can be determined at compile-time, there is no problem,
though.
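
A minimal C++ sketch of dynamic dispatch (Shape, Circle, Square and
report are invented names): the call in report cannot be resolved at
compile time and goes through the object's vtable.

#include <iostream>

struct Shape {
    virtual double area() const = 0;   /* dispatched through the vtable */
    virtual ~Shape() = default;
};

struct Circle : Shape {
    double r;
    explicit Circle(double r) : r(r) {}
    double area() const override { return 3.14159265 * r * r; }
};

struct Square : Shape {
    double s;
    explicit Square(double s) : s(s) {}
    double area() const override { return s * s; }
};

/* The actual method called here depends on the run-time type of s. */
double report(const Shape &s) { return s.area(); }

int main() {
    Circle c(1.0);
    Square q(2.0);
    std::cout << report(c) << " " << report(q) << "\n";
    return 0;
}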


> > > * stay away from ____ its not worth the trouble unless you need ____
> >
> > Mutual inheritance (unless static).
>
> Again, this is a new one to me.


I meant _multiple_ inheritance. Sorry about the slip.


Static means that the methods can be identified at compile-time (see
above).
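
Continuing the sketch above, here is one reason multiple inheritance
complicates dispatch (the classes are invented, and the layout details
are implementation-dependent): on typical implementations the object
carries a separate vtable pointer per base, and converting to the
second base adjusts the address so calls find the right table.

#include <iostream>

struct Drawable { virtual void draw() const = 0; virtual ~Drawable() = default; };
struct Saveable { virtual void save() const = 0; virtual ~Saveable() = default; };

struct Widget : Drawable, Saveable {
    void draw() const override { std::cout << "draw\n"; }
    void save() const override { std::cout << "save\n"; }
};

int main() {
    Widget w;
    Drawable *d = &w;   /* usually the same address as &w           */
    Saveable *s = &w;   /* usually adjusted past the Drawable part  */

    /* Typically prints 1 then 0: the Saveable view of w lives at a
       different address, so the compiler must insert this-pointer
       adjustments that single inheritance never needs. */
    std::cout << (static_cast<void*>(&w) == static_cast<void*>(d)) << "\n";
    std::cout << (static_cast<void*>(&w) == static_cast<void*>(s)) << "\n";
    d->draw();
    s->save();
    return 0;
}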


> > > * look at the grammar for language _____ , its easier/harder to
> > > implement because it does/doesn't have _____
> >
> > Symbol-table dependent parsing, i.e., completely different parsing
> > depending on whether a symbol is a type name, a variable name or a
> > class name. C++ is the prime horror story here, with so many
> > context-dependent rules that no parser for C++ is strictly conforming
> > to the standard.
>
> I have encountered this one while writing simple parsers. It's a lot
> harder to write a parser if each symbol's meaning cannot be resolved
> without look-ahead or back-tracking. It's a bit of a trade-off between
> having a smaller language (in terms of number of symbols) or a simpler
> parser. Languages with fewer symbols will tend to overload many of
> them (eg the "static" keyword in C++) and it's the symbol overloading
> that results in context-sensitive meanings, which in turn make parsing
> difficult. Having more *unique* symbols reduces the need to overload
> the meanings of symbols, so parsers are simplified because they can
> resolve the symbol's meaning without looking at the context of its
> occurrence. One of the hard parts of language design is coming up with
> good keywords whose meanings can easily be guessed at.


It is not just about overloading keywords or symbols. That is usually
not too difficult to handle in the grammar, nor too difficult for a
human to understand. The problem is when a piece of text must be
parsed completely differently depending on whether an identifier is a
type name, a variable name, a class name, or something else. C++ has
this problem in abundance and needs such complex disambiguation rules
that no C++ compiler (to my knowledge) implements the C++ standard
precisely.


> > Many languages implicitly assume that all pointers can be null
> > pointers. It is better to distinguish this through the type, so you
> > can avoid testing for null pointers when the type ensures it can't
> > happen. Ideally, you would have subtyping, so a non-null pointer can
> > be passed to a function that expects a possibly-null pointer.
>
> I am having difficulty imagining a type which is implemented using
> pointers and ensures that null can't occur. This would imply that for a
> variable of such a type, memory would be allocated and then never
> released. Memory deallocation is what leads to null pointers and if
> the type ensures that you can't have null pointers then it is
> implicitly saying you can't deallocate the memory for variables of that
> type.


You need to initialize every pointer variable on creation, either by
allocating an object for it or by copying the value of another
pointer variable. And you need to make sure that the variable that
holds the pointer is no longer accessible after you deallocate the
object it points to. This is the case if you use garbage collection
for deallocation, but other mechanisms can also be used.


Many languages (e.g., SML) have pointers that can never be null. But
it is also possible to have both nullable and non-nullable pointer
types in a language.
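
As a rough C++ sketch of such a distinction by type (NonNull is a
hypothetical wrapper of mine, not a standard facility, and std::optional
from C++17 stands in for the nullable case): code that receives a
NonNull never needs a null check, while absence is visible in the type
of the nullable variant.

#include <cstddef>
#include <optional>
#include <string>
#include <iostream>

template <typename T>
class NonNull {
public:
    explicit NonNull(T &ref) : ptr_(&ref) {}   /* no null constructor */
    T &operator*()  const { return *ptr_; }
    T *operator->() const { return ptr_; }
private:
    T *ptr_;
};

std::size_t length(NonNull<const std::string> s) {
    return s->size();                          /* no null test needed */
}

int main() {
    std::string name = "torben";
    std::cout << length(NonNull<const std::string>(name)) << "\n";

    std::optional<std::string> maybe;          /* nullable: absence is explicit */
    if (maybe)
        std::cout << length(NonNull<const std::string>(*maybe)) << "\n";
    return 0;
}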


> Also, on the subject of pointers, I have always been bothered by the
> disparity between value types (ie those with value semantics like
> int, float etc) and reference types (those with reference semantics
> like object variables in OOP).
> In Visual Basic for instance, if I don't initialise an integer
> variable, it's initialised to zero for me and I can use it without
> getting some exception thrown (unless of course I divide by zero).
> However, with an object variable, if I use it before initialisation, I
> get a null reference exception thrown at me. My question is, why don't
> I just get an object in its default state (ie the one that would
> result after the default constructor has run), ready for use just like
> I do with the integer variable?


That is, essentially, what happens in a language like SML: A pointer
variable is forced to be initialized at creation.


> Using lazy initialisation, the language could leave the reference as
> null, up until the point I try to use it.


Lazy initialization is more complicated and requires run-time testing.
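
A short C++ sketch of that hidden cost (LazyConfig is an invented
class): every access has to test whether the object exists yet, which
is exactly the run-time check that mandatory initialization avoids.

#include <string>
#include <iostream>

class LazyConfig {
public:
    const std::string &get() {
        if (!initialized_) {          /* run-time test on every access      */
            value_ = "default";       /* stand-in for the default constructor */
            initialized_ = true;
        }
        return value_;
    }
private:
    bool initialized_ = false;
    std::string value_;
};

int main() {
    LazyConfig cfg;
    std::cout << cfg.get() << "\n";   /* first use triggers initialization */
    return 0;
}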


> This way I don't have to litter my code with null pointer
> testing or "try...catch...end try" blocks. Plus, if I want to
> interpret the "nullness" of an object variable to mean something, I
> can still test for null and act accordingly. This scheme would
> eliminate a lot of error checking code hence allowing code to be
> clearer, be written faster, be compiled faster and be smaller.


My point in having an explicit distinction (by type) between pointers
that can be null and pointers that can't is essentially to avoid having
to check for null at runtime and to avoid null cases in unit testing or
verification.


Torben Mogensen

