Re: Programming language and IDE design


From: Martin Ward <martin@gkc.org.uk>
Newsgroups: comp.compilers
Date: Sat, 16 Nov 2013 15:55:10 +0000
Organization: Compilers Central
Keywords: design, comment
Posted-Date: 16 Nov 2013 11:35:21 EST

On Friday 08 Nov 2013 at 05:21, glen herrmannsfeldt <gah@ugcs.caltech.edu>
wrote:
> Much "implementation dependence" comes from dependence on the host
> processor. Maybe sometimes too much, but it is nice to be able to use
> the new features of newer processors.
>
> Many early machines were sign-magnitude, but most now use two's
> complement, which is mostly an advantage.
>
> Different word sizes are also a common dependence, with 36-bit words
> and 6-bit characters common on the early scientific machines.


Avoiding dependence on the host processor is precisely the reason for
eliminating implementation dependence: the language, rather than each
program, should absorb the differences between processors.


On Friday 08 Nov 2013 at 05:45, "Hans-Peter Diettrich" <DrDiettrich1@aol.com>
wrote:
> > Similarly, a string data type
> > which can hold any size of string.
>
> While I strongly agree that a string type is an absolute must, string
> handling and text processing can be considered subject to a domain
> specific language.
>
>...
>
> > Hash tables should allow any type of key and value, and so on.
>
> I'd consider hash tables as domain specific, together with databases and
> other containers. As opposed to strings, such items should be part of
> libraries, be language specific or third-party.


It is hard to imagine a significant class of programs which would have
no need for strings: even numerical analysis programs need to print
their results in a readable form! Similarly, once one has started
using hash tables extensively, they prove essential for so many
classes of program that describing them as "domain specific" is as
absurd as calling arrays or lists "domain specific".


My preference is to define the fundamental data type as a *sequence*,
with operators to extract the nth element of a sequence, extract a
subsequence, create a sequence, and so on. Arrays and lists are then
just two different implementations of sequences, with different
trade-offs in the efficiency of the various operations. The declaration
of a sequence can specify the implementation (array or list) or leave
the choice to the compiler.


In this context, a string is simply a sequence of characters.


It seems to me absurd to have different notations for:
(1) Extracting the nth element of an array;
(2) Extracting the nth character in a string; and
(3) Extracting the nth item in a list.


To me, this is as bad as using a different notation for an "array
of integers" than for an "array of floats"!




On Friday 08 Nov 2013 at 21:04, George Neuner <gneuner2@comcast.net> wrote:
> The majority of successful languages are LL(k) for small k. It's hard
> to get much simpler than that for parsing.


C is not LL(k), and neither is COBOL, nor C++, nor, I think, Java.
That probably covers the majority of successful languages,
if "success" is measured in total lines of code in production.


On Friday 08 Nov 2013 at 21:04, George Neuner <gneuner2@comcast.net> wrote:
> >(2) Absolutely no behaviour should be "implementation dependent"
> >or "undefined". Every syntactically valid program should have
> >a single semantically valid meaning (even if that meaning
> >is "halt with an error message").
>
> It's impossible not to have implementation dependent behavior: e.g.,
> program execution time is a behavior that can't be specified.


In my opinion, execution time is not part of the semantics or meaning
of a program.


> Witness that there is not a single CPU that fully implements
> IEEE-754 arithmetic (old or new spec), and every implementation
> differs in what is lacking and in what needs to be corrected by
> software to yield compliant results.


This is the huge advantage of IEEE-754 arithmetic: the programmer
can know for certain what the result of a floating point operation
will be. The standard abstracts away all the messy differences
between implementations, while also giving hardware designers
an incentive to ensure that the time-critical parts
of the specification can be implemented efficiently.
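

As a small illustration of this determinism (a sketch, assuming strict
IEEE 754 double evaluation):


    #include <stdio.h>

    int main(void)
    {
        /* The doubles nearest 0.1 and 0.2 do not sum to the double
           nearest 0.3; IEEE 754 guarantees this computes the same
           bit pattern on every conforming machine. */
        printf("%d\n", 0.1 + 0.2 == 0.3);   /* prints 0, everywhere */
        printf("%.17g\n", 0.1 + 0.2);       /* 0.30000000000000004 */
        return 0;
    }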


Early CPUs did not have a hardware stack: your argument would suggest
that languages should *therefore* have defined the behaviour of a
recursive subroutine to be "implementation dependent". Fortunately,
there were language designers, such as the designers of Lisp and Algol,
who recognised the value of recursion and included it in their
languages. As a result, hardware designers added stack operations
to their CPUs.


If more languages had had hash tables built in, we might by now
have CPUs with content-addressable memory, allowing constant-time
hash table implementations.


On Sunday 10 Nov 2013 at 04:41, "Jonathan Thornburg"
<jthorn@astro.indiana.edu> wrote:
> Martin Ward <martin@gkc.org.uk> wrote:
> > (2) Absolutely no behaviour should be "implementation dependent"
> > or "undefined". Every syntactically valid program should have
> > a single semantically valid meaning (even if that meaning
> > is "halt with an error message").
>
> This is really problematic in the area of floating-point arithmetic.
> For example, at what point should floating-point arithmetic overflow
> or underflow, and what should be the consequences of this happening?
> To be very precise, what "single semantically valid meaning" should we
> prescribe for trying to compute the square of 1.0e300?


These questions are precisely why the IEEE 754 floating point standard
was needed: it gives exact answers to all of them (without leaving
anything "implementation dependent", other than the extra information
encoded in a NaN). See the paper "What Every Computer Scientist
Should Know About Floating-Point Arithmetic":


http://www.validlab.com/goldberg/paper.ps
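

For Jonathan's example, the prescribed meaning can be observed directly
(a minimal sketch, assuming IEEE 754 doubles and the default
round-to-nearest mode):


    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double x  = 1.0e300;
        double sq = x * x;   /* overflows the finite double range */
        /* IEEE 754 fully specifies the outcome: the overflow flag is
           raised and the result is +infinity, on every conforming
           implementation. */
        printf("%g %d\n", sq, isinf(sq) != 0);   /* prints: inf 1 */
        return 0;
    }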


Alas, although the result of a = b + c is now precisely defined
in C and C++ for floating point variables b and c, the result can be
"undefined behaviour" for ordinary 32-bit signed integer values(!).
Other examples are integer division by zero and shifting past the
bit width.


Programmers who want defined behaviour from their programs
(and who doesn't?) have to add extra code to integer operations,
even when running on hardware which implements two's
complement arithmetic. For example, CERT recommends:


    #include <limits.h>   /* for INT_MAX and INT_MIN */

    /* si_a, si_b and sum are ints; test for overflow *before* adding. */
    if (((si_b > 0) && (si_a > (INT_MAX - si_b))) ||
            ((si_b < 0) && (si_a < (INT_MIN - si_b)))) {
        /* Handle error */
    } else {
        sum = si_a + si_b;
    }
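

(An aside beyond the CERT code: on GCC 5+ and Clang the same test can
be written with a compiler builtin, which typically compiles down to
the hardware's overflow flag; checked_add is a hypothetical wrapper
name:)


    #include <stdbool.h>

    /* Returns true on success, false if the sum would overflow. */
    bool checked_add(int si_a, int si_b, int *sum)
    {
        return !__builtin_add_overflow(si_a, si_b, sum);
    }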


> Java encountered all of these problems, and eventually chose to back
> off its "precise specification" to allow implementation-defined
> semantics for floating-point arithmetic.


That would be a disaster for Java if it were true; however,
the current Java Language Specification (Java SE 7 Edition) states:


"The Java programming language requires that floating-point arithmetic
behave as if every floating-point operator rounded its floating-point
result to the result precision. Inexact results must be rounded
to the representable value nearest to the infinitely precise result;
if the two nearest representable values are equally near,
the one with its least significant bit zero is chosen.
This is the IEEE 754 standard's default rounding mode known
as round to nearest."


The alternative floating point semantics (provide a result which is
only a rough approximation to the correctly rounded exact result) was
implemented in an early version of the Intel P5 Pentium, but was not
well received, with many users describing it as a "bug"! :-)
http://en.wikipedia.org/wiki/Pentium_FDIV_bug


--
Martin


Dr Martin Ward STRL Principal Lecturer and Reader in Software Engineering
martin@gkc.org.uk http://www.cse.dmu.ac.uk/~mward/ Erdos number: 4
G.K.Chesterton web site: http://www.cse.dmu.ac.uk/~mward/gkc/
Mirrors: http://www.gkc.org.uk and http://www.gkc.org.uk/gkc
[This discussion has gone into the weeds. It's over unless someone
wants to bring it back to compiler design and implementation. -John]

