Re: User definable operators

Craig Burley <burley@gnu.ai.mit.edu>
18 Dec 1996 00:06:53 -0500

From comp.compilers


From:	Craig Burley <burley@gnu.ai.mit.edu>
Newsgroups:	comp.compilers
Date:	18 Dec 1996 00:06:53 -0500
Organization:	Free Software Foundation, 545 Tech Square, Cambridge, MA 02139
References:	96-12-088 96-12-117
Keywords:	design

dennis@netcom.com (Dennis Yelle) writes:

      William Clodius <wclodius@lanl.gov> writes:
      >Many programming languages allow the user to overload language
      >defined operators. But a few languages also allow the user to define
      >their own operators. I would like to have some feedback on the
      >experience of others with user definable operators with respect to
      >specifying their syntax, associativity, precedence, semantics (e.g.,
      >side effects or not), etc.
      [...]

      >[My experience with such languages has been miserable. It means that no
      >two programs are actually written in the same language, so they're all
      >unreadable. Extensible languages enjoyed a short vogue in the 1970s, and
      >I wasn't sad to see them go. See Cheatham's EL/1 and Irons' IMP72. -John]

      John:
            Does this comment of yours apply to C++ too?

      Everyone:
            Is the tide turning against C++ ?
            That is, has C++ gone too far in this direction?

      --
      dennis@netcom.com (Dennis Yelle)
      [At least C++ doesn't let you invent your own syntax. As some other people
      have noted, you can do it right or you can do it wrong, e.g., overloading
      the arithmetic operators to handle bignums makes sense, using them for
      string packages becomes baffling. -John]

C++ went too far as soon as it became a general superset of C.

E.g. anyone care to say what the following statement actually does --
even how to parse it?

    foo = (a) - (b);

A language that requires the reader to constantly scan upward (through
#include'd files as well) to determine even the most mundane things
like "is this a binary operator here" already has serious problems,
linguistically speaking.

In practice, the problems C and C++ present (the latter with its
"foo << bar" notation, an expression whose purpose the reader cannot
reliably guess -- shift foo left bar bits, or print bar on stream
foo?) tend to be reduced by use of canonical naming schemes, spacing,
and, frankly, making the programmer memorize, or acculturate to, lots
more than would otherwise be needed.

However, the problems do creep into implementations of tools such as
indenters, "surface-level" optimizers, and such -- the kinds of things
I do often, and so I do encounter real problems that result from C's
poor design (as a language).

C is an example of a great tool, but a poor language -- it is
difficult to express and discern some very common programming concepts
using C, yet it is nevertheless a great way to get "work" done.

Though I still do most of my programming in C, I've come to respect
Fortran more than I ever thought possible. However, I'm not happy
with one new Fortran feature -- CONTAINS, which requires a programmer
to look both _up_ and _down_ to figure out even remotely what "SIN(A)"
might mean, unlike F77, which required looking only _up_. This was
introduced in Fortran 90, and added a horrible amount of bad language
design to an already hurting language, as explained below.

Overloadable operators, nifty syntactic shortcuts like "(x)" for casts
(the problem that afflicts "foo = (a) - (b);" above, in case you
didn't get it -- you have to know whether "a" is a type to know
whether "-" is unary or binary), and so on all fall under a general
principle for _language_ design that I summarize with this borrowing
from Jurassic Park:

    "The question is not, _can_ we do it, but rather _should_ we do it?"

Good language design results in a language that makes it rather
difficult to write code that other people will misread. That's pretty
much all there is to it. Whether you can write a lexer/scanner that
will actually parse the code written in a language should _not_ be
among the top priorities of language design, though of course it might
have veto power.

Further: one of the most annoying responses I get to my complaints
about the shortcomings of C, Fortran, and so on -- e.g. my complaint
about CONTAINS in F90 -- is this:

    "But with smart editors people can easily find out what they need just
      by asking, since those editors will have the _semantic_ information needed
      to fill out the syntactic missing pieces. Different semantic meanings
      can be notated with different colors, for example."

This annoys me because, first of all, it's untrue -- even the smartest
C editor, for example, can't figure out what "foo = (a) - (b);" means
_unless_ it knows for certain whether "a" is a type -- and there's no
way _in the language_ to specify that "a" is _intended_ to be a type
someday during project development, or that it will _never_ be a type.

That is, programmers can't ask reasonable questions about their code,
questions that can be answered without complete compilation in
general, because the _language_ prevents it. The programmer can't say
"I know `a' will not be a type someday" in an elegant way that makes
sense to readers of the code, _without_ using meta-language facilities
that are likely to be unique to the editing environment.

The question "is the `-' in `(a) - (b)' unary" should _not_ have to be
answered by a smart editor, and in any case cannot always be answered!
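
To make the "(a) - (b)" ambiguity concrete, here is a minimal C sketch. The function names are hypothetical, chosen only for illustration. The two function bodies contain the identical statement, and only the declaration of "a" above it decides whether "-" is binary or unary:

```c
/* `a` is an ordinary variable here, so `-` is the binary operator. */
int minus_is_binary(int b)
{
    int a = 10;
    int foo = (a) - (b);    /* a minus b */
    return foo;
}

/* `a` is a type name here, so `(a)` is a cast and `-` is unary. */
int minus_is_unary(int b)
{
    typedef int a;
    int foo = (a) - (b);    /* negated b, cast to type a */
    return foo;
}
```

Called with b equal to 3, the first function yields 7 and the second yields -3, even though the line computing foo is textually the same in both.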

So, during large-application development, smart editors just outsmart
themselves and their users, because they claim to "know" things they
can't possibly know. They'll tell users of badly-designed languages
that functions are arrays, or arrays are types, or binary operators
are unary operators, and at that level, none of this _should_ be
indeterminate, for most kinds of programming. Or, they'll throw their
hands up and tell their users nothing useful in lots of situations
where the information _is_ known, but not yet expressible in the
poorly-designed language they're trying to help the user cope with.

Second of all, my counter-point to this "smart editors make badly
designed languages look really great" idea is "better yet, smart
editors make well-designed languages look vastly better, yet the
well-designed languages do not require smart editors".

I'll go back to my CONTAINS nit to elucidate. In FORTRAN 77, one
could write a statement function (a syntactic botch as well, but in a
different way ;-), as in:

    FOO(X) = X + 3.4
    ...
    PRINT *, FOO(2.1)

(Think of "inline float FOO(float X) { ... } ... { ... FOO(2.1) ... }"
in C, though FOO is really a nested procedure, as offered by GNU C.)

As in C, the programmer has to look _upward_ to determine some details
about what the "foo()" invocation actually means, though, in Fortran,
the problem is made worse by the fact that "foo()" doubles as what C
represents with "foo[]" -- an array reference -- but that's yet
another issue.

Anyway, Fortran 90 extended these statement functions to their more
general "nested procedure" concept via:

    CONTAINS
    REAL FUNCTION FOO(X)
    ...
    END FUNCTION FOO

Nifty, but F90 _requires_ the CONTAINS block to _follow_ the main code
-- unlike C, Pascal, and other languages. (They didn't even require
-- or allow! -- a nesting "END CONTAINS"!!)

The reasoning? Seems they decided that well-designed languages
allowed _top-down_ coding, instead of bottom-up. That is, you put
your main() at the top, then its subprocedures after it, then their
subprocedures, and so on, in that order.

Okay, that _would_ be a well-designed language, but F90 blew it by
_requiring_ this top-down form of coding and further _disallowing_
full use of this top-down form throughout.

That is, a programmer still has to say

    INTRINSIC FOO

or

    EXTERNAL FOO

or

    DOUBLE PRECISION FOO

_before_ "FOO(2.1)" to specify _some_ aspects of FOO, but if FOO is
intended to be a nested procedure that cannot be specified using the
(declared obsolescent, for good reason ;-) statement-function feature,
then the programmer _must_ specify _every_ aspect of FOO _after_
"FOO(2.1)".

So, a programmer reading a large block of code to find out what
"FOO(2.1)" really means _must_ scan both _upward_ and _downward_ just
to determine whether FOO is an array or a function!

After all, if FOO is an array, the declaration _will_ appear above,
but if it is a function, that information can appear above or below,
especially when it comes to details of how the function is called,
what type it returns, etc.

As a result, a programmer reading F77 code has less work to do to
understand what a particular "FOO(2.1)" reference means than if that
_same_ code is compiled via a F90 compiler, because in the latter case
the entire meaning of FOO can be changed _after_ the "FOO(2.1)"
reference via CONTAINS.

You can imagine how shaky this might make some programming tasks when
looking at code that uses SIN(), SQRT(), and so on. Yet, as with C,
_canonical_ use of the language hides some of this shakiness.

(In C, this would be equivalent to a huge program using "printf()",
"scanf()", and so on, and yet having the language require
_particular_ kinds of redefinition of those functions to _follow_, not
precede, those references to affect how they are interpreted!)

Now, fans of this misfeature -- so-called by me because grafting
top-down linguistics onto essentially a bottom-up language creates a
topsy-turvy language -- claim that "smart editors" make the problem go
away.

"Okay", I say, "then if you're depending on smart editors for
readability, why not depend on them to do the top-down presentation
for you instead, and make F90 _itself_ no worse-designed than F77 as a
language?"

That is, if you're going to depend on smart editors anyway, design
CONTAINS so that it has to come _before_ any executable code in a
program unit -- just like all the other stuff, and just like C,
Pascal, and so on -- and let the smart editor display the chunks of
specification and detail "backwards", or as wished by the user of the
editor.

Or even, if you're going to depend on smart editors, why not make F90
a _complete_, strict (no more so than CONTAINS already is anyway ;-)
top-down language?? Then, let those smart editors handle the
automatic conversion of (incompatible) F77 programs to the new regime
by moving all the specification statements, statement functions, and
so on to the _end_ of the program unit and translate them as well?

So, in my opinion, relying on smart editors to paper over poorly
designed language features, _especially_ new ones you're
contemplating, is another instance of throwing increasingly brittle
complexity on top of poor designs to hide the problems -- and that
approach inevitably leads to slow, unmaintainable systems, where you
can't even trust your editors to get things right.

Instead, design languages (and language features) so that, if they
have a text-model substrate like most modern languages still do (that
is, if they use the 2-dimensional lines-of-characters model like C,
Fortran, Pascal, and so on -- as compared to a visual language), the
code typically written in that language is _inherently_ readable, with
little or no chance of misinterpretation, by programmers looking
purely at printed listings of large programs -- without any automated
aids. Don't require programmers to _avoid_ use of language features
to preserve readability -- for access to "clever" features, consider
requiring _at-use_ notation, even though this increases the amount of
"typing" needed.

Design the language so notational conveniences are used only for
expressions that are already widely understood in those contexts.
E.g. "+" always means "add", and is always lower precedence than "*",
which always means multiply. (That's assuming your audience is
already well-acquainted with infix notation.) And limit the semantic
meaning of those expressions to just the meaning people will already
be accustomed to.

Then, for the other notational conveniences and semantic meanings we
all crave, add smart editors that "fill them out" in the language
using clear, if somewhat verbose, language, perhaps even showing those
fill-outs using the notational convenience you're used to. Include
the more verbose expressions in the language, of course, but design
them so they _cannot_ be mistaken for the more general, widely
understood, notationally convenient expressions.

That way, you've got a well-designed language (or at least one no
worse-designed than its previous iteration), and you'll find that
your smart editor is easier to design, implement, understand, and
interact with, because its job is much simpler.

C example: volatile variables. This is a poorly designed language
feature, because even though it is rarely _used_, its mere presence in
the language means that you cannot depend on being able, as a
programmer, to reduce, e.g.

    x + y - x

to

    y

(or similar) until you've determined for certain that "x" is not
volatile -- something that can be indeterminate at certain stages of
certain projects.

And, I can't imagine any math classroom where it is ever taught that,
_sometimes_, you can't reduce "x + y - x" to "y". So, volatile
variables should never have been added to C, or they should be
unusable in expressions such as these, so programmers never have to
worry about them "popping up" unexpectedly.

Instead, a well-designed language requires the programmer to specify
access to volatile variables in a manner that is clear to the reader
of the code, e.g.:

    temp = volatile_read (x);
    ... x + y - x ...

Or, instead of the "suspect"

    x = 0;
    y = 3;
    x = 1;

the programmer would have to write something like:

    volatile_write (x, 0);
    y = 3;
    volatile_write (x, 1);

To fans of notational convenience, my solution is worse because it
takes more characters to type. But, it is superior as a language,
because it makes clear what is going on to the reader in a way that is
highly important, leaving the programmer to safely assume general
expressions do _not_ involve volatile variables.
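
The at-use notation can be roughly approximated in today's C. This is only a sketch under stated assumptions: "volatile_read" and "volatile_write" are not standard C. They are hypothetical macros written here for int objects only, and "device_flag" and "reset_then_compute" are made-up names. The variable is declared as a plain int, and every access meant to be volatile is flagged where it happens:

```c
/* Hypothetical at-use accessors (int objects only); not part of standard C. */
#define volatile_read(x)      (*(const volatile int *)&(x))
#define volatile_write(x, v)  ((void)(*(volatile int *)&(x) = (v)))

/* Declared as an ordinary int; volatility is expressed at each use instead. */
int device_flag;

int reset_then_compute(int y)
{
    int temp;

    volatile_write(device_flag, 0);     /* flagged volatile store */
    temp = volatile_read(device_flag);  /* flagged volatile load */
    volatile_write(device_flag, 1);     /* second flagged store */

    /* Only ordinary variables appear here, so a reader may reduce this to y. */
    return temp + y - temp;
}
```

A reader of the final expression needs no declaration lookup to know that it reduces to y, because anything volatile would have to be spelled out at the point of use.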

And, instead of using smart editors to determine whether "x + y - x"
might involve volatile variables, one uses them to automate writing
the (presumably quite rare) "volatile_write(...)" and
"volatile_read(...)" sequences, reducing keystrokes to the same low
level required by C, but reducing the overall complexity of the
language+editor+compiler+debugger system as well, and probably
resulting in fewer bugs overall.

I guess my point above can be summarized as: "if you claim your clever
language extension does not really hurt overall language design
because it is rarely used, then you should not mind if we redesign it
to not hurt the language design at all by requiring it to be
explicitly flagged each time -- because the extra typing will be
rarely done, if we can believe your claim about the extension being
rarely used".
--

"Practice random senselessness and act kind of beautiful."
James Craig Burley, Software Craftsperson burley@gnu.ai.mit.edu
--
