Re: User definable operators


From: Craig Burley <burley@gnu.ai.mit.edu>
Newsgroups: comp.compilers
Date: 26 Dec 1996 14:08:12 -0500
Organization: Free Software Foundation, 545 Tech Square, Cambridge, MA 02139
References: 96-12-088 96-12-110 96-12-147 96-12-163
Keywords: design

hrubin@stat.purdue.edu (Herman Rubin) writes:


      Tim Hollebeek <tim@wfn-shop.Princeton.EDU> wrote:
      >Mathematical symbolism has as many nasty historical features as most
      >old software programs, unfortunately.


      But mathematics has never refused to allow new operators, or new
      notation. It has sometimes been slow to adopt them.


      One would think that language designers would have learned from this,
      and gone in the direction of greater versatility, not less. Of
      course, it is necessary to have a lot more characters than the 95,
      including space, allowed by ASCII.


      Notation has to be overloaded to be of reasonable length. By the time
      one reads the lengthy variable names which seem to delight the
      computer people, the structure of the expression is lost. It is
      necessary to let the user invent notation, if necessary, and for the
      language and compiler to help, not hinder.


No.


It is necessary for the computer language, and the compiler, to
provide a consistent, agreed-upon means to express oneself. A good
language, and, by extension, a well-designed compiler, does _not_
allow the ad-hoc invention of notation. (If you mean the introduction
of new symbols from a set that is, until then, unused and reserved in the
language, that's okay, but that's not what I think you mean by
"overloaded notation".)


When humans perceive a need for a new notation, they _should_ invent a
new language, and the compiler that helps implement it. It's that
step that tells _everyone_ that the new notation exists, that serves
as a basis for agreeing on the meaning, and appropriateness, of that
new notation, and so on, while letting those who do not need or are
not prepared to use the new notation continue using the old
notation and have the expressions made in that old language continue
to mean the _same_ thing as they used to.


So, I claim that computer languages should, ideally _today_ (as
opposed to some notion of an ideal language), be seen as a series of
skins, or cocoons, each of which serves us for a time until we
outgrow it, at which point
we graduate to a new one. And, just as many people are still happy with
FORTRAN 77 even as Fortran 95 is being nailed down as an upgrade to
Fortran 90, not everyone has to outgrow a cocoon at the same time --
it's clear who are the caterpillars and who are the butterflies at
any given time (that is, it's clear which programs are written in
which language).


I'm not saying a good language is inherently inflexible in useful
ways; rather, I'm saying it should be inflexible in the ways we
_should_ want such things to be inflexible.


If the way an expression in a language is parsed can be changed by
making subtle, perhaps hidden, use of its flexibility, then it is too
flexible. Similarly, it is too flexible if the basic _meaning_ of
such expressions can be changed. By "parsed" and "meaning" I
emphatically _do not_ mean only the way _implementations_ (compilers
and interpreters) parse and implement them -- I mean the way _humans_
parse and interpret them, without requiring vast quantities of
semantic content beyond the definition of the language itself. The
human reasoning about the meaning of an expression goes _beyond_ that
of mere implementation (which is the primary focus of compilers and
interpreters), so the linguistically useful components of
interpretation also go beyond mere implementation.


I think many hacker-programmers see languages and compilers as
tools to get work done, which is fine, but then they somehow believe
that, because those tools are successful, the languages they define
must inherently be good _languages_, aside from weaknesses (such as
"lack of flexibility") that they believe are easily addressed (e.g. by
going from C to C++, or similar).


I find, however, that when I view languages as tools for _expressing_
how to get work done -- that is, viewing the computer language as
essentially a tool for communicating with _people_ (including myself
in a few years) -- then I implement compilers, including the
extensions they provide to the language, with what I believe is a
more astute eye
towards aiding this aspect of _communication_.


As a result, the compilers I get to influence, feature-wise, and the
languages they implement, or that I design, tend to grow only in ways
that don't render earlier expressions (programs) written in them
_less_ understandable. So, these languages remain useful in a way
that others often don't -- because those others acquire extensions
that are often syntactically clever, usually easy to implement, have a
seductive way of "making sense", but in fact make _existing_ programs
(and certainly new ones) _harder_ for humans to understand.


For example, while I'm not a C++ programmer, I get the impression it
is possible to redefine operators such that an expression like


    a = b + c;


is turned into meaning something like


    a = foo (&b, &c);


such that either b or c may be modified.
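

To make this concrete, here is a minimal C++ sketch (mine, not from
the original post; the Counter class and its names are invented for
illustration) of the kind of overload being described.  The
assignment reads like ordinary arithmetic, yet both operands are
changed behind the reader's back:


    // Hypothetical example: an operator+ whose non-const reference
    // parameters let it modify its operands as a side effect.
    #include <iostream>

    struct Counter {
        int value;
    };

    // Nothing in the expression "a = b + c" warns the reader that
    // b and c may change.
    Counter operator+(Counter &b, Counter &c) {
        ++b.value;        // side effect on the left operand
        c.value = 0;      // side effect on the right operand
        return Counter{b.value};
    }

    int main() {
        Counter a{0}, b{1}, c{2};
        a = b + c;        // looks like plain arithmetic...
        std::cout << a.value << ' ' << b.value << ' ' << c.value << '\n';
        // ...but prints "2 2 0": both b and c were modified.
        return 0;
    }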


Now, in the sense of being a clever tool for hacking up solutions,
this makes C++ a more flexible language than C, and yet that alone
doesn't validate the widely held notion that C++ is a "superset" of C.


However, in the sense of being a useful _language_ for communicating
information to humans, C++ is, if the above (or similar things) is
true, a highly limited subset of C, because, suddenly, the expression


    a = b + c;


means _vastly_ less, in many important ways to _readers_ and _writers_
of code, than it used to in C. That is, the expression has less
content of useful meaning -- you can say _less_ about it that is
useful in a linguistic sense, in the same sense that you can say less
about what "vcxzqhk^$cx!#$CZX" means in an arbitrary extension-rich
language than what "Has anyone filled the pool?" means in English.


To wit: in C, "a = b + c;" _always_ meant "b and c are not modified".
If it's true that, in C++, that's no longer the case, then,
linguistically speaking, C++ is _less_ expressive than C as a
_language_.


(Similarly, in line with my earlier post, adding the "volatile"
attribute to C made the resulting language less expressive than its
predecessor.)


I haven't kept up with the field of mathematics for decades now, but
it has been my impression that this field does _not_ introduce new
notation that makes existing, understandable expressions _less_
understandable.




If you're going to claim you're designing, or extending, a language,
then you had better spend some time learning about _language design_
-- and don't mistake any of the efforts behind creating C, Fortran,
Cobol, and so on for particularly good examples of this art.


If, instead, you're just designing a tool that can be arbitrarily
extended so that symbols mean new things, you are doing nothing much
more advanced than writing a portable assembler -- one that lots of
people can use, perhaps, but which is not particularly useful as a
language, and which is likely to have the same sort of flaws as the
HP PA-RISC assembler syntax (with its "conditional branch and annul,
but I'm not saying which direction even though it's crucial to the
semantics!" misfeature).


When you are designing a portable assembler -- like C or C++ -- don't
be surprised if you end up with a "language" that doesn't include even
simple concepts like "and" and "or". I've yet to see any assembler
that includes them, and neither C nor C++ does, while FORTRAN has for
several decades. And yet, it's hard to imagine how anyone could claim
a useful computer _language_ would not include simple ways to express
"and" and "or".


If you disagree with the above paragraph, you don't understand C,
Fortran, and language design well enough. I won't explain it again
with new material (I've written extensively on this topic in the
past; look it up on DejaNews if you like), but perhaps a hint will
help: how, in C, do you _express_ the concept "A and B", such
that A is a question such as "will the moon be full in Mexico City
tonight?" and B is a question such as "is it snowing in Juneau, Alaska
right now?". C does not provide any easy, direct way to _express_ the
"and" in "A and B" (though it provides a variety of ways to implement
it), while Fortran's ".AND." does.
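

To make one common reading of that hint concrete, here is a small C++
sketch (mine, not the author's; the function names are invented
stand-ins for the two questions above).  In C and C++, "&&" dictates
that A is evaluated first and that B is skipped whenever A is false,
so writing "A && B" commits the reader to one particular evaluation
strategy rather than simply stating the conjunction; Fortran's ".AND."
leaves the evaluation strategy to the processor:


    #include <iostream>

    // Hypothetical stand-in for question A.
    bool moon_full_in_mexico_city() {
        std::cout << "evaluated A\n";
        return false;
    }

    // Hypothetical stand-in for question B.
    bool snowing_in_juneau() {
        std::cout << "evaluated B\n";
        return true;
    }

    int main() {
        // "&&" mandates short-circuit sequencing: because A returns
        // false here, B is never evaluated, and only "evaluated A"
        // is printed.  The operator specifies an implementation of
        // "and", not merely the logical concept.
        bool both = moon_full_in_mexico_city() && snowing_in_juneau();
        std::cout << both << '\n';
        return 0;
    }
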
--
"Practice random senselessness and act kind of beautiful."
James Craig Burley, Software Craftsperson burley@gnu.ai.mit.edu
--

