Re: Definable operators

From: Craig Burley <burley@tweedledumb.cygnus.com>
Newsgroups: comp.compilers
Date: 22 Apr 1997 21:31:36 -0400
Organization: Cygnus Support
References: 97-03-037 97-03-076 97-03-112 97-03-115 97-03-141 97-03-162 97-03-184 97-04-027 97-04-095 97-04-113 97-04-130
Keywords: syntax, design

Craig Burley (burley@tweedledumb.cygnus.com) wrote:
> : Programmers have been "taught" that doing more things with less input
> : is so highly valuable that they forget that the purpose of _language_
> : design is to make works written in the language more readily
> : understood by their _human_ audience. And doing that properly usually
> : requires _restricting_ the expressiveness of the language --
> : e.g. explicitly saying that + cannot be made to mean anything other
> : than addition, even if the tools that process the language cannot be
> : made to enforce such a restriction.


apardon@rc4.vub.ac.be (Antoon Pardon) writes:
> But faulting the language because it allows "+" to be used
> inappropriately while we don't seem to have problems with languages
> that allow "insert" to be used inappropriately doesn't make sense to me.


I'm not faulting any language in a black-and-white sense, but I am
saying that the linguistic effectiveness, or if you will the
"ergonomics", of a language can be (not yet scientifically?)
measured by looking at things like this.  What I am faulting is people
who claim that a language is _necessarily_ better by making it more
flexible in terms of what fundamental lexemes (and so on) _mean_ in
that language. I am saying they should limit themselves to saying
"this tool would be better for the way I use it if it allowed
definable operators in its input language", because saying the
_language_ becomes better by doing so implies, to me, that it becomes
better at communicating ideas between humans -- and, in fact, the
opposite is often true when such changes are made to the language.


A language that allows no overloading of + at all, defines it to mean
only addition, and specifies that its operands are evaluated only once
has certain linguistic advantages, but less flexibility.  (FORTRAN 77 and
C are basically like this.)


A language that encourages arbitrary overloading of + to do whatever
people want it to do, _including_ modifying the operands, not
performing a commutative operation, whatever, has much greater
flexibility, but compares poorly in _this_ respect to the former
example. (C++ is basically like this.)
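
For example (purely a made-up sketch, not taken from any real code
base), nothing in C++ stops someone from writing a "+" that is neither
an addition nor commutative, and that even writes to one of its
operands:

    struct Counter { int n; };

    // This "+" is not an addition, is not commutative, and it even
    // writes to its left operand -- yet the call site still reads
    // "a = b + c;".
    Counter operator+(Counter &b, const Counter &c)
    {
        Counter result;
        result.n = c.n * 10;   // the result has nothing to do with a sum
        b.n = 0;               // and an "operand" of + gets clobbered
        return result;
    }

    int main()
    {
        Counter b, c, a;
        b.n = 1; c.n = 2;
        a = b + c;             // afterwards a.n == 20 and b.n == 0
        return 0;
    }

A reader who sees only "a = b + c;" has no way to know any of that
without hunting down the relevant operator+ definition.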


These are examples of the extremes with regard to +.  (Worse
extremism could be imagined, e.g. allowing
overloaded/replacement/addition of lexemes, so that "a = b + c;" can
be dynamically defined to mean "set a equal to the variable named `b +
c'".) A language that _allows_ overloading of +, but defines in the
language standard that + still means addition, a commutative operator
(assuming that's mathematically sensible -- I'm not an advanced-math
person ;-), and that the operands are only read -- even if existing
_implementations_ of the language can enforce few or none of these
restrictions -- can have all the _important_ flexibility of, say, C++,
with most or all of the linguistic advantages of F77 or C, in this
respect.
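
Here is a sketch (in C++ syntax, though the point is about a rule in
the language standard, not about any tool) of the kind of overloading
such a middle-ground language would still welcome: + is extended to a
new type, but it is still an addition, it commutes, and its operands
are only read:

    struct Rational { long num, den; };

    // + extended to a new type, but still an addition: commutative,
    // and both operands are passed read-only.
    Rational operator+(const Rational &x, const Rational &y)
    {
        Rational sum;
        sum.num = x.num * y.den + y.num * x.den;
        sum.den = x.den * y.den;
        return sum;
    }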


As far as "insert" being used inappropriately, of course, it would be
nice if we knew _everything_ people wanted to express, came up with
canonical lexemes, names, and other visual, aural, etc. ways to
express them that would be used the same way universally, and provided
universal implementations of them.


But, even when it comes to naming procedures, languages can be better
or worse _semantically_. If you see `R = SIN(S)' in FORTRAN 77, you
can be sure it's the math `sin' function being invoked if you
can't see any explicit overriding statement _above_ the statement
(e.g. `EXTERNAL SIN'). However, Fortran 90 is a little worse -- you
also have to look _below_ the statement (e.g. `CONTAINS' followed by
`FUNCTION SIN') to accumulate the same semantic information. Neither
is as good (in this respect) as a language that simply defined SIN as
a function that computed the math `sin' of its argument, regardless of
whether that language defined or provided an implementation of SIN.
That way, a person reading `R = SIN(S)' would _know_ it meant math
`sin', since that's how the language defined that particular name, and
wouldn't have to worry, or scan up/down the entire module, to figure
out if it really meant that.
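
The same scanning problem can be put in C++ terms, too.  A made-up
sketch (mylib and its "sin" are hypothetical, not any real library):

    #include <math.h>

    namespace mylib {
        // Someone's local "sin" -- nothing to do with trigonometry.
        double sin(double x) { return x < 0.0 ? 0.0 : x; }
    }

    double f(double s)
    {
        using mylib::sin;   // this one line, possibly far from the call,
        return sin(s);      // changes what "sin(s)" means here
    }

    double g(double s)
    {
        return sin(s);      // the math sine from <math.h>
    }

Reading "sin(s)" by itself tells you less than it would in a language
that simply reserved the name.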


If you designed a language that reserved a set of names to have
certain behaviors, even if you didn't mandate particular
implementations, and you came up with that set of names by looking at
all the pertinent words in the host language (English being the "host
language" of FORTRAN, C, and so on), you'd have a somewhat better
language. In such a language, you'd specify that, for example, a
procedure named "insert" doesn't delete any items in any lists passed
to it; that a function named "length" doesn't modify any of its
arguments or even necessarily evaluate them; and so on. Seems like a
lot of work now, but someday it might be helpful to have a language
like this. (Note that Fortran 90 does some of this in a sense, but it
only _fully_ defines what its reserved names do -- I'm talking about a
middle ground that allows flexibility of implementation and meaning
but with some "common-sense" ground rules in each case of a name you
think should be so restricted in that language, such as "insert",
"abort", and so on.)


Further, you can have a language that at least assures you that
arguments to a procedure are evaluated only once before invocation
and are only read by that invocation of the procedure (C), or a
language that gives you fewer assurances by allowing that procedure
to _write_ its arguments without any change to the syntax of the
invocation (C++ and any Fortran).
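
A tiny sketch of the difference (the first function is legal C or C++,
the second is C++ only):

    void f(int  x) { x = 0; }   // the caller's variable is untouched
    void g(int &x) { x = 0; }   // the caller's variable is overwritten

    int main()
    {
        int a = 5;
        f(a);   // a is still 5
        g(a);   // a is now 0 -- yet the call reads exactly like f(a)
        return 0;
    }

In C, letting f write the caller's variable means passing &a, and the
call site says so; in C++ or Fortran the invocation looks the same
either way.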


I'm not demonizing any languages here (despite the apparent claims of
people who I think are having trouble reading in the language I'm
writing in ;-), but I am pointing out there are at least two valid
uses of the term "language" that we too often think of and use
interchangeably:


    1. A systematic means of communicating ideas


    2. A formal system of signs and symbols


FORTRAN, C, C++, make, sh, and so on all define examples of (2). What
I am saying is that they are easily shown to be better or worse
examples of (1) in a specific area being considered (such as use of +
as an infix operator, or use of procedure names), and we must not be
afraid of saying so.  And I _am_ saying that C++ is one of the worst
widely accepted examples of (2) in the sense that it's a bad example
of (1).  (Of course, any one project can so restrict its _own_ use of
a computer language to a subset that the participants decide is a much
better example of (1).  I'd say that, to the extent they succeed, that
speaks well of the project participants, not of the underlying
computer language. After all, I've seen claims on USENET by people
who say they've done good, solid, OOP-like work in _assembler_ or
FORTRAN 77 by restricting themselves to "acceptable" expressions in
the underlying language. I don't think that makes assembler or F77
good OOP languages, though it presumably illustrates that they can be
effectively used as tools.)


I know the industry would be better off if we'd stop thinking of
designing an instance of (2) above as "language design", even though
it might be strictly true. If it's intended as an example of (1),
which is what _programming_ languages _must_ be to be useful as such,
then we must discuss _communications_ concepts, which gets into the
semantics of expressions (not arithmetic expressions, but linguistic
ones).


If you start thinking in terms of _maximizing_ the amount of semantic
information in a small, localized expression typical of what you
expect programmers to be doing in your language, you start doing a
better job of (1). Maximizing semantic information means, to me,
maximizing the number of relevant, even if smallish, things a person
reading a work in the language can _know_ about the localized
expression without having to worry about context. E.g. "a = b + c;"
has greater useful semantic information in C than in C++, for reasons
I've stated above.


By contrast, the push for definable operators, overloaded operators,
pass-by-reference, and so on is often driven by an overall process of
_compressing out_, or _minimizing_, the amount of semantic information
in a localized expression.  It's _precisely_ the point that you
_can't_ say nearly as much about what "a = b + c;" means in C++ that makes
C++ a better tool for so many programmers -- because they can redefine
+ to mean something very different from addition, to mean "change b
and c", and so on. That's exactly what so many programmers ask for --
maximum flexibility of expression -- which militates against the
maximizing of semantic information that is so important in
language-(1) design.


Yes, I am seriously thinking of, even beginning some rudimentary work
on, writing one or two documents to clarify what I'm saying (so often
in these newsgroups) in a more readable way. Note that I'm still
trying to figure it all out -- right now these concepts seem like a
pretty big elephant to me, and I feel like I'm the only (blindfolded)
person trying to describe it. About all I can say with confidence is
what constitutes bad examples of language-(1) design, whereas I feel
like I'm still only guessing about what constitutes good examples of
it.


It has occurred to me recently that perhaps the reason we've never
come close to making Artificial Intelligence (AI) really happen is
that we didn't first tackle the less-exciting, but perhaps
fundamentally important to AI, field of Artificial Languages (AL) --
examples of which include every programming language we've ever
created (Fortran, C, etc.). It might turn out to be the case that,
until we can reason as precisely about the semantic content of an
artificial language as defined by (1) above as we do, today, about its
content as defined by (2), we can't tackle or do real AI, because real
AI would mean the machine can truly intercommunicate with the human at
the level of ideas. But I don't know enough about AI or AL to say
this is more than perhaps just an intuition, or good/bad guess.


I'm sure I'd be able to communicate these concepts better if I had an
academic background in computer science. (Okay, if I had _any_
academic experience in it, which I basically don't, unless you count
one semester on formal languages. ;-) But, I also feel, based on my
observations of what our educational institutions appear to be
teaching, that perhaps such a background would result in my refusing
to take seriously my current concerns about language-(1) design,
having invested so much effort in being able to precisely describe
language-(2) design. E.g., I see little evidence that there's much
language-(1) design going on in the field of Artificial Languages -- we
seem to be intent on repeating linguistic mistakes of the past -- but
maybe that's because "real programmers" just aren't listening to, or
implementing the proposals of, those doing good language-(1) design
work, in which case I'm at fault as well for not discovering those
people or their work so far.
[I think there actually has been a fair amount of work on these
issues, but it's languishing in the journals that discuss human
factors and the like. But it's a real trick to design a language that
will keep bad programmers from writing bad programs but won't keep
good programmers from writing good programs. On the other hand,
considering how many bad programmers there are, maybe we should forget
about the good programmers for a while. -John]



