Re: Use of punctuation in a language?

Robert A Duff <bobduff@shell01.TheWorld.com>
8 Nov 2003 01:40:08 -0500

          From comp.compilers

Related articles
Use of punctuation in a language? hsauro@cs.caltech.edu (Herbert) (2003-10-31)
Re: Use of punctuation in a language? derkgwen@HotPOP.com (Derk Gwen) (2003-11-01)
Re: Use of punctuation in a language? rosing@peakfive.com (MattR) (2003-11-01)
Re: Use of punctuation in a language? gah@ugcs.caltech.edu (Glen Herrmannsfeldt) (2003-11-02)
Re: Use of punctuation in a language? joachim.durchholz@web.de (Joachim Durchholz) (2003-11-08)
Re: Use of punctuation in a language? bobduff@shell01.TheWorld.com (Robert A Duff) (2003-11-08)
Re: Use of punctuation in a language? bear@sonic.net (Ray Dillinger) (2003-11-11)
Re: Use of punctuation in a language? jcownie@etnus.com (James Cownie) (2003-11-11)
Re: Use of punctuation in a language? landauer@got.net (Doug Landauer) (2003-11-11)
Re: Use of punctuation in a language? Martin.Ward@durham.ac.uk (Martin Ward) (2003-11-11)
Re: Use of punctuation in a language? jvorbrueggen@mediasec.de (Jan C. Vorbrüggen) (2003-11-21)
Re: Use of punctuation in a language? vbdis@aol.com (2003-11-21)
[2 later articles]

From: Robert A Duff <bobduff@shell01.TheWorld.com>
Newsgroups: comp.compilers
Date: 8 Nov 2003 01:40:08 -0500
Organization: The World Public Access UNIX, Brookline, MA
References: 03-10-129 03-11-016
Keywords: syntax, design
Posted-Date: 08 Nov 2003 01:40:08 EST

"Glen Herrmannsfeldt" <gah@ugcs.caltech.edu> writes:


> "Herbert" <hsauro@cs.caltech.edu> wrote in message
> > Does anyone have any comments on the use of punctuation in a
> > language, eg, compare the following two approaches?
>
> > a = 3.4; b = 6.7;
>
> > or
>
> > a = 3.4 b = 6.7
>
> > which is better, ease of reading for humans, issues regarding design
> > of compilers (eg the punctuation-less version requires
> > look-ahead?). Perhaps lack of punctuation is a bad language design?


Short answer: I say, when in doubt, go with reasonable punctuation.
The more visual cues you can give to the human reader, the better.
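On Herbert's look-ahead question: for a small grammar, dropping the separator need not cost more than one token of look-ahead, because an expression can end as soon as the next token cannot continue it. Here is a minimal Python sketch under that assumption; the toy grammar (names, numbers, and the operators + - *) and the helpers tokenize and parse are hypothetical, not anything proposed in the thread.

```python
import re

# One token per match: a number, a name, or a punctuation character.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<NUMBER>\d+(?:\.\d+)?)|(?P<NAME>[A-Za-z_]\w*)|(?P<PUNCT>[=;+*-]))"
)

def tokenize(src):
    tokens, pos, src = [], 0, src.rstrip()
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens

def parse(tokens):
    """Parse 'NAME = expr' statements; a ';' between them is optional."""
    stmts, i = [], 0

    def peek():
        # One token of look-ahead is all this toy grammar needs.
        return tokens[i] if i < len(tokens) else (None, None)

    def operand():
        nonlocal i
        kind, text = peek()
        if kind == "NUMBER":
            i += 1
            return float(text)
        if kind == "NAME":
            i += 1
            return text
        raise SyntaxError(f"expected an operand, got {text!r}")

    while i < len(tokens):
        kind, name = peek()
        if kind != "NAME":
            raise SyntaxError(f"expected a name, got {name!r}")
        i += 1
        if peek() != ("PUNCT", "="):
            raise SyntaxError(f"expected '=' after {name!r}")
        i += 1
        expr = [operand()]
        # The expression ends as soon as the next token is not an operator,
        # so a following NAME simply starts the next statement.
        while peek()[1] in ("+", "-", "*"):
            op = peek()[1]
            i += 1
            expr += [op, operand()]
        if peek() == ("PUNCT", ";"):
            i += 1  # optional separator
        stmts.append((name, expr[0] if len(expr) == 1 else tuple(expr)))
    return stmts

print(parse(tokenize("a = 3.4; b = 6.7")))  # [('a', 3.4), ('b', 6.7)]
print(parse(tokenize("a = 3.4 b = 6.7")))   # [('a', 3.4), ('b', 6.7)]
```

Both spellings produce the same assignments. The scheme stays this simple only while no token can both continue the current statement and begin the next one; once such a token exists, the separator-less form typically needs more look-ahead or an arbitrary disambiguation rule.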


> It depends on what you mean by punctuation. Statement ending markers,
> either explicit such as semicolon, or implicit end of line, are
> certainly convenient both for the compiler and human reader.


I don't like languages with significant end-of-lines, because you
inevitably end up needing to write something longer than a physical line
(on a piece of paper, or in a window), and then you need some kludgery
to say "this particular end-of-line is not a terminator".
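The kludgery usually comes in two forms, both of which Python happens to use: an explicit marker (a trailing backslash) and an implicit rule (a newline inside unclosed parentheses does not end the statement). A minimal Python sketch of a statement splitter that honours both; the function logical_lines is hypothetical and, unlike a real lexer, ignores string literals and comments.

```python
def logical_lines(src):
    """Split src into statements.

    A newline ends a statement unless the line ends with a backslash
    (explicit continuation) or a '(' is still unclosed (implicit
    continuation). Strings and comments are not handled.
    """
    statements, parts, depth = [], [], 0
    for line in src.splitlines():
        continued = line.endswith("\\")
        if continued:
            line = line[:-1]  # drop the continuation marker itself
        depth += line.count("(") - line.count(")")
        parts.append(line.strip())
        if not continued and depth <= 0:
            joined = " ".join(p for p in parts if p)
            if joined:
                statements.append(joined)
            parts, depth = [], 0
    tail = " ".join(p for p in parts if p)  # input ended mid-statement
    if tail:
        statements.append(tail)
    return statements

source = "total = a + b + \\\n    c + d\nx = f(1,\n      2)\ny = 3"
print(logical_lines(source))
# ['total = a + b + c + d', 'x = f(1, 2)', 'y = 3']
```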


>... If you compare languages with and without reserved words, you
>notice some of the important parts of constructing a language.


> Note that using the C preprocessor you could write programs using
> only alphanumeric characters and white space, though with the
> addition of some reserved words.


Yes, you could write them. Could you read them? Could a typical C
programmer read them? ;-)
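The preprocessor trick can be mimicked without a C compiler. The Python sketch below is only a rough analogue: a hypothetical table of word macros (BEGIN, END, LPAREN and so on, none of them standard), substituted token by token much as `#define BEGIN {` would be. Unlike the real preprocessor, it assumes tokens are separated by blanks.

```python
# Hypothetical word-to-punctuation table, in the spirit of
#   #define BEGIN {
#   #define END   }
MACROS = {
    "BEGIN": "{", "END": "}", "LPAREN": "(", "RPAREN": ")",
    "ASSIGN": "=", "GT": ">", "SEMI": ";",
}

def expand(source):
    """Replace every blank-separated token that names a macro."""
    return " ".join(MACROS.get(tok, tok) for tok in source.split())

print(expand("if LPAREN x GT 0 RPAREN BEGIN y ASSIGN 1 SEMI END"))
# if ( x > 0 ) { y = 1 ; }
```

The expansion is mechanical; the hard part is reading the all-word input.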


> Both Fortran and PL/I have no reserved words, and depend on
> punctuation and other language features to keep some constructions from
> being ambiguous. It is not hard to write programs that can be very
> confusing to a human in either language.


It's not hard to write confusing programs in *any* language. Some make
it easier than others, but it seems to me, the criteria should be, "is
it easy to write readable programs?" and "is it hard to write confusing
programs BY ACCIDENT?"


>... To make it even worse, blank space is not significant in
>Fortran, except inside character string constants.


> There is a contest where the goal is to make the most obfuscated C
> program, as C has a number of features that are pretty good at doing
> just that.


I don't think the mere existence of obfuscated C is a valid complaint
against C. Valid complaints are: it's easy to obfuscate by accident,
and/or one has to go to some extra trouble to avoid obfuscation.


> > Any advice or comments would be gratefully received, I haven't seen
> > anything in the books on this, so was wondering what others
> > thought? We're designing a simple language for the exchange of
> > models in molecular biology, we have an XML based one, but now we'd
> > like a human readable one.
>
> The problems in a data description language are somewhat different
> than in a programming language,


Perhaps, though I suspect that for complicated data structures with
nesting, the issues are quite similar. I find XML to be rather
unreadable, despite its very nice and clean structure.


> though maybe your models are related
> to programs. With human readable as a goal, I would consider which
> punctuation makes it more, or less, human readable. I would consider
> the ease of writing a parser for it a secondary goal, though usually
> they are related.


Agreed.


> > [ Having used languages in which any string of characters is a valid
> > program, I can report that I vastly prefer languages with punctuation
> > because they make it harder to write a program that is syntactically
> > valid but doesn't mean what I wanted it to. It's a little easier for
> > compilers to parse languages with statement separators and explicit
> > brackets, but I don't find that anywhere near as compelling as the
> > human factors involved. -John]


Agreed.
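The compiler-side benefit of separators, and the point below that a terminator limits the damage from an error, show up in a common technique, panic-mode error recovery: on a syntax error, discard tokens up to the next ';' and carry on from there. A minimal Python sketch, assuming a toy statement form `name = number;`; the grammar and the helper names are hypothetical.

```python
def is_number(tok):
    try:
        float(tok)
    except ValueError:
        return False
    return True

def expect(tokens, i, ok, what):
    """Return tokens[i] if ok(tokens[i]) holds, else raise SyntaxError."""
    if i >= len(tokens) or not ok(tokens[i]):
        got = repr(tokens[i]) if i < len(tokens) else "end of input"
        raise SyntaxError(f"expected {what}, got {got}")
    return tokens[i]

def parse_with_recovery(src):
    """Parse 'name = number;' statements, resynchronizing at ';' after an error."""
    tokens = src.replace(";", " ; ").replace("=", " = ").split()
    good, errors, i = [], [], 0
    while i < len(tokens):
        try:
            name = expect(tokens, i, str.isidentifier, "a name")
            expect(tokens, i + 1, lambda t: t == "=", "'='")
            value = expect(tokens, i + 2, is_number, "a number")
            expect(tokens, i + 3, lambda t: t == ";", "';'")
        except SyntaxError as err:
            errors.append(str(err))
            # Panic mode: skip past the next ';' and resume parsing there.
            while i < len(tokens) and tokens[i] != ";":
                i += 1
            i += 1
            continue
        good.append((name, float(value)))
        i += 4
    return good, errors

print(parse_with_recovery("a = 1; b = = 2; c = 3;"))
# ([('a', 1.0), ('c', 3.0)], ["expected a number, got '='"])
```

The bad statement costs one error message, and the statement after it is still checked; without a terminator there is no such obvious place to resynchronize.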


> Having a statement end punctuation at least allows the compiler to
> limit the effects of errors somewhat.
>
> It is nice to have programs that are one character away from a correct
> program not be syntactically valid. There is a famous example from
> Fortran 66, the assignment statement:
>
> DO 1 I=1.2
>
> Replacing a comma in a DO statement with a period turns a valid DO
> statement into a valid assignment statement.


Yes, but I don't think you can take that as a hard rule. For example,
there are many languages where "X := X + 1;" means something very
different from "X := X - 1;", and there's only one character
difference. Perhaps we are used to noting the difference between
"+" and "-" since grade-school maths, whereas perhaps a difference
between "." and "," is less noticeable.
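The Fortran DO example above can be made concrete. Because blanks are ignored outside character constants and DO is not reserved, the compiler cannot tell from the first word whether it is looking at a DO statement; a classic approach is to squeeze out the blanks and look for a comma at parenthesis depth zero after the '='. A simplified Python sketch; the helper names are hypothetical, and it ignores Hollerith constants, continuation lines, and checking the statement label.

```python
def squeeze_blanks(stmt):
    """Drop blanks outside '...' character constants."""
    out, in_string = [], False
    for ch in stmt:
        if ch == "'":
            in_string = not in_string
        if ch != " " or in_string:
            out.append(ch)
    return "".join(out)

def classify(stmt):
    """Decide between a DO statement and an assignment (simplified model)."""
    s = squeeze_blanks(stmt).upper()
    target, eq, rhs = s.partition("=")
    if not eq:
        return "other"
    depth, in_string = 0, False
    for ch in rhs:
        if ch == "'":
            in_string = not in_string
        elif in_string:
            continue
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            # In this model only a DO statement has a top-level comma after '='.
            return "DO statement" if target.startswith("DO") else "other"
    return f"assignment to {target}"

for s in ["DO 1 I = 1, 2", "DO 1 I = 1.2", "X = F(A, B)"]:
    print(f"{s!r}: {classify(s)}")
# 'DO 1 I = 1, 2': DO statement
# 'DO 1 I = 1.2': assignment to DO1I
# 'X = F(A, B)': assignment to X
```

One character, a comma versus a period, decides which branch is taken, and both results are legal.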


> A statement that I often use in AWK has slightly interesting punctuation:
>
> while( getline < file > 0) {
>
> The < operator is normally the less than operator, but as a getline option,
> it specifies the file to read. The return value of getline is then tested
> as the loop condition. Confusing or not depends on the person reading it.


I haven't written any awk in many years, but should one write:


        while((getline < file) > 0) {


instead? Or, better yet, should the awk designers have used
"getline(file)" or "getline <-- file" or some such, instead?


Or is it our tie to the 7-bit ASCII character set that causes trouble?
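For comparison, here is roughly what a getline(file) interface might look like, as a Python sketch. The function name and the (status, line) return shape are hypothetical; the status values follow awk's getline convention of 1 for a record read, 0 at end of file, and -1 on error. With the file passed as an argument, the only relational operator left in the loop is an ordinary comparison.

```python
import io

def getline(f):
    """Hypothetical function-call form of awk's getline.

    Returns (1, line) when a record was read, (0, "") at end of file,
    and (-1, "") if reading failed, mirroring awk's 1 / 0 / -1 results.
    """
    try:
        line = f.readline()
    except OSError:
        return -1, ""
    if line == "":
        return 0, ""
    return 1, line.rstrip("\n")

def read_records(f):
    records = []
    while True:
        status, line = getline(f)
        if status <= 0:  # stop at end of file (0) or on error (-1)
            break
        records.append(line)
    return records

print(read_records(io.StringIO("alpha\nbeta\n")))  # ['alpha', 'beta']
```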


- Bob


P.S. Thanks to our esteemed moderator for allowing us to chat about
language design issues. They're at least *related* to compilers (as are
linkers, debuggers, etc), and anyway, where else can language design be
discussed intelligently?

