Re: Prefix, infix and function-call and their implications in embedded language readability

Kaz Kylheku <kkylheku@gmail.com>
Thu, 21 Jan 2010 19:44:36 +0000 (UTC)

          From comp.compilers

Newsgroups: comp.compilers
Organization: A noiseless patient Spider
References: 10-01-069
Keywords: design, comment
Posted-Date: 21 Jan 2010 14:57:59 EST

On 2010-01-21, Peng Yu <pengyu.ut@gmail.com> wrote:
> Consider the following three expressions, which are valid C, mit-scheme
> and Mathematica expressions. There are of course many other ways to
> express the same thing in other languages, or in the same language but
> in different ways.
>
> 3+2*5>7
> (> (+ 3 (* 2 5)) 7)
> Greater[Plus[3,Times[2,5]],7]
>
> Apparently, at least to me, the first expression is the most readable.


Really? What if we replace 2 3 5 7 by a b c d, and then change
the meaning of the operators, or give them a precedence you aren't
accustomed to?


What if 3+2*5>7 is actually a Smalltalk expression, such that it just
means ((3+2)*5)>7?
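To make the Smalltalk point concrete, here is a minimal Python sketch that evaluates the same token string under two rules: C-style precedence (via precedence climbing) and Smalltalk's rule that binary messages have equal precedence and are applied left to right. The names `tokenize`, `eval_c_style`, `eval_left_to_right` and the tables `OPS` and `C_PRECEDENCE` are hypothetical, and the tokenizer only handles integers and single-character operators.

```python
import operator

# The three binary operators that appear in 3+2*5>7.
OPS = {"+": operator.add, "*": operator.mul, ">": operator.gt}

# C-style precedence levels (higher binds tighter); all left-associative.
C_PRECEDENCE = {">": 1, "+": 2, "*": 3}


def tokenize(src):
    """Split a string like '3+2*5>7' into integers and operator characters."""
    tokens, digits = [], ""
    for ch in src:
        if ch.isdigit():
            digits += ch
        else:
            if digits:
                tokens.append(int(digits))
                digits = ""
            tokens.append(ch)
    if digits:
        tokens.append(int(digits))
    return tokens


def eval_left_to_right(tokens):
    """Smalltalk-style binary messages: no precedence, strictly left to right."""
    acc = tokens[0]
    for i in range(1, len(tokens), 2):
        acc = OPS[tokens[i]](acc, tokens[i + 1])
    return acc


def eval_c_style(tokens):
    """C-style evaluation using precedence climbing over C_PRECEDENCE."""
    pos = 0

    def parse(min_prec):
        nonlocal pos
        lhs = tokens[pos]
        pos += 1
        while pos < len(tokens) and C_PRECEDENCE[tokens[pos]] >= min_prec:
            op = tokens[pos]
            pos += 1
            # Left-associative: the right operand only absorbs tighter operators.
            rhs = parse(C_PRECEDENCE[op] + 1)
            lhs = OPS[op](lhs, rhs)
        return lhs

    return parse(1)


for src in ["3+2*5", "3+2*5>7"]:
    toks = tokenize(src)
    print(src, eval_c_style(toks), eval_left_to_right(toks))
```

For `3+2*5` the two rules give 13 and 25. The full comparison happens to come out true either way, which is exactly why a reader can misread it without noticing.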


> One possible reason is that we learn this algebraic notation much
> earlier than the other two, which is analogous to how we respond
> to our native language (say, English) much faster than to a second
> language (say, French).


Another possible reason is that the algebraic notation has only a few
operators, whose precedence you have memorized (and are assuming to
hold true of the expression above).


Would it still be readable if the grammar had 500 operators, arranged into 200
precedence levels?


Another reason is that because there are only a few operators, you can use
special glyphs for them, which are distinct from numbers and variables.


The second, Lisp notation is unambiguous. So we can replace all of the
non-punctuation symbols and still recognize the tree shape as being the same,
provided we keep the parentheses in the printed notation as they are:


    (> (+ 3 (* 2 5)) 7)


    -> substitute generated symbols for all non-nil atoms ->


    (G0001 (G0002 G0003 (G0004 G0005 G0006)) G0007)


I still know that the major constituent of the expression is G0001,
whose arguments are (G0002 ....) and G0007.
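The substitution can be sketched in a few lines of Python. This is a minimal illustration, not a real Lisp reader: it assumes atoms are separated by whitespace or parentheses and ignores strings, quoting and comments. The helpers `parse_sexp`, `gensym_atoms` and `to_text` are hypothetical names chosen for this sketch.

```python
import itertools


def parse_sexp(text):
    """Parse one s-expression into nested Python lists of atom strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos):
        if tokens[pos] == "(":
            items, pos = [], pos + 1
            while tokens[pos] != ")":
                item, pos = read(pos)
                items.append(item)
            return items, pos + 1  # skip the closing paren
        return tokens[pos], pos + 1

    tree, _ = read(0)
    return tree


def gensym_atoms(tree, counter):
    """Replace each atom, left to right, with a fresh symbol G0001, G0002, ..."""
    if isinstance(tree, list):
        return [gensym_atoms(t, counter) for t in tree]
    return f"G{next(counter):04d}"


def to_text(tree):
    """Print nested lists back in parenthesized notation."""
    if isinstance(tree, list):
        return "(" + " ".join(to_text(t) for t in tree) + ")"
    return tree


tree = parse_sexp("(> (+ 3 (* 2 5)) 7)")
renamed = gensym_atoms(tree, itertools.count(1))
print(to_text(renamed))
```

The nesting of `renamed` is identical to that of `tree`: element 0 is still the head (`G0001`), element 1 is still a sublist, and element 2 is still `G0007`. Only the names changed.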


If we substitute the non-punctuation symbols of the infix expression, we are
lost; there is no explicit grouping there to retain:


    G0001 G0002 G0003 G0004 G0005 G0006 G0007


Can you remember that G0002 and G0004 are binary operators,
and that G0004 has a higher precedence than G0002?
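One way to see how much the flat sequence depends on outside knowledge is to rebuild a tree from it. The sketch below assumes the substitution ran left to right over `3+2*5>7`, so `G0002`, `G0004` and `G0006` stand for `+`, `*` and `>`. That fact, plus a precedence table, must be supplied separately. Both tables and the `build_tree` helper are hypothetical.

```python
# The infix expression after positional symbol substitution.
tokens = "G0001 G0002 G0003 G0004 G0005 G0006 G0007".split()

# Hypothetical operator tables; the token stream itself carries none of this.
C_LIKE = {"G0006": 1, "G0002": 2, "G0004": 3}         # like >, +, *
LEFT_TO_RIGHT = {"G0002": 1, "G0004": 1, "G0006": 1}  # Smalltalk-like


def build_tree(tokens, precedence):
    """Precedence climbing over a flat, alternating operand/operator list.

    `precedence` maps each binary-operator symbol to a level (higher binds
    tighter); all operators are treated as left-associative.
    Returns nested lists in prefix order: [op, left, right].
    """
    pos = 0

    def parse(min_prec):
        nonlocal pos
        lhs = tokens[pos]  # an operand
        pos += 1
        while pos < len(tokens) and precedence[tokens[pos]] >= min_prec:
            op = tokens[pos]
            pos += 1
            rhs = parse(precedence[op] + 1)
            lhs = [op, lhs, rhs]
        return lhs

    return parse(1)


print(build_tree(tokens, C_LIKE))
print(build_tree(tokens, LEFT_TO_RIGHT))
```

The same seven symbols yield two different trees depending on which table you assume. The S-expression version needed no table at all.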


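See, the amount of assumption is much smaller for the S-exp notation.
We assume that parentheses group and that whitespace separates elements.
That's it. Infix notation requires us to assume much more: which symbols
are operators, how many operands each one takes, and what their precedence is.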


What happens with infix operators is that when the number of meanings we
want exceeds the number of operators, and we can't invent new operators,
we start overloading the meanings of the existing ones: a + b concatenates
strings, or performs set union, etc. In math this is not so bad, because in
math you can invent new glyphs, use different typefaces and alphabets, and
make use of two dimensions. If you want a different kind of plus, you can
put a circle or box around the plus sign, and there you go: a new glyph.
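As an illustration of that overloading in a mainstream language, here is a small Python sketch. Python's built-in sets actually use `|` for union and do not support `+`, so the `UnionSet` class below is a hypothetical type that reuses `+` for union, the way many libraries do.

```python
class UnionSet(frozenset):
    """Hypothetical set type that overloads '+' to mean set union."""

    def __add__(self, other):
        # frozenset.union computes the union; rewrap it as a UnionSet.
        return UnionSet(frozenset.union(self, other))


# One glyph, several meanings, chosen by the operand types.
print(2 + 3)                                        # integer addition
print("ab" + "cd")                                  # string concatenation
print([1, 2] + [3])                                 # list concatenation
print(sorted(UnionSet({1, 2}) + UnionSet({2, 3})))  # set union via overloading
```

Nothing in the text `a + b` tells the reader which of these four operations is meant. That has to come from knowing the types.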


When prefix notations get long, we can easily break them into multiple lines
using a few simple guidelines, e.g.:


    (G0001 (G0002 G0003
                  (G0004 G0005 G0006))
           G0007)


This way, we can easily visualize the structure as a tree printed sideways.
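The guideline behind the layout above can be sketched as a tiny pretty-printer. The rule here is an assumption, one common convention rather than any particular Lisp's: print a form on one line if it fits, otherwise keep the head and first argument on the opening line and align the remaining arguments under the first one. The width of 35 columns is arbitrary, and the fit check ignores the closing parentheses that follow a subform, which is a simplification. `flat`, `pretty` and `TREE` are hypothetical names.

```python
# The renamed tree from the substitution example, as nested lists.
TREE = ["G0001", ["G0002", "G0003", ["G0004", "G0005", "G0006"]], "G0007"]


def flat(tree):
    """Render a tree on one line."""
    if isinstance(tree, list):
        return "(" + " ".join(flat(t) for t in tree) + ")"
    return tree


def pretty(tree, col=0, width=35):
    """Print a form flat if it fits in `width` columns (starting at `col`).
    Otherwise keep the head and first argument on the opening line and
    align every remaining argument under the first one."""
    text = flat(tree)
    if not isinstance(tree, list) or len(tree) < 2 or col + len(text) <= width:
        return text
    head = flat(tree[0])
    arg_col = col + 1 + len(head) + 1  # column just after "(head "
    parts = [pretty(arg, arg_col, width) for arg in tree[1:]]
    return "(" + head + " " + ("\n" + " " * arg_col).join(parts) + ")"


print(pretty(TREE))
```

With a wide enough `width`, the same function prints the whole form on one line. The indentation is derived purely from the parenthesized structure, with no operator table involved.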


Consider that all Scheme code is written in that notation, not just small
expressions. The notation scales to express everything in the program.


Infix notation like a+b*c>7 is only /locally/ readable: in small,
simple instances that fit into less than about half a line of text.


It does not scale to large expressions, and it's not suitable for expressing
entire programs, which is why languages built around such expressions
typically provide other constructs, like statements and declarations, for
structuring the rest of the program.


> Readability affects the programmer productivity.


That's only one kind of readability, which we can call ``micro-readability'':
the readability of a small expression that occupies about a third of a line of
text in your editor.


Micro-readability is significant, but not as much as you think.


A large program is not readable no matter what notation it is written in. You
can't just sit down and read 500,000 lines of code, and grasp it all as a unit.
So being able to pick out a readable 15 character subsequence of that program
doesn't actually buy you as much as you think.


Suppose that small subexpressions found in a 500,000 line program are all
beautifully micro-readable. Suppose you need to make a small change to one of
them. What if it turns out that the program has 10,000 other expressions
similar to that one (but not exactly the same), and they /all/ have to be found
and changed in an analogous way in order for your proposed change to work
properly? Oops.


Large-scale program structure and semantics are what affect productivity.


It's not how micro-readable it is, but how little of it you have to read,
understand and rewrite to implement a new requirement, or fix a bug.


> Since an embedded language can be embedded in a computer language, such
> as Scheme and C++, the choice of prefix, infix or function-call notation
> can have a profound effect on the readability of the embedded language.
> I haven't found any previous references on this issue. Could somebody
> recommend some, if there are any?


If you don't have any references, how can you be sure that the effect of infix
versus prefix is ``profound''?


People working in, say, Java don't struggle any more or less than people
working in Scheme, in terms of just cranking out raw code and understanding
what they have written.


They struggle differently on a higher semantic level.


[> 3+2*5>7

ASSIGN 3 TO CONSTANT-THREE. ASSIGN 2 TO CONSTANT-TWO. ASSIGN 5 TO CONSTANT-FIVE.
MULTIPLY CONSTANT-TWO BY CONSTANT-FIVE GIVING INTERMEDIATE-FACTOR. ADD CONSTANT-THREE
TO INTERMEDIATE-FACTOR GIVING SUM-VALUE. IF SUM-VALUE IS GREATER THAN 7 GOTO ANOTHER-PLACE.


Now THAT's readable. -John]

