Re: Layout syntax (Hans Aberg)
20 Dec 2003 11:48:45 -0500

          From comp.compilers

Related articles
Layout syntax (2003-12-03)
Re: Layout syntax (Joachim Durchholz) (2003-12-08)
Re: Layout syntax (2003-12-13)
Re: Layout syntax (Carl Cerecke) (2003-12-13)
Re: Layout syntax (Joachim Durchholz) (2003-12-14)
Re: Layout syntax (2003-12-20)
Re: Layout syntax (Joachim Durchholz) (2003-12-21)
Re: Layout syntax (2003-12-23)
Re: Layout syntax (Joachim Durchholz) (2003-12-27)
Re: Layout syntax (2004-01-02)
Re: Layout syntax (Joachim Durchholz) (2004-01-07)
Re: Layout syntax (2004-01-09)
[8 later articles]
| List of all articles for this month |

From: (Hans Aberg)
Newsgroups: comp.compilers
Date: 20 Dec 2003 11:48:45 -0500
Organization: Mathematics
References: 03-12-016 03-12-060 03-12-081 03-12-104
Keywords: syntax
Posted-Date: 20 Dec 2003 11:48:45 EST

<> wrote:

>>>This is probably because most mathematical objects don't have much of
>>>a nested structure.
>> Quite on the contrary, pure math has a lot of nested structures, even
>> though perhaps not so much in formulas.
>Oops - right, mathematical objects tend to be nested, and deeply so.

I am not sure how deeply nested they are: I want to make a system that (in
principle) is OK to the working mathematician. Part of the fascination
with working with it, is to figure out what this means in practise.

>>> That's also why I think that mathematical tradition isn't a source
>>>of inspiration for questions in this specific area.
>> I write on a proof-verification system, which is the reason it is a source
>> of inspiration to me. :-)
>I'm pretty sure that it's more semantic than syntactic inspiration.

Actually, there is a intricate interplay between syntax and semantics
(or notation and notions). In an early version, I merely added set
syntax to a standard Prolog engine, which gave some powerful
ideas. But unfortunately, that approach only leads to "naive set
theory". Thus, the next step is to build something around axiomatic
set theory, which in its turn requires a system that can correctly
handle metamathematics.

Then it turns out that it is very important to have a good syntax, as
one otherwise is quickly becoming lost as to what the correct
semantics is. It also turns out that many of the most popular
mathematical notations are pretty tricky to implement in using a
LALR(1) parser generator. For example, the Church (lambda x.f) is in
working math normally written x |-> f. Here, one cannot tell whether
"x" is a new variable or an old identifier before the "|->" has been
seen. So Haskell, in its syntax, throws in a backslash "\ x -> f", but
that is not practise in pure math.

It becomes even more complicated when introducing sets. For example,
  (1) {f(x)| A}
is the set of all f(x) such that the predicate A is true. But from the
general syntactic point of view, one can not tell whether (1) is short for
any of the following:
  (2) {y| exist x: y = f(x) and A}
  (3) {g| exist f: g = f and A}
  (4) {f(x)| A(f, x)}
as (1) may appear as a part of a larger formula where f and x may have
global meanings.

So one has to tell which of the names in (1) should be bound to the
functional set notation: x or f or both. Then the suggested notation
in a math book for this is (if say x is the one that is bound):

                {f(x)| A}_x

But in a computer language implementation, this then means to one does
not know what "x" is until the whole formula (1) has been read! This
is indeed a very bad situation if A would happen to be not just a
letter, but a big expression. So here I have introduced the notation

              {_x f(x)| A}

which still requires some lookahead, but less so.

So if I try to implement a syntax that easily parsable to me, then I
end up having to wrestle with such grammar implementation problems. In
the other variation, where one simply introduces a more computer like
syntax, then it quickly becomes very difficult for me to parse the
mathematics of the code, and so I cannot detect whether the
implementation is strictly mathematically rigorous.

>> Indentation errors are not only difficult for humans to detect,
>> they often generate incomprehensible compiler error messages as
>> well. One could of course argue this has with the compiler to do,
>> and not the layout syntax technique. But that was the state of the
>> art when I followed it.
>There's another question on the quality of problem messages on the
>mailing lists: were these consistently from newbies, or did it bite
>people with a degree of familiarity with indentation?

I have by now a very vague memory of it. But what I recall is that
people got strange error messages from Hugs that they could not
understand. Then it often turned out to be attributed to the layout

I cannot tell whether it was a weakness in Hugs or a problem with the
layout syntax. Somebody programming for a while would probably
learning that a certain weird error messages in reality depends on a
layout error.

>> I agree that if one should use an indentation system, then one
>> should probably avoid a tab-to-space conversion, or make it
>> explicit in the file.

>How would "making it explicit" work?

My idea was merely that one should be able to write "tabs = 4 spaces"
or something.

>I'm a bit sceptical that this would interact well with editors, but I
>might have overlooked something, so I'm interested in what you have
>in mind.

This certainly will screw up the look in any editors that cannot parse
it, but it will ensure that files are correctly handled when read by
the compiler, which is more important.

Also, layout syntax becomes even more complicated in Unicode, which has
spaces of different width, "EN SPACE", "EM SPACE", etc.

    Hans Aberg

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.