Re: Looking for volunteers for XL

Christophe de Dinechin <christophe@taodyne.com>
Mon, 28 Nov 2011 14:12:42 -0800 (PST)

          From comp.compilers


From: Christophe de Dinechin <christophe@taodyne.com>
Newsgroups: comp.compilers
Date: Mon, 28 Nov 2011 14:12:42 -0800 (PST)
Organization: Compilers Central
References: 11-11-048 11-11-053 11-11-054 11-11-058 11-11-060
Keywords: syntax, design
Posted-Date: 29 Nov 2011 01:54:07 EST

Pre-scriptum: It may be better to move this discussion to xlr-talk.




On Nov 27, 11:24 pm, "BartC" <b...@freeuk.com> wrote:
> But at least it is usually obvious what is and what isn't a function
> call; the name of the function should give a clue as to what it does,
> and sometimes the module where it lives is provided also, useful extra
> information.


Right. And similarly, the form of a notation in an extensible language
should give a clue as to what it does. Here is an actual use of XL; I
hope that you understand what it does without looking at the
documentation:


        slide "A good talk needs fancy titles!",
                * "This is my first point"
                * "This is my second point"


Here is another language extension from the built-in library:


        if X < 3 then writeln "X is small" else writeln "X is big"


It's actually defined that way in XLR (in a pre-loaded file):


        if true then X else Y -> X
        if false then X else Y -> Y




Now, you are right that function calls are easily identified in
languages such as C. Similarly, in XL, tree rewrites are easily
identified, since everything is a tree rewrite but for a few
exceptions. The only things the compiler treats specially are:


1) Sequence of instructions, i.e. infix semi-colon or new-line, e.g.


        write "Hello"; write "World"
        writeln "."


2) The rewrite and "don't rewrite" operators, -> and data. "Foo->Bar"
means "rewrite Foo as Bar":


        0! -> 1
        N! -> N * (N-1)!


  "data X" means "don't rewrite something that looks like X"


        // This means that evaluation stops at commas,
        // i.e. 1,3,4+5 will evaluate as 1,3,9
        data X,Y


3) Type declarations, e.g. "X:integer". This means "X has the shape of
an integer". I'm still struggling with a decent notation for "X has
the shape of an if-then-else"; ideas are welcome.


4) Terminating, machine-code-generating notations. Currently, there
are three.


    4a) "opcode" for machine-level instructions


        (X:integer+Y:integer):integer -> opcode Add
        (X:real+Y:real):real -> opcode FAdd


    4b) "C" for declarations of C bindings


        allocate_bytes N:integer -> C malloc


    4c) a special "extern" notation to directly import C declarations.
              (actually lowered to a C declaration internally).


        extern double sin(double);


    These three notations only work when running the compiler at -O3 for
    obscure reasons, don't ask ;-)


5) Blocks can be eliminated in many contexts, e.g. "(1+X)" can
evaluate as "1+X" except in a few special corner cases.
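
To make the rewrite and "data" forms from item 2 more concrete, here is
a minimal sketch in Python rather than XL. It is not how the XL compiler
works internally. The tuple encoding of trees, the function name
"evaluate", and the right-nested representation of commas are all
assumptions made for illustration only.

```python
# Trees are plain Python values (an assumed encoding, not XL's own):
#   int                          a leaf such as 3
#   ("postfix", operand, "!")    e.g. 3!
#   ("infix", op, left, right)   e.g. 4+5 or 1,3

def evaluate(tree):
    """Rewrite a tree until no rule applies."""
    if isinstance(tree, int):
        return tree
    if tree[0] == "postfix" and tree[2] == "!":
        n = evaluate(tree[1])
        if n == 0:
            return 1                                   # 0! -> 1
        smaller = ("postfix", ("infix", "-", n, 1), "!")
        return evaluate(("infix", "*", n, smaller))    # N! -> N * (N-1)!
    if tree[0] == "infix":
        _, op, left, right = tree
        if op == ",":
            # data X,Y: the comma node itself is kept, only its children
            # are evaluated, so evaluation "stops at commas".
            return ("infix", ",", evaluate(left), evaluate(right))
        a, b = evaluate(left), evaluate(right)
        if op == "+":
            return a + b
        if op == "-":
            return a - b
        if op == "*":
            return a * b
    raise ValueError(f"no rewrite matches {tree!r}")


five_factorial = evaluate(("postfix", 5, "!"))
# 1,3,4+5 with commas nested to the right (an assumption of this sketch)
comma_list = evaluate(("infix", ",", 1, ("infix", ",", 3, ("infix", "+", 4, 5))))
```

In this sketch the factorial tree reduces to a single integer, while the
comma tree keeps its shape and only the 4+5 leaf is reduced.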




Anything else is a tree rewrite. When you write:


      if X < 3 then writeln "Small"


the following happens:


1) The source text is parsed into a standard parse tree format that
consists of 8 node types (integer, real, text, name/symbol, infix,
prefix, postfix, block), where separate lines are really an infix \n
and indentation is really a block with special open/close delimiters.


2) Forms are looked up based on their shape alone. The compiler finds
"if true then X -> X" and tries to match it. Matching requires testing
whether "X<3" can match "true", hence requires evaluation of X<3. So
we recursively use the same principle to evaluate X<3.


3) Assuming X evaluates to an integer, for "X<3", we end up with an
"opcode", so we generate machine code for a less-than operator that
returns the trees "true" or "false" (at least logically, internally it
may be a machine binary flag at -O3).
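
The three steps above can be sketched in Python. This is an illustrative
model, not the XL implementation: the tuple encoding, the assumed parse
of the statement as an infix "then" whose left side is a prefix "if",
the "env" dictionary for bound names, and the "out" list standing in for
writeln output are all assumptions. What happens when the condition is
false is not described above, so the sketch simply leaves the tree
unrewritten in that case.

```python
TRUE, FALSE = ("name", "true"), ("name", "false")

def evaluate(tree, env, out):
    """Evaluate a tree by shape; env binds names, out collects writeln text."""
    if isinstance(tree, int) or tree[0] == "text":
        return tree
    if tree[0] == "name":
        return env.get(tree[1], tree)       # unbound names stay symbolic
    if tree[0] == "infix" and tree[1] == "<":
        left = evaluate(tree[2], env, out)
        right = evaluate(tree[3], env, out)
        if isinstance(left, int) and isinstance(right, int):
            # Stand-in for the "opcode" case: yields the tree true or false.
            return TRUE if left < right else FALSE
    if tree[0] == "infix" and tree[1] == "then":
        head, body = tree[2], tree[3]
        if head[0] == "prefix" and head[1] == ("name", "if"):
            # Matching "if true then X -> X" forces evaluation of the condition.
            if evaluate(head[2], env, out) == TRUE:
                return evaluate(body, env, out)
            return tree                     # no matching form in this sketch
    if tree[0] == "prefix" and tree[1] == ("name", "writeln"):
        arg = evaluate(tree[2], env, out)
        out.append(arg[1] if isinstance(arg, tuple) else str(arg))
        return None
    return tree


# if X < 3 then writeln "Small"
program = ("infix", "then",
           ("prefix", ("name", "if"), ("infix", "<", ("name", "X"), 3)),
           ("prefix", ("name", "writeln"), ("text", "Small")))

def run(x):
    out = []
    evaluate(program, {"X": x}, out)
    return out
```

Calling run(2) collects the text "Small", while run(5) collects nothing
because the condition evaluates to the tree "false" and the form does
not match.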




> And everyone understands about functions and knows the simple syntax
> involved. Looking at sequence of unknown functions for the first time
> won't faze anyone; but how about a long sequence of newly-invented
> statements!


In my opinion, this really depends on the notations you
invent. Looking at a sequence of functions can faze anybody if the
names are poorly chosen or don't match your expertise.




> When extensibility is applied to syntax itself (for example), then
> what will it look like?


XL does not encourage syntax changes. Most notations should
fit within the standard syntax framework (i.e. the 8 node types
described above). You can change the syntax (see "extern" above) but
apart from connecting to a different language, I have yet to find a
case where it's required.


With the base syntax, you can easily add operators and describe their
precedence. But changing precedence is rarely useful, and like other
syntax changes, it's actively discouraged. For example, there's no
sane way to make A+B*C parse as anything other than A added to (B
times C). But naturally, nothing prevents you from declaring:


        A+B*C -> A-B/C


just like nothing prevents you from writing stupid C code like the
following (and live with the consequences):


        double cos(int x) { return sin(x); }


It then becomes a quality-of-implementation issue to detect such
redefinitions of key staples of the language. A good C compiler may
warn about the declaration above. I guess a future good XL compiler
could detect specific redefinitions as probably harmful.




> How will someone even know this is a new syntax?


Well, there is a way to change the syntax tables, which looks like
this:


    syntax
            POSTFIX 300 oranges apples


Here is how you can then use the added postfix operators "oranges" and
"apples":


    N apples -> 100 * N
    N oranges -> 25 * N
    if 25 apples > 30 oranges then
            writeln "Good"
    else
            writeln "Uh oh?"
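
Worked through by hand, "25 apples" rewrites to 100 * 25 = 2500 and
"30 oranges" rewrites to 25 * 30 = 750, so the test succeeds. The same
arithmetic as a small Python sketch, where the dictionary and the
function name "rewrite_postfix" are hypothetical and only mirror the
two rewrites above:

```python
# Each postfix name maps to the right-hand side of its rewrite.
POSTFIX_RULES = {
    "apples": lambda n: 100 * n,    # N apples -> 100 * N
    "oranges": lambda n: 25 * n,    # N oranges -> 25 * N
}

def rewrite_postfix(n, name):
    """Apply the rewrite registered for a postfix name to its operand."""
    return POSTFIX_RULES[name](n)

lhs = rewrite_postfix(25, "apples")
rhs = rewrite_postfix(30, "oranges")
message = "Good" if lhs > rhs else "Uh oh?"
```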


> Will there be keywords involved so that it can be looked up
> somewhere, or is it just symbols?


For XL, except for the cases listed above, there aren't any keywords.




> Can existing keywords be overloaded?


If you mean: can I write something that changes the meaning of "A+B"
or "if A then B", then the answer is yes. Actually, there's nothing
special about these forms, they go through the same rewrite process as
anything else.




> Can the same syntax mean something different depending on where it's used?


If we understand "syntax" by extension of what is done in hard-coded
languages such as C, then syntax can change all the time in an XL
source file: you can change the meaning of "if A then B" locally just
like you can create function overloads in C++.


But the XL view is that these are really semantic changes, not
syntax changes. The actual syntax is fixed, and you are supposed to
fit within the 8 standard node types, producing a standard syntax
tree. Compilation happens solely based on the shape of these standard
parse trees.




> Creating new functions is a bit like using jargon in English; you just
> look the words up in a glossary. But new syntax is more like new
> grammar.


If I google something or verb a noun or noun a verb, I change rulz
all the time, mon ami, and you _often_ s33 a # of valid rsns for
d-o-i-n-g so even in p.o. English :-) ^^ LOL.


The human brain is flexible.




> The other kind of extensibility I know about is operator overloading,
> where the problems are well-known; the expression A+B could
> conceivably mean anything, instead of being restricted to a small set
> of predefined types.


This argument was made a long time ago by Bertrand Meyer in a very
articulate article, see
http://se.ethz.ch/~meyer/publications/joop/overloading.pdf.


While the article is very well written, I totally disagree with Meyer
on this (even though I respect him a lot as a language designer). Why?
Because I can't even think of using a programming language where I
need to write "1 + 2" for integer addition, and something different
like "1.3 +. 4.2" for floating-point addition. There are languages
like this (OCaml for this specific case). I think this is a real
nuisance.


One guiding principle in concept programming is that the notation
should match the "conceptual" notation. I call "syntactic noise" what
happens when you stray from that ideal. If you are used to saying "I
add two numbers" and "I add a chair to this room", then your compiler
should be able to deal with this as well and accept the word "add" in
both cases. So having to write "+." instead of "+" just because the
arguments are real numbers is syntactic noise.




> I never heard that, but the Basic guys were wrong. They would know
> about subroutines, and could not really object to giving them proper
> identifiers and having named parameters. And would know there could
> potentially be thousands of the things in a large program.


Yes, they were wrong, but that seems obvious only with the experience
gained since then with languages supporting arbitrary named
procedures and functions.


The arguments at the time were not stupid at all, and made by smart
people, just like the arguments made here to suggest that extensible
languages are a bad idea. For example, linkers at the time often
limited symbol sizes to 6 or 8 characters, so it was reasonable to
constrain the use of that precious and limited resource.


As for Basic programs with thousands of things in them, that was
somewhat infrequent ;-)




> But surely there is an upper limit to how many kinds of statements are
> viable in a single language?


I don't think so. On the contrary, I believe there is value in making
DSLs small, localized, surgical.


Here is a recent example for a presentation written in XL, where I
added a notation that says: "between X and Y on that movie, do this"
as follows:


      [Begin:real..End:real] Body ->
              if movie_time CurrentMovie in Begin..End then
                    Body


I then use the notation as follows:


      [0.5 .. 2.5]
              text_caption "We see a giraffe here"


      [2.6 .. 5.2]
              show_3D_object "RotatingLogo.obj"


I personally find it way more readable than "expanding the macro" as
someone else on the list described it. And the notation is defined
just above its use, so it's not unreadable to someone not familiar
with the notation either.
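
For comparison, here is roughly what the same presentation logic looks
like when spelled out as ordinary function calls in Python. The names
"between" and "frame", passing the movie time explicitly, and treating
the range as inclusive at both ends are assumptions of this sketch, not
something XL defines.

```python
def between(begin, end, movie_time, body):
    """Run body() only while movie_time lies in [begin, end]."""
    if begin <= movie_time <= end:
        return body()
    return None

def frame(movie_time):
    """Return what would be displayed at the given movie time."""
    shown = []
    between(0.5, 2.5, movie_time,
            lambda: shown.append("caption: We see a giraffe here"))
    between(2.6, 5.2, movie_time,
            lambda: shown.append("object: RotatingLogo.obj"))
    return shown
```

The behaviour is the same, but the time ranges are buried among the
other arguments instead of standing out at the start of each block.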




> Well, I find Lisp unreadable anyway, without even knowing whether it is
> over-using extensibility or not.


So do I. But it's still a really good language. That's why XL borrowed
many ideas from Lisp, including having a homoiconic parse tree with
a very simple structure.




> A self-extensible language sounds like a good idea and might well
> work. I admit I've never used one (although I did play with designing
> one once, then gave up), and have no idea what is and isn't possible;
> could you create a language that has C syntax for example, then add in
> a few Cobol-like statements or APL expressions? Or is the syntax it's
> capable of rather more limited?


You could, but the intent is the other way round: can you make it so
that features that are natural in another language fit in a common,
shared notational framework.


For XL, I started with features that are common to all languages such
as integer addition, if-then-else, and so on. If you can't do that,
the approach is essentially flawed. XL now passes that test to a large
extent in both its functional and imperative variants.


But of course the objective is to go beyond the basics, i.e. to be
able to import the best features of other languages in a way that
blends in. For example, Ada has rendezvous-based tasking. Can I
blend that in XL? I don't really want the precise Ada syntax, just a
way to express the Ada concept with a clear and concise notation. Or
regular expressions. Or APL array processing. Or Erlang-style message
passing. Or fixed-point combinators. Or user-controlled garbage
collection (ideally in a form that also applies to files or network
sockets).


And quite frankly, XL fails most of these tests in its current
incarnations. This is why I'm asking for help.

