Re: Formally Defining a Programming Language

federation2005@netzero.com
Wed, 29 Feb 2012 17:11:11 -0800 (PST)

          From comp.compilers

Related articles
Formally Defining a Programming Language seimarao@gmail.com (Seima Rao) (2011-11-19)
Re: Formally Defining a Programming Language kaz@kylheku.com (Kaz Kylheku) (2011-11-21)
Re: Formally Defining a Programming Language christophe@taodyne.com (Christophe de Dinechin) (2011-11-22)
Re: Formally Defining a Programming Language s_dubrovich@yahoo.com (s_dubrovich@yahoo.com) (2011-11-27)
Re: Formally Defining a Programming Language federation2005@netzero.com (2012-02-29)
Re: Formally Defining a Programming Language gah@ugcs.caltech.edu (glen herrmannsfeldt) (2012-03-02)

From: federation2005@netzero.com
Newsgroups: comp.compilers
Date: Wed, 29 Feb 2012 17:11:11 -0800 (PST)
Organization: Compilers Central
References: 11-11-039
Keywords: parse, theory
Posted-Date: 02 Mar 2012 16:12:40 EST

On Saturday, November 19, 2011 7:45:53 AM UTC-6, Seima Rao wrote:
> Can readers of this forum help direct to relevant materials wrt
> Formalism that I can study to learn about Formalisms that will help in
> deciding about my Programming Language?


To expand on a reply given by Kaz Kylheku: it's a dirty little secret that the
front ends of these languages are being designed by a process that amounts to
little more than wading in the dark -- except the part about it being a
secret.


It seems that a lot of the COBOL mindset got caught up in the revisions that
went into making C++ and C# (not to mention languages like SQL). This mindset
basically amounts to building the constraints directly into the syntax,
turning simplicity into a highly redundant, convoluted affair. Take a look at
the ECMA spec for C#, for instance.


http://www.ecma-international.org/publications/standards/Ecma-334.htm


You will see the same items appearing in similar-looking phrase
structure rules in a half-dozen different places. What the language
designer is doing is basically forcing the syntax to encapsulate
agreement rules or semantic constraints -- which is the First Cardinal
Sin of designing language front ends. The result is that the grammar
comes off looking more like COBOL (or even the original form of
Pascal, to some degree).


This seems to be starting to take root in the latest revision of C.
Though C is a relatively clean language, in terms of syntax, there
already were several places -- before the 201X revision -- where
constraints found their way into the syntax: the ordinary vs. abstract
declarators, two sets of rules for type specifiers, a mangling of the
syntax for cast-expressions, and the treatment of the assignment
operator (in that case, semantic constraints for the assignment
statement were forced into the syntax by a weird doling out of
expression priority levels).
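
To make the assignment case concrete: the C grammar only allows a
unary-expression on the left of an assignment operator, which is a
syntactic stand-in for the "must be an lvalue" constraint. A minimal
sketch of the alternative, in Python, is below. It is an illustration
under stated assumptions, not anyone's actual front end: the toy
expression language (identifiers, integers, `+`, unary `*`,
parentheses, `=`) and the names `Parser`, `is_lvalue`, `check` and
`analyze` are all hypothetical. The grammar accepts any expression on
the left of `=`, and a separate pass reports the lvalue violation.

```python
import re

# Toy tokenizer: integers, identifiers, and single-character operators.
TOKEN = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(.))")

def tokenize(text):
    tokens = []
    for number, name, op in TOKEN.findall(text):
        if number:
            tokens.append(("num", number))
        elif name:
            tokens.append(("id", name))
        elif op.strip():              # skip stray whitespace matches
            tokens.append(("op", op))
    return tokens

class Parser:
    """Recursive descent parser for a hypothetical toy expression language."""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("end", "")

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def assignment(self):
        # Enveloping rule: additive ('=' assignment)?
        # Anything may appear on the left; no lvalue constraint here.
        left = self.additive()
        if self.peek() == ("op", "="):
            self.take()
            return ("assign", left, self.assignment())
        return left

    def additive(self):
        node = self.unary()
        while self.peek() == ("op", "+"):
            self.take()
            node = ("add", node, self.unary())
        return node

    def unary(self):
        if self.peek() == ("op", "*"):
            self.take()
            return ("deref", self.unary())
        return self.primary()

    def primary(self):
        kind, text = self.take()
        if kind in ("num", "id"):
            return (kind, text)
        if (kind, text) == ("op", "("):
            node = self.assignment()
            if self.take() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {text!r}")

def is_lvalue(node):
    # The constraint lives here, outside the grammar.
    return node[0] in ("id", "deref")

def check(node, errors=None):
    errors = [] if errors is None else errors
    if node[0] == "assign" and not is_lvalue(node[1]):
        errors.append("left operand of '=' is not an lvalue")
    for child in node[1:]:
        if isinstance(child, tuple):
            check(child, errors)
    return errors

def analyze(text):
    parser = Parser(text)
    tree = parser.assignment()
    if parser.peek()[0] != "end":
        raise SyntaxError("trailing input")
    return check(tree)
```

With this split, `analyze("a + b = 1")` parses without complaint and
returns a single constraint message, while `analyze("*p = a = 2")`
returns an empty list. The grammar needs one fewer special case, and
the diagnostic names the actual rule instead of reporting a generic
syntax error.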


But now we come to 2010-2011, and we find that (last I checked) the
committee that produces the ISO standards completely mangled the syntax
for structure expressions, basically duplicating it over, messing up the
syntax for cast-expressions, and forcing semantic constraints for
structured expressions into the syntax itself (e.g. that they can only
be type-cast). Those are things you normally either (a) design around
by generalizing the language to allow for fewer restrictions, (b)
handle by explicitly stipulating the constraint in the semantics section
and keeping it out of the syntax, or (c) a bit of both.
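
Option (b) can be sketched for the ordinary vs. abstract declarator
split in C. The following Python fragment is only an illustration under
assumptions: the `Declarator` record, the `NAME_RULES` table, the
context names and `check_declarator` are hypothetical, and the
declarator is reduced to a name plus a pointer depth. The idea is that
one declarator form with an optional identifier replaces two parallel
sets of rules, and a small table states per context whether the
identifier is required, forbidden or optional.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Declarator:
    """One declarator shape for every context (hypothetical, simplified)."""
    name: Optional[str]       # None plays the role of an abstract declarator
    pointer_depth: int = 0

# The constraint, stated once and outside the grammar:
# is the identifier required, forbidden, or optional in each context?
NAME_RULES = {
    "declaration": "required",           # e.g.  int *x;
    "type_name": "forbidden",            # e.g.  (int *) or sizeof(int *)
    "prototype_parameter": "optional",   # e.g.  void f(int *); / void f(int *p);
}

def check_declarator(decl, context):
    """Return a list of constraint violations for decl in the given context."""
    rule = NAME_RULES[context]
    if rule == "required" and decl.name is None:
        return [f"{context}: declarator must name an identifier"]
    if rule == "forbidden" and decl.name is not None:
        return [f"{context}: identifier {decl.name!r} not allowed here"]
    return []
```

For instance, `check_declarator(Declarator(None), "declaration")`
reports the missing identifier, while the same nameless declarator
passes in the `"prototype_parameter"` context. Relaxing or removing a
constraint, as in option (a), then means editing one table entry
rather than rewriting phrase structure rules.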


As far as a language like C# or C++ goes (not to mention SQL!): I've seriously
thought about bringing a linguist(!) into the loop to work with me to analyze
the languages actually defined in the Standards and come up with a better
account of what the language is (count me as one such person, since I have a
background in Linguistics, but there are a couple of others I have in mind).


If I have time, I'm going to upload a simplified (but still basically
equivalent) account of the syntax for C -- 1989/1990, 1999 AND 2010 all in one
file. As to the larger questions you're asking (the REAL question BTW is what
kind of SEMANTIC formalism to use for the languages) -- I may follow up on the
syntax by showing a nice way to parse it, to derive a parser for it, that goes
beyond the traditional LR and LL parsing formalisms (i.e. algebraic methods
that use a calculus for context-free expressions).


I think you may find an article from a while back in the comp.compilers
archive where I gave a somewhat detailed account of my design of the language
C-BC (which is POSIX BC almost upgraded to C!) -- its execution model. A
similar approach is adopted by a language like Prolog, where an execution
model for it has been posed (the Warren Abstract Machine, WAM).


Now ... for me to try the same exercise of making a simplified syntax
for C++ (and externalizing its constraints) ... that's a much more
difficult and lengthy exercise that I've only begun considering. But
make no mistake: those phrase structure rules have got to come down in
number and complexity!


An enveloping grammar for C++ or even SQL ... too much to hope for?


One of the advantages of externalizing constraints, BTW, is that it puts the
spotlight not only on the "Why even have the constraint?" question but also on
the "why not just remove it?" question. This leads to cleaner languages.
People, after all, have to USE these languages and LEARN them! That's why
you're not supposed to clutter the grammar.

