Re: Why context-free?

nmm1@cus.cam.ac.uk (Nick Maclaren)
13 Oct 2005 18:15:47 -0400

          From comp.compilers


From: nmm1@cus.cam.ac.uk (Nick Maclaren)
Newsgroups: comp.compilers
Date: 13 Oct 2005 18:15:47 -0400
Organization: University of Cambridge, England
References: 05-10-053 05-10-061 05-10-062 05-10-068
Keywords: parse, design
Posted-Date: 13 Oct 2005 18:15:40 EDT

glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote:
>John wrote:
> > [
>> diagnostics and general readability. When you start building types
>> into the syntax, that means that many type errors now are likely to
>> produce "syntax error" rather than "string found where boolean
>> expected", and I have to say your 2 vs. 3 way branch is grosser than
>> anything I've done in perl. As far as extending the syntax on the
>> fly, that avenue was extensively investigated in the 1970s in
>> languages like IMP-72 and EL/1, all of which died.
>
>Two languages that I have used recently that allow syntax changes, TeX
>and mathematica, don't seem to be going away so fast.


Not to say Lisp :-)


>There are some very strange errors that you can get in both languages.


Yes, indeed. TeX, in particular, is ghastly - but that has little
to do with its extensibility.


But there are extensions and extensions and, while I agree that too
much flexibility leads to confusion, there are examples of controlled
extensibility with good diagnostics. Consider the Phoenix command
language (a Cambridge phenomenon) or Genstat. Both allow the addition
of user-defined statements - as do a fair number of other "higher
level" languages. The point with those is that any syntax error is
localised by the fixed language.
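
A minimal sketch of that idea, in Python. It does not reproduce Phoenix or Genstat syntax; the ';' terminator, the `check` function and the `copy` and `numbat` statements are all assumptions made up for illustration. The fixed part of the language frames each statement, so a mistake inside a user-defined statement is reported against that statement alone and does not disturb its neighbours.

```python
from typing import Callable, Dict, List, Tuple

# A handler validates the arguments of one statement and raises
# ValueError on a mistake. All names here are hypothetical.
Handler = Callable[[List[str]], None]


def split_statements(source: str) -> List[Tuple[int, List[str]]]:
    """Fixed part of the language: every statement ends at ';'."""
    statements = []
    for index, chunk in enumerate(source.split(";"), start=1):
        tokens = chunk.split()
        if tokens:
            statements.append((index, tokens))
    return statements


def check(source: str, handlers: Dict[str, Handler]) -> List[str]:
    """Check every statement; an error never leaks past its own ';'."""
    errors = []
    for index, tokens in split_statements(source):
        keyword, args = tokens[0], tokens[1:]
        handler = handlers.get(keyword)
        if handler is None:
            errors.append(f"statement {index}: unknown statement '{keyword}'")
            continue
        try:
            handler(args)
        except ValueError as exc:
            errors.append(f"statement {index} ({keyword}): {exc}")
    return errors


def copy_stmt(args: List[str]) -> None:
    # Stands in for a built-in statement.
    if len(args) != 2:
        raise ValueError(f"expected 2 arguments, found {len(args)}")


def numbat_stmt(args: List[str]) -> None:
    # Stands in for a user-defined statement added at run time.
    if not args or not args[0].isdigit():
        raise ValueError("expected an integer count")


handlers = {"copy": copy_stmt, "numbat": numbat_stmt}
for message in check("copy a b; numbat x; copy c; numbat 3", handlers):
    print(message)
```

Both faulty statements are reported, each by its own position and keyword, and the correct statements on either side are unaffected.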


Even when you get more general (e.g. Algol 68 and languages with
type-dependent expressions), restricting them only slightly allows
for diagnostics like:


        In the context of a <numbat> statement, there is no expression
        that matches 'AARDVARK <integer array expression> : ...'.


While that doesn't tell you whether you meant to type AARDWOLF, forgot
to subscript the array, or typed ':' meaning ';', such diagnostics
are better than most you get from C! I can't remember which of the
languages I have used produced ones very like that.
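
One way such a message could be produced is sketched below in Python. This is only a guess at the mechanism, not a description of any particular compiler: the `FORMS` table, its entries and the `diagnose` function are hypothetical. Each statement kind lists the expression shapes it accepts, and the operands are assumed to have been classified by type already.

```python
from typing import Dict, List, Tuple

# Hypothetical table: for each statement kind, the expression shapes
# (operator names and operand classes) that it accepts.
FORMS: Dict[str, List[Tuple[str, ...]]] = {
    "numbat": [
        ("AARDVARK", "<integer expression>", ";"),
        ("AARDWOLF", "<integer array expression>", ":"),
    ],
}


def diagnose(context: str, shape: Tuple[str, ...]) -> str:
    """Return '' if some form for this context begins with `shape`,
    otherwise a diagnostic naming the context and what was seen."""
    for form in FORMS.get(context, []):
        if form[:len(shape)] == shape:
            return ""
    seen = " ".join(shape)
    return (f"In the context of a <{context}> statement, there is no "
            f"expression that matches '{seen} ...'.")


print(diagnose("numbat", ("AARDVARK", "<integer array expression>", ":")))
```

The message is built entirely from names the user can see in the source or the language definition, which is what makes it usable.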




Back to compilation techniques. My observation is that the keys to
good diagnostics are:


        Bounded look-ahead at ALL levels - Algol 68 got it wrong with
modes and operators, C99 has got it seriously wrong with inline
(and, historically, static/extern), and Fortran got it wrong with
external functions.


        All context must be set at a serialisation point before it is
used, not just by the time it is needed. Many languages get this
wrong, though not usually seriously. Without this, diagnostics often
have to refer to the internal names of types, which is very confusing.


        Any extensions must be controlled to balance flexibility and
recoverability. In particular, it is a bad idea to allow an extension
under which a user error makes the nature of a syntactic object
ambiguous. Well, the same applies to built-in syntax.
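
The second point can be illustrated with a small Python sketch of a single-pass checker. The two declaration forms ('type NAME' and 'var NAME : TYPE') and the function name are invented for the example, and reading "set at a serialisation point before it is used" as plain declare-before-use in lexical order is an assumption. Because every type name is known by the time it is used, the diagnostic can quote the user's own name and line.

```python
from typing import List, Set


def check_declarations(lines: List[str]) -> List[str]:
    """Single pass over hypothetical declarations of the forms
    'type NAME' and 'var NAME : TYPE'."""
    known: Set[str] = {"integer", "boolean"}  # assumed built-in types
    errors = []
    for number, line in enumerate(lines, start=1):
        words = line.split()
        if len(words) == 2 and words[0] == "type":
            # The context is set here, before any later use.
            known.add(words[1])
        elif len(words) == 4 and words[0] == "var" and words[2] == ":":
            if words[3] not in known:
                errors.append(
                    f"line {number}: type '{words[3]}' is used "
                    f"before it is declared"
                )
        else:
            errors.append(f"line {number}: unrecognised declaration")
    return errors


print(check_declarations(["var x : colour", "type colour", "var y : colour"]))
```

Only the first line is rejected; the third is accepted because the declaration on the second line precedes it.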


If you have those three techniques, a compiler can always decide
where the lexically next error occurs, and can avoid the diagnostics
that say "Somewhere in the program, there was a syntax error." It
can also produce messages that link the erroneous area back to the
definitions it is using at that point.


Note that I am NOT saying this is possible in a compiler generated
using the advanced methods, though it may be. But it is certainly
possible in a single-pass, recursive descent compiler. Without
those restrictions, good diagnostics are often impossible using any
method at all.
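
As a sketch of the single-pass, recursive descent case, here is a toy Python parser for sums of integers with parentheses. The grammar, the class and function names, and the message wording are all assumptions chosen for illustration. It uses one token of look-ahead, so it always stops at the lexically first error and can say where it is and what the enclosing construct expected.

```python
from typing import List, Tuple

Token = Tuple[str, str, int]  # (kind, text, column starting at 1)


class ParseError(Exception):
    pass


def tokenize(text: str) -> List[Token]:
    tokens: List[Token] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch.isdigit():
            j = i
            while j < len(text) and text[j].isdigit():
                j += 1
            tokens.append(("number", text[i:j], i + 1))
            i = j
        else:
            tokens.append(("symbol", ch, i + 1))
            i += 1
    tokens.append(("end", "end of input", len(text) + 1))
    return tokens


class Parser:
    """expression := term ('+' term)* ;  term := NUMBER | '(' expression ')'"""

    def __init__(self, text: str) -> None:
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self) -> Token:
        # One token of look-ahead is all this grammar needs.
        return self.tokens[self.pos]

    def fail(self, context: str, expected: str) -> None:
        _, found, column = self.peek()
        raise ParseError(f"column {column}: in {context}, "
                         f"expected {expected} but found '{found}'")

    def expression(self) -> int:
        value = self.term()
        while self.peek()[:2] == ("symbol", "+"):
            self.pos += 1
            value += self.term()
        return value

    def term(self) -> int:
        kind, text, _ = self.peek()
        if kind == "number":
            self.pos += 1
            return int(text)
        if (kind, text) == ("symbol", "("):
            self.pos += 1
            value = self.expression()
            if self.peek()[:2] != ("symbol", ")"):
                self.fail("a parenthesised expression", "')'")
            self.pos += 1
            return value
        self.fail("a term", "a number or '('")


def parse(text: str) -> int:
    parser = Parser(text)
    value = parser.expression()
    if parser.peek()[0] != "end":
        parser.fail("an expression", "'+' or end of input")
    return value


for source in ["1 + (2 + 3)", "1 + * 2", "(1 + 2"]:
    try:
        print(parse(source))
    except ParseError as exc:
        print(exc)
```

Each failure names the column of the offending token and the construct being parsed at that point, rather than reporting an error "somewhere in the program".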




Regards,
Nick Maclaren.

