Re: problems with identifiers and keywords...

Chris F Clark <cfc@shell01.TheWorld.com>
14 Nov 2004 22:39:14 -0500

From comp.compilers

From:	Chris F Clark <cfc@shell01.TheWorld.com>
Newsgroups:	comp.compilers
Date:	14 Nov 2004 22:39:14 -0500
Organization:	The World Public Access UNIX, Brookline, MA
References:	04-10-148 04-10-170 04-10-174 04-11-008 04-11-011
Keywords:	parse, design
Posted-Date:	14 Nov 2004 22:39:14 EST

Glen Herrmannsfeldt replying to me wrote:
> Well, I hope I didn't snip too much.

I don't think one can snip "too much" of my posts ;-)

> A question one could ask is: should a language be designed to be
> easy for compilers or easy for people. Unless I am writing the
> compiler, I would say it should be easy for people.

In any case, I think we are in "violent agreement" on this topic.

My point was that if a language is hard to write a parser for, it is
probably hard for humans to parse too. That doesn't mean there
aren't techniques which fit poorly with current lexer/parser
generator technology and yet are easy to parse (and easy to
understand) <more on this in a second>. It's just that if you can't
*easily* write a mechanical way to translate it, then a human
probably isn't going to find it any easier to understand.

More importantly, I think there are also plenty of "hacks" that seem
easy to follow but are actually disasters in real life. That was the
point of the rambling in my previous post in the thread. Just because
you have a clever idea that allows something to be expressed tersely
(and you can come up with a clever way of "parsing" your idea) doesn't
necessarily mean that it is a good notation.

For example, SGML put markup in <> delimiters (and HTML and now XML
do the same). However, SGML recognized that < and > might be useful
in text and provided some notation, which I forget, for changing the
delimiters. Most current lexer/parser generators cannot deal with
that level of dynamism. However, it is trivial to parse with a
simple program that keeps the delimiters in string variables
and scans for the "next one". The dynamism there is no problem.
So, is that a bad parsing idea? I'm not certain, but my guess is
probably not. From the human perspective, it seems one could easily
figure out what is going on.
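
As a rough illustration of that "trivial program", here is a minimal Python sketch that keeps the current delimiters in ordinary string variables and scans for the next one. The directive used to change delimiters (a markup item of the form `delim NEW_OPEN NEW_CLOSE`) is a hypothetical notation invented for this example; it is not SGML's actual mechanism.

```python
def scan(text, open_delim="<", close_delim=">"):
    """Split text into ("text", s) and ("markup", s) tokens.

    The delimiters live in plain variables, so they can change mid-scan.
    The "delim OPEN CLOSE" directive is a hypothetical notation.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        start = text.find(open_delim, pos)
        if start == -1:
            # No more markup: the rest of the input is plain text.
            tokens.append(("text", text[pos:]))
            break
        if start > pos:
            tokens.append(("text", text[pos:start]))
        body_start = start + len(open_delim)
        end = text.find(close_delim, body_start)
        if end == -1:
            raise ValueError(f"unterminated markup at offset {start}")
        body = text[body_start:end]
        pos = end + len(close_delim)
        parts = body.split()
        if len(parts) == 3 and parts[0] == "delim":
            # Hypothetical directive: switch delimiters for the rest of the input.
            open_delim, close_delim = parts[1], parts[2]
        else:
            tokens.append(("markup", body))
    return tokens
```

For example, `scan("<b>bold</b><delim [[ ]]>1 < 2 is [[i]]true[[/i]]")` treats the later `<` as ordinary text, because by then the scanner is looking for `[[` and `]]` instead.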

Many of the HTML tags come in matched pairs, e.g. <table> </table>.
Matched-pair processing is definitely the domain of parser generators.
However, many processors have been implemented ad hoc and are less
than diligent about defining what happens when the
pairs don't match. Is that a bad parsing idea? Again, one could
argue not, because behaving robustly (i.e. not crashing) is a very
useful property. However, my opinion is that robustness goes beyond
simply not crashing. It seems unlikely that a human is going to make
sense of poorly matched pairs in general (people have invented
indentation schemes just to help keep the pairs matched). Allowing
them without warning seems like generally bad practice.
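
A minimal Python sketch of the alternative: a stack-based checker that still accepts mismatched input but reports every mismatch instead of silently guessing. It assumes every tag is meant to be paired and uses a deliberately simple regular expression for tags; real HTML has void elements such as <br> that this sketch does not handle.

```python
import re

# Simplified tag pattern (an assumption): optional '/', a name, then attributes.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")

def check_pairs(text):
    """Return diagnostics for unmatched tags (empty list if well matched)."""
    stack = []      # (tag name, offset) of currently open tags
    problems = []
    for m in TAG.finditer(text):
        closing, name = m.group(1) == "/", m.group(2).lower()
        if not closing:
            stack.append((name, m.start()))
        elif stack and stack[-1][0] == name:
            stack.pop()
        elif any(open_name == name for open_name, _ in stack):
            # The close tag matches an outer open tag, so every tag
            # opened after it was left unclosed: warn about each one.
            while stack[-1][0] != name:
                inner, pos = stack.pop()
                problems.append(f"<{inner}> at offset {pos} closed "
                                f"implicitly by </{name}> at offset {m.start()}")
            stack.pop()
        else:
            problems.append(f"stray </{name}> at offset {m.start()}")
    for name, pos in stack:
        problems.append(f"<{name}> at offset {pos} never closed")
    return problems
```

The checker still produces a result for malformed input (robust in the "doesn't crash" sense), but the human gets told where the pairs went wrong.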

Going back to positive cases, the "length prefix" notation, i.e. a
number followed by that many "characters" of data, is something else
that most current lexer/parser generators don't handle well. Again,
there is a trivial program to recognize the data, and it is not
necessarily a bad format. It isn't one I would pick for "human"
display, but for interchange between two programs where a human might
only occasionally intervene, it could be a good choice.
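
Here is a minimal Python sketch of such a recognizer. The exact format is an assumption for illustration: decimal digits, a colon, then exactly that many characters, repeated (similar in spirit to netstrings, minus their trailing comma). The difficulty for conventional generators is that the scanner must use a value it has just read to decide how far to read next, which ordinary regular-expression token definitions cannot express.

```python
def parse_length_prefixed(data):
    """Split a string of length-prefixed records into a list of payloads.

    Assumed (hypothetical) format: decimal length, ':', then exactly
    that many characters, repeated until the input is exhausted.
    """
    records = []
    pos = 0
    while pos < len(data):
        colon = data.find(":", pos)
        prefix = data[pos:colon] if colon != -1 else ""
        if not prefix or not all(c in "0123456789" for c in prefix):
            raise ValueError(f"bad length prefix at offset {pos}")
        start = colon + 1
        end = start + int(prefix)
        if end > len(data):
            raise ValueError(f"record at offset {pos} is truncated")
        # The payload is taken verbatim; it may contain ':' or digits.
        records.append(data[start:end])
        pos = end
    return records
```

For instance, `parse_length_prefixed("5:hello3:a:b0:")` yields `["hello", "a:b", ""]`; the colon inside the second payload needs no escaping because the length says where the record ends.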

Finally, as to PL/I and keywords that can be identifiers: I think
that is not the worst feature a language could have, especially if one
decides to take on a big semantic space and as a result has many
keywords that one might not want to remove from every user's namespace.
The keyword-as-identifier feature was not the stumbling block to writing
easy-to-maintain, error-free PL/I programs (implicit conversions,
which silently do the wrong thing, were the real problem).
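
To make the keyword-as-identifier point concrete, here is a simplified Python sketch of how a PL/I-style front end might decide whether a statement's leading word is a keyword or a variable. Both the rule (a leading word immediately followed by `=` is an assignment target) and the keyword subset are assumptions for illustration, not how any particular PL/I compiler works; subscripted targets such as `IF(1) = 2` would need more lookahead.

```python
import re

# A small, illustrative subset of PL/I statement keywords (an assumption).
KEYWORDS = {"IF", "DO", "END", "CALL", "GOTO", "DECLARE", "PUT", "GET"}

# Identifiers, or any single non-space character as a one-char token.
TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|\S)")

def classify(statement):
    """Return ('assignment', name), ('keyword', word) or ('other', word)."""
    tokens = TOKEN.findall(statement.strip().rstrip(";"))
    if not tokens:
        raise ValueError("empty statement")
    first = tokens[0].upper()
    # Simplified rule: "WORD = ..." is an assignment, even if WORD
    # is spelled like a keyword.
    if len(tokens) > 1 and tokens[1] == "=":
        return ("assignment", first)
    if first in KEYWORDS:
        return ("keyword", first)
    return ("other", first)
```

With this rule, `classify("IF = 1;")` treats `IF` as a variable, while `classify("IF IF = THEN THEN X = 1;")` treats the leading `IF` as the keyword, since it is followed by another word rather than by `=`.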

Hope this helps,
-Chris

*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)
------------------------------------------------------------------------------