Re: Languages that are hard to parse

Hans-Peter Diettrich <DrDiettrich@compuserve.de>
22 May 2005 00:58:46 -0400

From comp.compilers

From:	Hans-Peter Diettrich <DrDiettrich@compuserve.de>
Newsgroups:	comp.compilers
Date:	22 May 2005 00:58:46 -0400
Organization:	Compilers Central
References:	05-05-119 05-05-155 05-05-166 05-05-182 05-05-192
Keywords:	Cobol, parse
Posted-Date:	22 May 2005 00:58:46 EDT

Henry Spencer wrote:
>
> Our moderator writes:
> >[...The reason that PL/I doesn't have
> >reserved words is that COBOL has a huge list, so that programmers either
> >need to keep a chart of them on the office wall to consult every time they
> >invent a new name, or be sure every name includes a hyphen or digit to
> >be sure it doesn't collide with one. -John]
>
> Actually, it's worse than that. The usual approach is to keep a chart on
> the wall of the *keywords* that have hyphens in them -- there are some --
> and always put at least one hyphen in your names. The hyphenated-keywords
> chart is a lot more manageable than the full keywords chart.

Okay, I hadn't realized what problems a huge number of keywords can
cause. OTOH languages with few keywords, like C, can come with a
huge library, containing "predefined" words. Is it really easier to
remember the symbols defined in every #included header file, and if
so, why?

Perhaps the keyword issue is only related to a specific lexer/parser
philosophy? I'm no longer familiar with COBOL, but in other programming
languages many keywords have a special meaning only in a specific
context. In a lex/yacc grammar for ObjectPascal (OPL) I ran into
trouble with such words ("of", "index"...). A scannerless parser will
have no problems with such constructs, but a dumb lexer and a
bottom-up parser have little chance to separate identifiers from
keywords, due to the lack of context information. A recursive descent
parser, by contrast, knows where a word has a special meaning.
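
To illustrate the recursive descent case, here is a minimal sketch in
Python. The toy grammar (property NAME [index NUMBER] ;) and the names
tokenize and parse_property are my own assumptions for illustration,
not taken from any real ObjectPascal parser. The lexer hands out every
word as a plain "word" token; only the parser decides, by position,
whether "index" is a directive or a name.

```python
import re

# Hypothetical toy lexer: it never classifies a word as a keyword.
TOKEN = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(.))")

def tokenize(text):
    tokens = []
    for number, word, punct in TOKEN.findall(text):
        if number:
            tokens.append(("num", number))
        elif word:
            tokens.append(("word", word.lower()))
        elif punct.strip():  # skip stray whitespace matches
            tokens.append(("punct", punct))
    return tokens

def parse_property(tokens):
    """Parse the assumed rule: property NAME [index NUMBER] ;"""
    pos = 0

    def next_token(kind):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != kind:
            raise SyntaxError(f"expected {kind} at token {pos}")
        value = tokens[pos][1]
        pos += 1
        return value

    if next_token("word") != "property":
        raise SyntaxError("expected 'property'")
    # Any word is accepted as the name here, including "index".
    name = next_token("word")
    index = None
    # Only at this position does "index" act as a keyword.
    if pos < len(tokens) and tokens[pos] == ("word", "index"):
        pos += 1
        index = int(next_token("num"))
    if next_token("punct") != ";":
        raise SyntaxError("expected ';'")
    return {"name": name, "index": index}
```

With this sketch, "property index index 3;" parses as a property that is
itself named "index" and carries the index value 3, which is exactly the
case a context-free keyword lexer cannot sort out on its own.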

From the technical viewpoint it's feasible to implement both reserved
and restricted words, without special runtime penalties. The
restricted words can be treated as ordinary identifiers, and the
symbol table can be initialized with a list of these words. Then the
parser can quickly verify the occurrence of an expected keyword, based
on the table position, ID, or whatever else identifies an identifier
token. Similar techniques may already be used in C parsers, in order
to distinguish typedef names from other identifiers.
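
A rough sketch of the pre-initialized symbol table idea, again in
Python. The class SymbolTable, the word list RESTRICTED and the helper
is_expected_keyword are hypothetical names chosen for this example. The
restricted words are interned first, so they occupy the lowest IDs, and
the parser can test for an expected keyword with a single integer
comparison.

```python
class SymbolTable:
    """Interns names to integer IDs; restricted words get the lowest IDs."""

    def __init__(self, restricted_words):
        self._ids = {}
        self._names = []
        for word in restricted_words:
            self.intern(word)
        # Every ID below this bound belongs to a restricted word.
        self.restricted_count = len(self._names)

    def intern(self, name):
        key = name.lower()  # assuming a case-insensitive language
        if key not in self._ids:
            self._ids[key] = len(self._names)
            self._names.append(key)
        return self._ids[key]

    def is_restricted(self, symbol_id):
        return symbol_id < self.restricted_count


# Assumed list of restricted words, for illustration only.
RESTRICTED = ["index", "read", "write", "default"]
table = SymbolTable(RESTRICTED)
INDEX_ID = table.intern("index")

def is_expected_keyword(symbol_id, keyword_id):
    # The lexer only ever emits identifier tokens carrying an ID;
    # the parser compares IDs where the grammar expects a keyword.
    return symbol_id == keyword_id
```

The lexer stays dumb in this scheme: it interns every word the same way,
and a word like "index" remains usable as an ordinary identifier wherever
the parser does not ask for the keyword.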

> (In my last undergrad year...

Now I also realize a problem for English speakers. In one of my first
courses I was advised to use natural (German) names for all
identifiers, to prevent name clashes with reserved words. This may be
one of the reasons why English speakers prefer the cryptic C notation,
because the non-alphabetic characters cannot collide with alphabetic
names. APL tried to solve the collision problem with a special
character set, where the reserved words are distinct characters and
glyphs. In the APL model the coder only has to remember the glyphs
which he really uses, and the related keystrokes, of course.

Now I wonder what Chinese people would think about a programming
language, be it high or machine level, with special glyphs for all
instructions and keywords. Perhaps they would find this approach
perfectly natural, and easy to parse at the same time...

DoDi
[I think that C's library names are easier to deal with than COBOL reserved
words because you only need to be aware of the names in the libraries you use,
while in COBOL you have to avoid reserved words even from parts of the language
that you don't know and never use. -John]
