Simple to implement and to use

Christopher F Clark <christopher.f.clark@compiler-resources.com>
Sun, 11 Dec 2022 19:41:50 +0200

          From comp.compilers


From: Christopher F Clark <christopher.f.clark@compiler-resources.com>
Newsgroups: comp.compilers
Date: Sun, 11 Dec 2022 19:41:50 +0200
Organization: Compilers Central
References: 22-12-001 22-12-003 22-12-004 22-12-019
Keywords: types
Posted-Date: 11 Dec 2022 13:16:17 EST

Trying to bring this back to compilers (and their implementation).
Over the years, I have noticed a couple of things in the various
languages I have used.
These are just my opinions and observations.


----------


LL(1) parsing is good for statements. A good rule of thumb is
that every statement (except perhaps one) should start with a keyword.
The one exception is usually the assignment statement, where you
have lhs-expression assign-op rhs-expression. But once you have that,
no other statement should start with an expression (without a
keyword). You can make the keywords reserved in that context without
undue burden to the user. PL/I's "declare" (abbreviated "dcl")
statement and Pascal's "var", "function", "procedure", etc. statements
are good examples of this.
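
As a rough illustration of the rule of thumb above, here is a minimal
Python sketch of how a recursive-descent parser could pick a statement
production from a single token of lookahead. The keyword set and the
function name classify_statement are hypothetical, made up for this
example; they do not come from any particular language.

```python
# Hypothetical toy language: every statement starts with a keyword,
# except assignment, which starts with an identifier.
KEYWORDS = {"var", "if", "while", "print"}


def classify_statement(tokens):
    """Pick the statement kind by looking at exactly one token (LL(1))."""
    if not tokens:
        raise SyntaxError("empty statement")
    first = tokens[0]
    if first in KEYWORDS:
        # The leading keyword alone selects the production.
        return first + "-statement"
    if first.isidentifier():
        # The single keyword-less case:
        # lhs-expression assign-op rhs-expression.
        return "assignment"
    raise SyntaxError(f"a statement cannot start with {first!r}")


print(classify_statement(["var", "x", ":", "int", ";"]))
print(classify_statement(["x", ":=", "1", ";"]))
```

Because only one statement form lacks a leading keyword, the
dispatcher never has to look past the first token.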


Curiously, if you want a series of keywords to begin a statement, you
should make the "reserved" keyword the last one in the list, or have
something else that separates the list of keywords from the normal
identifiers. In Yacc++ we have a variety of declarations that define
tokens that are keywords. The reserved word for those declarations is
"keyword", but a bunch of other words that aren't reserved can modify
it. Those words must all appear before "keyword" in the declaration.
That way you can distinguish them from their use as identifiers.
Doing that is easier with an LR grammar.


e.g.


case sensitive substring keyword keyword /* that keyword is an
identifier */, case /* so is case */, substring /* and substring */;


The first 4 words in the above declaration are all keywords, but then
after the special keyword "keyword" those simply become identifiers,
and the LR grammar has no issues telling those apart.
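
The positional rule can be sketched in a few lines of Python. This is
not Yacc++ and not an LR parser; it is a hand-written left-to-right
scan over an already-tokenized declaration, and the names MODIFIERS,
RESERVED and parse_keyword_decl are assumptions made for this example.

```python
# Words that act as modifiers only before the reserved word (not reserved).
MODIFIERS = {"case", "sensitive", "substring"}
RESERVED = "keyword"  # the one truly reserved word in this declaration


def parse_keyword_decl(tokens):
    """Split a declaration into (modifiers, declared names)."""
    pos = 0
    modifiers = []
    # Everything before the reserved word must be a modifier.
    while pos < len(tokens) and tokens[pos] != RESERVED:
        if tokens[pos] not in MODIFIERS:
            raise SyntaxError(f"unknown modifier {tokens[pos]!r}")
        modifiers.append(tokens[pos])
        pos += 1
    if pos == len(tokens):
        raise SyntaxError("missing reserved word 'keyword'")
    pos += 1  # consume the reserved word
    names = []
    # After the reserved word, every word is a plain identifier.
    while True:
        if pos >= len(tokens) or not tokens[pos].isidentifier():
            raise SyntaxError("expected an identifier")
        names.append(tokens[pos])
        pos += 1
        if pos < len(tokens) and tokens[pos] == ",":
            pos += 1
            continue
        break
    if pos >= len(tokens) or tokens[pos] != ";":
        raise SyntaxError("expected ';'")
    return modifiers, names


decl = ["case", "sensitive", "substring", "keyword",
        "keyword", ",", "case", ",", "substring", ";"]
print(parse_keyword_decl(decl))
```

The same spelling ("case", "substring", even "keyword") is treated as
a modifier or as a declared name purely by its position relative to
the first occurrence of the reserved word.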


An alternative formulation might look like this:


keyword keyword, case, sensitive : case sensitive substring;


The colon (a reserved token) separates the modifying keywords from the
list of identifiers.


Note that if I were doing a language like Pascal, I might do it like:

("var"|"const") identifier (("," identifier)* (":" type-expr)?
("=" init-expr)? ("@" location-expr)?)+ ";"


Then in a type-expr, keywords like "int" and "float" become reserved,
but not elsewhere. And after the "@", words like "static", "heap", or
"stack" would be reserved.
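
One way to picture context-dependent reservation is a lookup keyed by
the grammatical context the parser is currently in. The sketch below
is only an assumption about how that could be organized; the context
names and the function token_kind are invented for this example.

```python
# Hypothetical table: which words are reserved in which context.
RESERVED_BY_CONTEXT = {
    "statement-start": {"var", "const"},
    "type-expr": {"int", "float"},
    "location-expr": {"static", "heap", "stack"},
}


def token_kind(word, context):
    """Classify a word as keyword or identifier for the given context."""
    reserved = RESERVED_BY_CONTEXT.get(context, set())
    return "keyword" if word in reserved else "identifier"


print(token_kind("int", "type-expr"))
print(token_kind("int", "init-expr"))
print(token_kind("heap", "location-expr"))
```

With such a table, "int" can still be used as an ordinary variable
name inside an init-expr, because it is only reserved while a
type-expr is being parsed.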


---------


Languages with balanced "parenthesizing" keywords are generally less
ambiguous. In if expr then stmt (else stmt)? fi, the matching if and
fi get rid of dangling else problems, and variations like if expr
then stmt (else if expr then stmt)* (else stmt)? fi still don't have
an issue. Note that in this case you probably want "then" and "else"
to be reserved words in your grammar, or to do something like if "("
expr ")" stmt (";" stmt)? fi, where ";" is a clearly reserved token.
A form like if "(" expr ")" "{" stmt "}" ("{" stmt "}")? fi, where
the parens and braces balance, also works.
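
To see why the closing "fi" removes the dangling else, here is a small
recursive-descent sketch in Python. The grammar is a simplified,
hypothetical one (conditions and simple statements are single names),
and parse_stmt is a name invented for this example.

```python
RESERVED = {"if", "then", "else", "fi"}


def parse_stmt(tokens, pos=0):
    """Parse one statement and return (tree, next_pos).

    Hypothetical grammar:
        stmt := 'if' NAME 'then' stmt ('else' stmt)? 'fi' | NAME
    """
    def expect(word, at):
        if at >= len(tokens) or tokens[at] != word:
            raise SyntaxError(f"expected {word!r} at token {at}")
        return at + 1

    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] != "if":
        if tokens[pos] in RESERVED:
            raise SyntaxError(f"unexpected {tokens[pos]!r} at token {pos}")
        return tokens[pos], pos + 1  # a simple one-word statement

    if pos + 1 >= len(tokens):
        raise SyntaxError("missing condition after 'if'")
    cond = tokens[pos + 1]
    pos = expect("then", pos + 2)
    then_branch, pos = parse_stmt(tokens, pos)
    else_branch = None
    if pos < len(tokens) and tokens[pos] == "else":
        else_branch, pos = parse_stmt(tokens, pos + 1)
    pos = expect("fi", pos)  # the closing keyword ends this 'if'
    return ("if", cond, then_branch, else_branch), pos


outer_else = "if a then if b then s1 fi else s2 fi".split()
inner_else = "if a then if b then s1 else s2 fi fi".split()
print(parse_stmt(outer_else)[0])
print(parse_stmt(inner_else)[0])
```

In the first input the inner "fi" closes the inner if before the
"else" appears, so the else can only belong to the outer if; in the
second, the else appears before the inner "fi", so it belongs to the
inner if. The parser never has to guess.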


Curiously, from C I learned that single character parentheses have
their advantages. Thus () [] {}, but not really << >> or even
"begin" and "end". However, the convention of ''' (3 of the relevant
quote/paren) for multi-line bracketed items does seem to work well.
And, 3 for that is better than 2. Backslash conventions may be a
necessary evil, but they are not very friendly. Quoted strings where
the same quote starts and ends the string also tend to be error-prone,
but they are so much a part of the heritage that they are another
necessary evil.


In fact, the worst part of error detection and recovery, in my
experience, is "single character" errors that radically change the
program. It is too easy for a single character to get inserted and
break the program in a way that is easy to overlook.


-------


Another thing that works poorly is having both prefix and suffix
operators. If you have them, they should not be at the same level of
precedence; that almost always results in ambiguity.
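
A minimal Python sketch of one way to keep the two kinds of operator
at different levels: the suffix operator is parsed in a tighter inner
loop, and the prefix operator wraps the result. The operators "-" and
"!" and the function parse_unary are assumptions chosen for this
example only.

```python
PREFIX_OPS = {"-"}    # hypothetical prefix operator, binds looser
POSTFIX_OPS = {"!"}   # hypothetical suffix operator, binds tighter


def parse_unary(tokens, pos=0):
    """Parse a unary expression and return (tree, next_pos).

    Hypothetical grammar, two distinct precedence levels:
        prefix  := '-' prefix | postfix
        postfix := NAME '!'*
    """
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    op = tokens[pos]
    if op in PREFIX_OPS:
        # The prefix operator applies to the whole suffix expression.
        operand, next_pos = parse_unary(tokens, pos + 1)
        return ("prefix", op, operand), next_pos
    if op in POSTFIX_OPS:
        raise SyntaxError(f"unexpected {op!r} at token {pos}")
    tree = op  # a plain operand name
    pos += 1
    # Suffix operators attach to the operand first.
    while pos < len(tokens) and tokens[pos] in POSTFIX_OPS:
        tree = ("postfix", tokens[pos], tree)
        pos += 1
    return tree, pos


print(parse_unary(["-", "a", "!"])[0])
```

With this layering, - a ! has exactly one reading, -(a!). If both
operators sat at the same level, (-a)! would be an equally valid
parse, which is the ambiguity described above.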


-------


--
******************************************************************************
Chris Clark email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc. Web Site: http://world.std.com/~compres
23 Bailey Rd voice: (508) 435-5016
Berlin, MA 01503 USA twitter: @intel_chris
------------------------------------------------------------------------------

