Re: Languages with optional spaces

Christopher F Clark <christopher.f.clark@compiler-resources.com>
Sat, 29 Feb 2020 11:48:41 +0200

          From comp.compilers

From: Christopher F Clark <christopher.f.clark@compiler-resources.com>
Newsgroups: comp.compilers
Date: Sat, 29 Feb 2020 11:48:41 +0200
Organization: Compilers Central
References: 20-02-015 20-02-017
Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="16923"; mail-complaints-to="abuse@iecc.com"
Keywords: lex, history
Posted-Date: 29 Feb 2020 12:33:24 EST

"Ev. Drikos" <drikosev@gmail.com> posted an interesting, albeit partial, solution
to the problem of keywords appearing inside identifiers in languages with
optional spaces.
I won't include it here.


The problem is that a keyword can appear at positions other than the
beginning of an identifier.
In fact, in the worst-case scenario, the language can be ambiguous.
Consider the following "BASIC" program, written for a dialect extended
with variable names longer than one letter
and with optional spaces.


10 LET ITOJ = 1
20 LET I = 2
30 LET JTOK = 3
40 LET K = 4
50 FOR N = ITOJTOK
60 REM AMBIGUOUS FOR N = I TO JTOK
70 REM OR FOR N = ITOJ TO K
80 PRINT N;
90 NEXT N
100 END
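To make the two readings of line 50 concrete, here is a minimal Python sketch (the function name split_around_keyword is hypothetical, not from the post). It assumes, as the program does, that TO is the only keyword involved and that any non-empty run of letters on either side is a valid identifier, and it lists every way to read the bound expression as identifier TO identifier.

```python
def split_around_keyword(text: str, keyword: str) -> list[tuple[str, str]]:
    """Return every (left, right) reading of text as <identifier> keyword <identifier>."""
    readings = []
    start = text.find(keyword)
    while start != -1:
        left, right = text[:start], text[start + len(keyword):]
        # Both bounds of the FOR loop must be non-empty identifiers.
        if left and right:
            readings.append((left, right))
        # Look for the next occurrence, allowing overlaps.
        start = text.find(keyword, start + 1)
    return readings


for lo, hi in split_around_keyword("ITOJTOK", "TO"):
    print(f"FOR N = {lo} TO {hi}")
# prints:
# FOR N = I TO JTOK
# FOR N = ITOJ TO K
```

Both readings given in the REM lines fall out, and nothing in the characters themselves prefers one over the other.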


The problem with such partial solutions is that one is tempted to "fix"
the troublesome cases one by one as they are encountered.


Maury Markowitz <maury.markowitz@gmail.com> mentioned this in his post
where ATO was considered.
It could be A TO or AT O (presuming that TO and AT are both keywords).
Note that this is an issue even with one-letter variable names if the
language has both keywords.


As one starts patching these cases, the "grammar"
(or, more likely, its recursive-descent implementation)
becomes what I call "ad hack".


With a GLR parser (or something of equivalent power, e.g. an Earley
or CYK parser) and a lexer that returns every possible
tokenization, one can find all the relevant parse trees and then check
whether only one of them makes semantic sense.
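The lexer half of that approach can be sketched in Python as follows. The helpers all_tokenizations and plausible are hypothetical, not a real library API, and the keyword set {AT, TO} with one-letter identifiers is an assumption taken from the ATO example. The lexer enumerates every segmentation of the input into keywords and identifiers, memoizing on position. A real GLR, Earley, or CYK parser is deliberately replaced here by a crude filter that only rejects two identifiers in a row.

```python
from functools import lru_cache

KEYWORDS = {"AT", "TO"}  # assumed keyword set, following the ATO example


def all_tokenizations(text: str, max_ident_len: int = 1) -> list[list[tuple[str, str]]]:
    """Return every way to cut text into keywords and identifiers.

    Identifiers are limited to max_ident_len letters (1 = classic BASIC).
    Each token is a (kind, spelling) pair, kind being "KW" or "ID".
    """

    @lru_cache(maxsize=None)
    def from_pos(i: int) -> tuple[tuple[tuple[str, str], ...], ...]:
        # All tokenizations of the suffix text[i:]; one empty one at the end.
        if i == len(text):
            return ((),)
        results = []
        for j in range(i + 1, len(text) + 1):
            piece = text[i:j]
            kinds = []
            if piece in KEYWORDS:
                kinds.append("KW")
            if len(piece) <= max_ident_len and piece.isalpha():
                kinds.append("ID")
            for kind in kinds:
                for rest in from_pos(j):
                    results.append(((kind, piece),) + rest)
        return tuple(results)

    return [list(t) for t in from_pos(0)]


def plausible(tokens: list[tuple[str, str]]) -> bool:
    # Crude stand-in for a real grammar: reject two identifiers in a row.
    return not any(a[0] == "ID" and b[0] == "ID" for a, b in zip(tokens, tokens[1:]))


for toks in all_tokenizations("ATO"):
    if plausible(toks):
        print(" ".join(spelling for _, spelling in toks))
# prints:
# A TO
# AT O
```

The lexer produces three candidates (A T O, A TO, AT O). The filter removes only the first, so the ambiguity survives, which is the situation the next paragraph describes.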


In the above example, that won't help, because both interpretations are
legal programs.
One prints 2 3, the other 1 2 3 4.


I cannot imagine a programmer being happy with the error message:
LINE 50 AMBIGUOUS STATEMENT.


--
******************************************************************************
Chris Clark email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc. Web Site: http://world.std.com/~compres
23 Bailey Rd voice: (508) 435-5016
Berlin, MA 01503 USA twitter: @intel_chris
------------------------------------------------------------------------------
[I get the impression that, more often than not, whoever wrote the interpreter
didn't give it much thought, so the grammar is whatever the 6502 code did thirty
years ago. Fortran was ugly, but at least it wasn't ambiguous, and at each
point the lexer knew which tokens were valid. -John]


