Re: Can Coco/R do multiple tokenizations

Chris F Clark <cfc@shell01.TheWorld.com>
21 Aug 2005 00:20:01 -0400

          From comp.compilers

From: Chris F Clark <cfc@shell01.TheWorld.com>
Newsgroups: comp.compilers
Date: 21 Aug 2005 00:20:01 -0400
Organization: The World Public Access UNIX, Brookline, MA
References: 05-08-053 05-08-066
Keywords: lex
Posted-Date: 21 Aug 2005 00:20:01 EDT

Supporting multiple tokenizations is "hard". I agree that Meta-S (now
called GrammarForge, I believe) is your best bet for built-in support.


However, if you find a copy of SIGPLAN Notices, December 1999, you
will find a column I wrote on how to work around lexers and parsers
that don't support it. (A copy of the article, as a LaTeX file, may
also be on the Compiler Resources web site; see my .sig.) None of the
workarounds is exceptionally pretty, but they aren't rocket science
either. (The relevant movie quip is: "This isn't rocket science. This
is brain surgery.")


Of course, you should seriously consider the advice that what you are
doing will probably be hard on your users, too. It may seem friendly
to let users omit whitespace and to include operator characters within
identifiers.


However, allowing both in one language means that certain statements
change meaning when "unrelated" things in the program are modified.
Your example is the perfect case. If one starts with a program with
only a and b declared, the fragment "a!=b" means one thing (a != b).
If some maintainer then adds a declaration of a!, the fragment can
also be read as a! = b, and its meaning has changed. Who will find
that error, and how? You can probably write a parser with Meta-S that
detects all such cases, but it will not be easy, and will it really be
a benefit?
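To make the "a!=b" example concrete, here is a minimal Python sketch that enumerates every way a whitespace-free fragment can be split into known tokens. It assumes, purely for illustration, that the only operators are `!=` and `=` and that identifiers come from a set of declarations; `tokenizations` is a hypothetical helper, not part of Coco/R or Meta-S.

```python
from functools import lru_cache


def tokenizations(text, identifiers, operators):
    """Return every way to split whitespace-free `text` into known tokens.

    Hypothetical helper: the vocabulary is just the declared identifiers
    plus the operator spellings; empty strings are ignored.
    """
    vocab = {tok for tok in set(identifiers) | set(operators) if tok}

    @lru_cache(maxsize=None)
    def split_from(i):
        # All token sequences that exactly cover text[i:].
        if i == len(text):
            return [()]
        results = []
        for tok in vocab:
            if text.startswith(tok, i):
                for rest in split_from(i + len(tok)):
                    results.append((tok,) + rest)
        return results

    return sorted(split_from(0))


if __name__ == "__main__":
    OPS = {"!=", "="}
    # Only a and b declared: one reading.
    print(tokenizations("a!=b", {"a", "b"}, OPS))
    # -> [('a', '!=', 'b')]
    # A maintainer declares a!: the same fragment now has two readings.
    print(tokenizations("a!=b", {"a", "b", "a!"}, OPS))
    # -> [('a', '!=', 'b'), ('a!', '=', 'b')]
```

Nothing in the fragment itself changed; only the declaration set did, which is exactly why such a change is hard for a maintainer to notice.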


In the end, you will probably find users adding extra whitespace just
to avoid the ambiguity. If the users are going to do that, why not
make the language (system) do it for them? For example, perhaps you
could define whitespace-free and whitespace-full forms and a tool that
creates the whitespace-full form from the whitespace-free version,
flagging errors when the conversion is ambiguous. That would allow the
user to dash off whitespace-free versions when that is convenient, but
would keep the whitespace-full form as a "reference" version. (When
you think about it, the tool should go both ways. Tools that do that
support "round-trip engineering", as they say in the CASE-tool world.)
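The normalizing tool suggested above might look roughly like this minimal Python sketch. It assumes the tool knows the full token vocabulary (declared identifiers plus operators) when it runs; the names `to_spaced`, `to_compact`, and `AmbiguousFragment` are hypothetical, not taken from any existing tool. Going compact to spaced, it enumerates every tokenization and refuses to guess when there is more than one; going spaced to compact, it simply drops whitespace, which gives the round trip.

```python
from functools import lru_cache


class AmbiguousFragment(ValueError):
    """Raised when a whitespace-free fragment has more than one reading."""


def tokenizations(text, vocab):
    # Hypothetical helper: all ways to split `text` into tokens from `vocab`.
    vocab = [tok for tok in vocab if tok]  # empty tokens would loop forever

    @lru_cache(maxsize=None)
    def split_from(i):
        if i == len(text):
            return [()]
        out = []
        for tok in vocab:
            if text.startswith(tok, i):
                out.extend((tok,) + rest for rest in split_from(i + len(tok)))
        return out

    return sorted(split_from(0))


def to_spaced(compact, vocab):
    """Whitespace-free -> whitespace-full "reference" form, or an error."""
    options = tokenizations(compact, vocab)
    if not options:
        raise ValueError(f"cannot tokenize {compact!r}")
    if len(options) > 1:
        readings = [" ".join(o) for o in options]
        raise AmbiguousFragment(f"{compact!r} is ambiguous: {readings}")
    return " ".join(options[0])


def to_compact(spaced):
    """Whitespace-full -> whitespace-free form (the other direction)."""
    return "".join(spaced.split())


if __name__ == "__main__":
    vocab = {"a", "b", "!=", "="}
    print(to_spaced("a!=b", vocab))  # -> a != b
    try:
        to_spaced("a!=b", vocab | {"a!"})
    except AmbiguousFragment as err:
        print(err)  # -> 'a!=b' is ambiguous: ['a != b', 'a! = b']
```

Enumerating every split can blow up on long fragments, so a real tool would more likely work statement by statement or use a chart-style parser; the sketch only illustrates the flag-on-ambiguity behavior.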


I have often thought something like that might make C++'s templates
easier to understand. I believe Eiffel had something similar, perhaps
dealing with opaque types, where one wants a full "reference" version
for some cases and an elided version for other uses. Another variation
on this theme is literate programming, where tangle and weave tools
(if I have my nomenclature right) translate the source text into a
variety of forms.


Hope this helps,
-Chris


*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)
------------------------------------------------------------------------------


