Re: Can Coco/R do multiple tokenizations

Hans-Peter Diettrich <DrDiettrich@compuserve.de>
16 Aug 2005 11:17:35 -0400

          From comp.compilers

From: Hans-Peter Diettrich <DrDiettrich@compuserve.de>
Newsgroups: comp.compilers
Date: 16 Aug 2005 11:17:35 -0400
Organization: Compilers Central
References: 05-08-053
Keywords: lex
Posted-Date: 16 Aug 2005 11:17:35 EDT

vardhanvarma@gmail.com wrote:


> Consider a language that allows ! and = in its identifiers.
> Of course the usual C operators like !, = etc. are also allowed.


Do you realize that your grammar is ambiguous at the *lexer* level?


> Consider this string (note: no whitespace):
> 'a!=b'
> Valid tokenization/parsing can yield several possibilities:
> 1. 'a!=b' .. a single token.
> 2. 'a!' '=' 'b' .. an assignment.
> 3. 'a' '!=' 'b' .. a comparison.


Most lexers will return the longest match, i.e. (1).
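
A minimal sketch of the longest-match rule, in Python. The token names and
rules below are assumptions made up for illustration (the original post gives
no concrete lexer definition); identifiers are assumed to start with a letter
or underscore and may then contain '!' and '='.

```python
import re

# Hypothetical token rules: identifiers may contain '!' and '=' after the
# first character, and the usual operators exist as separate tokens.
TOKEN_RULES = [
    ("IDENT", re.compile(r"[A-Za-z_][A-Za-z0-9_!=]*")),
    ("NEQ", re.compile(r"!=")),
    ("ASSIGN", re.compile(r"=")),
    ("NOT", re.compile(r"!")),
]


def longest_match_tokens(text):
    """Tokenize by always taking the longest match at the current position."""
    pos, tokens = 0, []
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best = None  # (kind, end position) of the longest match so far
        for kind, pattern in TOKEN_RULES:
            m = pattern.match(text, pos)
            if m and (best is None or m.end() > best[1]):
                best = (kind, m.end())
        if best is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append((best[0], text[pos:best[1]]))
        pos = best[1]
    return tokens


print(longest_match_tokens("a!=b"))    # [('IDENT', 'a!=b')]
print(longest_match_tokens("a != b"))  # [('IDENT', 'a'), ('NEQ', '!='), ('IDENT', 'b')]
```

Without whitespace the identifier rule swallows the whole string, which is
interpretation (1); the comparison only appears once the operator is set off
by blanks.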


> The acceptance of a particular choice is influenced by
> 1. whether a token has already been defined (if 'a' and 'b' have been
> defined, then (3) gets priority)
> 2. which tokens follow or precede this string.
> .. if preceded by 'z =', then (3) can be omitted,
> .. if followed by '= z' then (1) is more probable.
>
> In case of ambiguity I'd ideally like to generate an error and abort.


Preceding tokens can be used to instruct a lexer how to scan the
following input. Other conditions, in particular those that depend on
following tokens, require very special (scannerless) parser generators.
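
As a rough illustration of a lexer that is steered by the preceding token,
here is a small Python sketch. The 'def' keyword and the two scanning modes
are purely hypothetical and not part of the language described in the
original question: directly after 'def' the next name is scanned with the
wide identifier rule, everywhere else names are plain and '!', '=' and '!='
are operators.

```python
import re

WIDE_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_!=]*")  # used right after 'def'
PLAIN_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")   # used everywhere else
OPERATOR = re.compile(r"!=|=|!")


def context_tokens(text):
    """Scan text, switching the identifier rule based on the previous token."""
    pos, tokens = 0, []
    wide = False  # True only for the token directly after a 'def' keyword
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = (WIDE_IDENT if wide else PLAIN_IDENT).match(text, pos)
        if m:
            is_keyword = m.group() == "def" and not wide
            kind = "DEF" if is_keyword else "IDENT"
        else:
            m = OPERATOR.match(text, pos)
            if not m:
                raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
            kind = "OP"
        tokens.append((kind, m.group()))
        wide = kind == "DEF"  # the preceding token selects the next mode
        pos = m.end()
    return tokens


print(context_tokens("def a!=b"))  # [('DEF', 'def'), ('IDENT', 'a!=b')]
print(context_tokens("x = a!=b"))
# [('IDENT', 'x'), ('OP', '='), ('IDENT', 'a'), ('OP', '!='), ('IDENT', 'b')]
```

This only works because the decision depends on what has already been read.
A rule that depends on what follows the string cannot be handled this way.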


> Now can Coco/R (or any other parser/lexer generator) do
> multiple tokenizations & parse-tree generations, so that I can
> give a priority to each of these three, and it can call my function
> to accept one over the others?


Coco/R cannot do that, unless you transform it into something very
different. MetaS most probably can handle your language, provided you
can supply a suitable grammar.
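
To show what "multiple tokenizations" means in practice, here is a
hand-written Python sketch that enumerates every way to split a string into
tokens. It is not something Coco/R or MetaS provides; the identifier rule,
the operator set and the only_known filter are assumptions for illustration,
and the input is assumed to contain no whitespace.

```python
import re

IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_!=]*")  # assumed identifier rule
OPERATORS = ("!=", "=", "!")                     # assumed operator set


def candidates(text, pos):
    """Yield every (kind, lexeme) that could start at position pos."""
    m = IDENT.match(text, pos)
    if m:
        # Every non-empty prefix of an identifier match is an identifier too.
        for end in range(pos + 1, m.end() + 1):
            yield ("IDENT", text[pos:end])
    for op in OPERATORS:
        if text.startswith(op, pos):
            yield ("OP", op)


def all_tokenizations(text, pos=0):
    """Return all token sequences that cover text[pos:] completely."""
    if pos == len(text):
        return [[]]
    results = []
    for kind, lexeme in candidates(text, pos):
        for rest in all_tokenizations(text, pos + len(lexeme)):
            results.append([(kind, lexeme)] + rest)
    return results


def only_known(tokenizations, known):
    """Keep tokenizations whose identifiers are all already defined."""
    return [
        toks for toks in tokenizations
        if all(lex in known for kind, lex in toks if kind == "IDENT")
    ]


for tokens in all_tokenizations("a!=b"):
    print([lexeme for _, lexeme in tokens])
```

For 'a!=b' this prints five candidates, not three: besides the three splits
listed in the question there are 'a' '!' '=' 'b' and 'a!=' 'b'. Filtering
with only_known(..., {"a", "b"}) still leaves two of them, so a parser (or a
user callback) would have to make the final choice or report the ambiguity.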




All in all I think that your language is b*llshit. The user cannot know
how his input is parsed, even if no error is reported - is this really
what you and your users want?


I'd suggest that your language should *require* whitespace between
identifiers and (at least) the ambiguous operators.
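
A minimal Python sketch of that suggestion, assuming whitespace is mandatory
around every operator. The identifier rule and operator set are again
assumptions for illustration. Each whitespace-separated chunk is either
exactly an operator or an identifier, so every input has one reading.

```python
import re

IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_!=]*")  # assumed identifier rule
OPERATORS = {"!=", "=", "!"}                     # assumed operator set


def whitespace_tokens(text):
    """Tokenize text in which tokens must be separated by whitespace."""
    tokens = []
    for chunk in text.split():
        if chunk in OPERATORS:
            tokens.append(("OP", chunk))
        elif IDENT.fullmatch(chunk):
            tokens.append(("IDENT", chunk))
        else:
            raise SyntaxError(f"cannot classify {chunk!r}")
    return tokens


print(whitespace_tokens("a!=b"))    # [('IDENT', 'a!=b')]
print(whitespace_tokens("a! = b"))  # [('IDENT', 'a!'), ('OP', '='), ('IDENT', 'b')]
print(whitespace_tokens("a != b"))  # [('IDENT', 'a'), ('OP', '!='), ('IDENT', 'b')]
```

The three readings from the question now correspond to three visibly
different inputs, so the user can see how the text will be parsed.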


DoDi

