From: George Neuner <email@example.com>
Date: 16 Aug 2005 11:17:24 -0400
Posted-Date: 16 Aug 2005 11:17:24 EDT
On 13 Aug 2005 00:27:08 -0400, firstname.lastname@example.org wrote:
> Consider a language which allows ! and = in its identifiers.
> Of course, the usual C operators like !, =, etc. are also allowed.
> Consider this string (note: no whitespace):
> In case of ambiguity I'd ideally like to generate an error and abort.
Heuristics aside, I think that if you want to allow operators to be
embedded in identifier names and also use infix operators in the same
language, you are going to have to depend on correct delimiting and
trust that the user typed what she meant. Except in extremely obvious
cases, it's not a good idea for the compiler to be guessing at the
user's intent.
> Can Coco/R (or any other parser/lexer generator) do multiple
> tokenizations and parse-tree generations?
AFAIK, there are no existing lexer generator tools that allow alternate
tokenizations of the same input text. You are free to write one of
your own.
I don't know Coco/R, but what you want is possible by using deliberate
backtracking and multiple lexers. It's a slow and painstaking process
of trying a particular parse, saving the AST if the parse succeeds,
then backtracking, switching lexers and trying the same parse again.
If you end up with no ASTs, the parse failed, and if you end up with
multiple ASTs, the code was ambiguous.
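A minimal Python sketch of that try-every-tokenization loop follows. As a simplification, it does not keep several separate lexer objects; instead a single backtracking generator, `tokenizations`, yields every way of splitting the input into valid tokens, which amounts to trying each possible lexing in turn. The identifier rule, the toy grammar (`expr := unary (binop unary)*`, `unary := '!' unary | IDENT`) and all function names are hypothetical, made up for illustration.

```python
from typing import Iterator, List, Optional, Tuple

# Hypothetical token rules for the language in the question.
OPERATORS = ("!=", "==", "!", "=")
BINOPS = {"!=", "==", "="}


def is_ident(s: str) -> bool:
    """Identifiers start with a letter or '_' and may then contain '!' and '='."""
    return (s[0].isalpha() or s[0] == "_") and all(
        c.isalnum() or c in "_!=" for c in s[1:]
    )


def tokenizations(text: str, pos: int = 0) -> Iterator[List[str]]:
    """Backtrack over every way to split text[pos:] into valid tokens."""
    if pos == len(text):
        yield []
        return
    for end in range(pos + 1, len(text) + 1):
        lexeme = text[pos:end]
        if lexeme in OPERATORS or is_ident(lexeme):
            for rest in tokenizations(text, end):
                yield [lexeme] + rest


def parse(tokens: List[str]) -> Optional[Tuple]:
    """Toy grammar: expr := unary (binop unary)* ; unary := '!' unary | IDENT.
    Returns a nested-tuple AST, or None unless ALL tokens form one expr."""
    pos = 0

    def unary() -> Optional[Tuple]:
        nonlocal pos
        if pos < len(tokens) and tokens[pos] == "!":
            pos += 1
            operand = unary()
            return None if operand is None else ("not", operand)
        if pos < len(tokens) and is_ident(tokens[pos]):
            pos += 1
            return ("id", tokens[pos - 1])
        return None

    tree = unary()
    while tree is not None and pos < len(tokens) and tokens[pos] in BINOPS:
        op = tokens[pos]
        pos += 1
        right = unary()
        tree = None if right is None else (op, tree, right)
    return tree if pos == len(tokens) else None


def all_parses(text: str) -> List[Tuple]:
    """Run the parse once per tokenization and keep every AST that succeeds."""
    trees = []
    for tokens in tokenizations(text):
        tree = parse(tokens)
        if tree is not None:
            trees.append(tree)
    return trees


def parse_unambiguous(text: str) -> Tuple:
    """No ASTs means a syntax error; more than one means the input is ambiguous."""
    trees = all_parses(text)
    if not trees:
        raise SyntaxError(f"no valid parse for {text!r}")
    if len(trees) > 1:
        raise SyntaxError(f"ambiguous: {len(trees)} parses for {text!r}")
    return trees[0]
```

For the input `a!=b`, `all_parses` finds three trees: the comparison `a != b`, the assignment `a! = b`, and the lone identifier `a!=b`. `parse_unambiguous` therefore raises, which is the error-and-abort behavior the original poster asked for. The number of candidate tokenizations grows exponentially with input length, which is one reason this approach is slow.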
It is likely to be *very* slow, as you will need to keep all the lexers
in sync. Each time you switch, you will need to adjust the input
starting position, because the last successful parse may have used
tokens from a different lexer. There are various ways you might try
to optimize this time waster, but the positioning has to be based on
the original source to be correct.
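One way to keep the restart position tied to the original source is to have every token record its start and end offsets in the original text. Switching lexers then just means restarting the new lexer at the `end` offset of the last token the parser accepted, whichever lexer produced it. This sketch assumes two hypothetical regex-based lexers: `GREEDY`, where identifiers may absorb `!` and `=`, and `STRICT`, where they may not. Like the string in the question, the input is assumed to contain no whitespace.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Token:
    kind: str
    text: str
    start: int  # offset into the ORIGINAL source, not into any lexer's buffer
    end: int


# Two hypothetical lexers for the same language; longer operators come first.
GREEDY = re.compile(r"(?P<IDENT>[A-Za-z_][A-Za-z0-9_!=]*)|(?P<OP>!=|==|!|=)")
STRICT = re.compile(r"(?P<IDENT>[A-Za-z_][A-Za-z0-9_]*)|(?P<OP>!=|==|!|=)")


def next_token(lexer: "re.Pattern[str]", source: str, pos: int) -> Token:
    """Scan one token from `source` starting at absolute offset `pos`."""
    m = lexer.match(source, pos)
    if m is None:
        raise SyntaxError(f"no token at offset {pos}")
    return Token(m.lastgroup, m.group(), m.start(), m.end())


def resume(lexer: "re.Pattern[str]", source: str, consumed: List[Token]) -> List[Token]:
    """Switch to `lexer` after the tokens the parser has already accepted.

    The restart point is the last accepted token's `end` offset in the
    original source, so it is correct no matter which lexer produced it."""
    pos = consumed[-1].end if consumed else 0
    out = []
    while pos < len(source):
        tok = next_token(lexer, source, pos)
        out.append(tok)
        pos = tok.end
    return out
```

Suppose the parser has consumed the identifier `a` (offsets 0 to 1) from `STRICT`. Switching to `GREEDY` then resumes at offset 1 and yields `!=` followed by `b`, even though `GREEDY` starting from offset 0 would have read all of `a!=b` as one identifier.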
Personally, I don't think it's worth the effort. I would parse the
code exactly as written and let users suffer the consequences of not
using the space bar. YMMV.
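Here is a minimal sketch of what "parse the code exactly as written" could look like. It is a conventional longest-match (maximal munch) lexer for a hypothetical version of the language in the question, in which identifiers may contain `!` and `=` after their first character. It commits to exactly one tokenization, so `a!=b` lexes as a single identifier and the user has to type `a != b` to get the comparison. The token names and the regular expression are assumptions made for illustration.

```python
import re

# Hypothetical token rules: identifiers may contain '!' and '=' after the
# first character; the C-style operators are also available.
TOKEN_RE = re.compile(
    r"""
      (?P<WS>\s+)
    | (?P<IDENT>[A-Za-z_][A-Za-z0-9_!=]*)   # greedy: identifiers swallow '!' and '='
    | (?P<OP>!=|==|!|=)                      # longer operators listed first
    """,
    re.VERBOSE,
)


def lex(text):
    """Tokenize exactly as written, taking the longest match at each position."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "WS":  # whitespace only separates tokens
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

With this lexer, `lex("a!=b")` returns the single token `("IDENT", "a!=b")`, while `lex("a != b")` returns an identifier, the `!=` operator, and another identifier. No lexer switching or backtracking is involved.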