From: Hans Aberg <haberg_20080406@math.su.se>
Newsgroups: comp.compilers
Date: Sat, 17 May 2008 11:13:26 +0200
Organization: Aioe.org NNTP Server
References: 08-05-050
Keywords: lex, i18n
Posted-Date: 17 May 2008 09:26:08 EDT
Hans-Peter Diettrich wrote:
> Unicode introduces a couple of problems into lexers, which I don't want
> to discuss too deeply. Most important seems to be the expansion of the
> character codes, from single to multiple bytes.
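Unicode regular expressions can be lexed directly by rewriting them into
equivalent regular expressions over the code units of a UTF encoding
(for example, the bytes of UTF-8). An ordinary byte-oriented lexer
generator can then process the result unchanged.
I posted some Haskell functions for doing that here:
http://lists.gnu.org/archive/html/help-flex/2005-01/msg00043.html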
Hans Aberg
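
The rewriting idea can be illustrated with a minimal sketch. It is written in Python rather than Haskell and is not the code from the linked post; it only assumes that the target encoding is UTF-8 and that the input is a range of valid code points (at most U+10FFFF). The function names `utf8_sequences` and `to_byte_regex` are hypothetical, chosen for illustration. The sketch splits a code point range until every piece can be written as a fixed-length sequence of independent byte ranges, then prints those pieces as a byte-level regular expression.

```python
import re


def utf8_sequences(lo, hi):
    """Rewrite the code point range lo..hi as UTF-8 byte-range sequences.

    Returns a list of alternatives; each alternative is a list of
    (min_byte, max_byte) pairs, one pair per encoded byte.
    """
    if lo > hi:
        return []
    # Surrogates U+D800..U+DFFF have no UTF-8 encoding: cut them out.
    if lo <= 0xDFFF and hi >= 0xD800:
        return utf8_sequences(lo, 0xD7FF) + utf8_sequences(0xE000, hi)
    # Split where the encoded length changes (1, 2 or 3 bytes maximum).
    for limit in (0x7F, 0x7FF, 0xFFFF):
        if lo <= limit < hi:
            return utf8_sequences(lo, limit) + utf8_sequences(limit + 1, hi)
    length = 1 if hi <= 0x7F else 2 if hi <= 0x7FF else 3 if hi <= 0xFFFF else 4
    # Split until the trailing bytes either share a prefix or span a full
    # block, so that each byte position can be ranged independently.
    for i in range(1, length):
        mask = (1 << (6 * i)) - 1  # low bits held by the last i bytes
        if (lo & ~mask) != (hi & ~mask):
            if (lo & mask) != 0:
                return (utf8_sequences(lo, lo | mask)
                        + utf8_sequences((lo | mask) + 1, hi))
            if (hi & mask) != mask:
                return (utf8_sequences(lo, (hi & ~mask) - 1)
                        + utf8_sequences(hi & ~mask, hi))
    first, last = chr(lo).encode("utf-8"), chr(hi).encode("utf-8")
    return [list(zip(first, last))]


def to_byte_regex(lo, hi):
    """Render the sequences as a bytes pattern usable with Python's re."""
    def unit(a, b):
        return "\\x%02x" % a if a == b else "[\\x%02x-\\x%02x]" % (a, b)

    alts = ["".join(unit(a, b) for a, b in seq)
            for seq in utf8_sequences(lo, hi)]
    body = "|".join(alts) if alts else "(?!)"  # (?!) never matches
    return ("(?:%s)" % body).encode("ascii")


if __name__ == "__main__":
    # Greek small letters alpha..omega, matched on raw UTF-8 bytes.
    pattern = re.compile(to_byte_regex(0x3B1, 0x3C9))
    print(to_byte_regex(0x3B1, 0x3C9))
    print(pattern.fullmatch("λ".encode("utf-8")) is not None)
```

For instance, the range U+0080..U+07FF needs no splitting and becomes the single alternative `[\xc2-\xdf][\x80-\xbf]`. The same byte-range sequences could be written out in the pattern syntax of a byte-oriented lexer generator instead of Python's `re`.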