Re: Tokenizer theory and practice

Hans Aberg <haberg_20080406@math.su.se>
Sat, 17 May 2008 11:13:26 +0200

From comp.compilers

Related articles
Tokenizer theory and practice DrDiettrich1@aol.com (Hans-Peter Diettrich) (2008-05-13)
Re: Tokenizer theory and practice cr88192@hotmail.com (cr88192) (2008-05-16)
Re: Tokenizer theory and practice mailbox@dmitry-kazakov.de (Dmitry A. Kazakov) (2008-05-16)
Re: Tokenizer theory and practice DrDiettrich1@aol.com (Hans-Peter Diettrich) (2008-05-17)
Re: Tokenizer theory and practice haberg_20080406@math.su.se (Hans Aberg) (2008-05-17)
Re: Tokenizer theory and practice DrDiettrich1@aol.com (Hans-Peter Diettrich) (2008-05-17)
Re: Tokenizer theory and practice cr88192@hotmail.com (cr88192) (2008-05-18)
Re: Tokenizer theory and practice cr88192@hotmail.com (cr88192) (2008-05-18)
Re: Tokenizer theory and practice mailbox@dmitry-kazakov.de (Dmitry A. Kazakov) (2008-05-18)
Re: Tokenizer theory and practice DrDiettrich1@aol.com (Hans-Peter Diettrich) (2008-05-18)
Re: Tokenizer theory and practice cr88192@hotmail.com (cr88192) (2008-05-20)


From:	Hans Aberg <haberg_20080406@math.su.se>
Newsgroups:	comp.compilers
Date:	Sat, 17 May 2008 11:13:26 +0200
Organization:	Aioe.org NNTP Server
References:	08-05-050
Keywords:	lex, i18n
Posted-Date:	17 May 2008 09:26:08 EDT

Hans-Peter Diettrich wrote:
> Unicode introduces a couple of problems into lexers, which I don't want
> to discuss too deeply. Most important seems to be the expansion of the
> character codes, from single to multiple bytes.

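Unicode regular expressions can be lexed directly by rewriting them into
regular expressions over the code units of a UTF encoding, so that an
existing lexer generator such as Flex can process them unchanged.
I posted some Haskell functions for doing that here: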
http://lists.gnu.org/archive/html/help-flex/2005-01/msg00043.html
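
To give a rough idea of what such a rewriting involves, here is a minimal
sketch in Python. It is not the Haskell code from the post linked above, and
it assumes UTF-8 as the target encoding. The names utf8_sequences,
split_same_length and to_byte_regex are made up for this illustration. A range
of code points is first cut at the surrogate gap and at the boundaries between
1-, 2-, 3- and 4-byte encodings, and each piece is then turned into
alternatives of per-byte ranges.

```python
# Sketch only: rewrite a code point range into UTF-8 byte-range sequences.
# All names here are hypothetical; UTF-8 is assumed as the target encoding.

# Blocks of code points that share one encoded length and skip U+D800..U+DFFF.
BLOCKS = [(0x0, 0x7F), (0x80, 0x7FF), (0x800, 0xD7FF),
          (0xE000, 0xFFFF), (0x10000, 0x10FFFF)]


def split_same_length(a, b):
    """Split the byte strings a..b (equal length, a <= b) into sequences
    of (low_byte, high_byte) pairs, one pair per byte position."""
    if len(a) == 1:
        return [[(a[0], b[0])]]
    if a[0] == b[0]:
        # Same first byte: keep it fixed and split the remaining bytes.
        return [[(a[0], a[0])] + rest
                for rest in split_same_length(a[1:], b[1:])]
    n = len(a) - 1
    lo_tail = bytes([0x80]) * n   # smallest possible continuation bytes
    hi_tail = bytes([0xBF]) * n   # largest possible continuation bytes
    out, tail_part = [], []
    first_lo, first_hi = a[0], b[0]
    if a[1:] != lo_tail:
        # The lowest first byte is only partially covered.
        out += [[(a[0], a[0])] + rest
                for rest in split_same_length(a[1:], hi_tail)]
        first_lo += 1
    if b[1:] != hi_tail:
        # The highest first byte is only partially covered.
        tail_part = [[(b[0], b[0])] + rest
                     for rest in split_same_length(lo_tail, b[1:])]
        first_hi -= 1
    if first_lo <= first_hi:
        # First bytes in between are fully covered.
        out.append([(first_lo, first_hi)] + [(0x80, 0xBF)] * n)
    return out + tail_part


def utf8_sequences(lo, hi):
    """All byte-range sequences matching the code points lo..hi."""
    pieces = []
    for start, end in BLOCKS:
        a, b = max(lo, start), min(hi, end)
        if a <= b:
            pieces += split_same_length(chr(a).encode("utf-8"),
                                        chr(b).encode("utf-8"))
    return pieces


def to_byte_regex(lo, hi):
    """Render the sequences as a byte-level regular expression string."""
    def cls(r):
        if r[0] == r[1]:
            return "\\x%02X" % r[0]
        return "[\\x%02X-\\x%02X]" % r
    return "|".join("".join(cls(r) for r in seq)
                    for seq in utf8_sequences(lo, hi))


# Greek and Coptic block, U+0370..U+03FF:
print(to_byte_regex(0x370, 0x3FF))
# prints \xCD[\xB0-\xBF]|[\xCE-\xCF][\x80-\xBF]
```

The resulting pattern uses only byte ranges, concatenation and alternation, so
it can in principle be fed to a lexer generator that knows nothing about
Unicode. Whether the escape syntax shown matches a particular tool is an
assumption to check against that tool's documentation.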

Hans Aberg
