Related articles:
Lexing Unicode strings? johann@myrkraverk.com (Johann 'Myrkraverk' Oskarsson) (2021-04-21)
Re: Lexing Unicode strings? johann@myrkraverk.com (Johann 'Myrkraverk' Oskarsson) (2021-05-03)
Re: Lexing Unicode strings? gah4@u.washington.edu (gah4) (2021-05-04)
Re: Lexing Unicode strings? christopher.f.clark@compiler-resources.com (Christopher F Clark) (2021-05-04)
Re: Lexing Unicode strings? gah4@u.washington.edu (gah4) (2021-05-04)
Re: Lexing Unicode strings? haberg-news@telia.com (Hans Aberg) (2021-07-14)
From: | "Johann 'Myrkraverk' Oskarsson" <johann@myrkraverk.com> |
Newsgroups: | comp.compilers |
Date: | Wed, 21 Apr 2021 16:20:40 +0000 |
Organization: | Compilers Central |
Injection-Info: | gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="7784"; mail-complaints-to="abuse@iecc.com" |
Keywords: | lex, i18n, question |
Posted-Date: | 21 Apr 2021 12:38:24 EDT |
Dear c.compilers,
For context, I have been reading the old book Compiler Design in C
by Allen Holub, available here:
https://holub.com/compiler/
which goes into the details of the author's own LeX implementation.
Just like the dragon book [which I admit I haven't read for some number
of years], this uses lookup tables indexed by individual characters, which
is fine for ASCII but seems excessive for the 0x110000 code points
(U+0000 through U+10FFFF) in Unicode.
I am interested in doing this in plain old C, without external libraries
like ICU, for my own reasons[1]. What data structures are appropriate
for this exercise? Are there resources out there I can study, other
than the ICU source code? [Which, for other reasons of my own, I'm not
too keen on studying.]
[1] Let's leave out the question if I'll be successful or not.
Thanks,
--
Johann
[The obvious approach if you're scanning UTF-8 text is to keep treating the input as
a sequence of bytes. UTF-8 was designed so that no character's encoding is a prefix or suffix
of any other character's encoding, so it should work without having to be clever. -John]