Related articles |
---|
Unicode lexical analisis and parsing with flex and bison in C++? craigh@visio.com (Craig Hobbs) (1998-01-12) |
Re: Unicode lexical analisis and parsing with flex and bison in C++? henry@zoo.toronto.edu (Henry Spencer) (1998-01-13) |
Re: Unicode lexical analisis and parsing with flex and bison in C++? schwartz@finch.cse.psu.edu (1998-01-14) |
From: | Henry Spencer <henry@zoo.toronto.edu> |
Newsgroups: | comp.compilers |
Date: | 13 Jan 1998 00:18:41 -0500 |
Organization: | SP Systems, Toronto |
References: | 98-01-042 |
Keywords: | lex, i18n |
Craig Hobbs <craigh@visio.com> wrote:
>I'd imagine (although I haven't tried it) that one would need to
>modify the generated code to use wchar_t instead of char...
>[It's not that simple, lex uses character values as indexes into
>arrays, and you might not want to have a lot of 64K arrays in your
>code...]
The orthodox approach to this, although it would take some changes to
lex, is to use a single big table to map the incoming character into
one of a small number of equivalence classes, and use
equivalence-class number to index into your internal arrays. Even
with 8-bit characters, the number of equivalence classes is almost
always *much* smaller than the number of distinct characters. (For
example, even with a fairly complex system of numeric constants, the
digits 1-7 are typically one equivalence class, because the regular
expressions don't distinguish between them.)
The size of the mapping table can be kept under control, if that's an
issue, by using a two-level table, with the upper byte of the
character indexing into a first-level table which contains pointers to
second-level tables, which in turn are indexed by the lower byte. The
advantage is that many of the second-level tables are identical, and
you need only one copy of each *different* second-level table. (This
helps both memory consumption and cache hit ratio, and is typically
quicker than fooling around with conditional branches to implement
things like run-length compression of the table.)
--
| Henry Spencer
| henry@zoo.toronto.edu
--
Return to the
comp.compilers page.
Search the
comp.compilers archives again.