Re: Double-byte lex and yacc?

"Michael O'Leary" <moleary@primus.com>
16 Apr 1997 00:22:12 -0400

          From comp.compilers

Related articles
Double-byte lex and yacc? moleary@primus.com (Michael O'Leary) (1997-04-02)
Re: Double-byte lex and yacc? sreeni@csc.albany.edu (1997-04-03)
Re: Double-byte lex and yacc? Julian.Orbach@unisys.com (1997-04-03)
Re: Double-byte lex and yacc? dds@flavors.com (Duncan Smith) (1997-04-06)
Re: Double-byte lex and yacc? moleary@primus.com (Michael O'Leary) (1997-04-16)

From: "Michael O'Leary" <moleary@primus.com>
Newsgroups: comp.compilers
Date: 16 Apr 1997 00:22:12 -0400
Organization: Primus Communications Corporation
References: 97-04-013
Keywords: lex, i18n, comment

So if I wanted to tokenize Unicode text that is a mixture of Latin-1
and Japanese characters, would it work to use lex to group pairs of
bytes into double-byte character tokens of type LATIN1_ALPHA,
LATIN1_PUNCT, JAPANESE_HIRAGANA, JAPANESE_KATAKANA, JAPANESE_KANJI,
etc., and then use yacc to perform the higher-level tokenization into
Latin-1 and Japanese substrings? Or would that be too slow? Also, can
lex handle 0x00 bytes in an input stream, or would it always treat
them as string terminators?

Thanks,
Michael O'Leary
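
For concreteness, a minimal flex sketch of that byte-pairing scheme
might look like the following. It assumes UCS-2 big-endian input and
that the token codes come from a yacc grammar such as the one sketched
after the moderator's note; the code-point ranges shown are
illustrative, not a complete partition of Unicode:

%{
/* Sketch only: classify UCS-2 big-endian text one character (two
   bytes) at a time.  Token codes are assumed to be defined by a
   companion yacc grammar via y.tab.h. */
#include "y.tab.h"
%}
%option noyywrap
%%
\x00[A-Za-z]                    return LATIN1_ALPHA;    /* ASCII letters */
\x00[\xC0-\xD6\xD8-\xF6\xF8-\xFF]           return LATIN1_ALPHA;    /* accented Latin-1 letters */
\x00[\x21-\x2F\x3A-\x40\x5B-\x60\x7B-\x7E]  return LATIN1_PUNCT;    /* ASCII punctuation */
\x30[\x41-\x93]                 return JAPANESE_HIRAGANA;  /* U+3041..U+3093 */
\x30[\xA1-\xFA]                 return JAPANESE_KATAKANA;  /* U+30A1..U+30FA */
[\x4E-\x9F][\x00-\xFF]          return JAPANESE_KANJI;     /* U+4E00..U+9FFF */
[\x00-\xFF][\x00-\xFF]          ;  /* anything else: skipped in this sketch */

Every rule matches exactly two bytes, so the scanner never falls out
of step with the character boundaries. Note the \x00 high bytes: this
only works with a lex that accepts NUL both in patterns and in the
input, which flex does (historical AT&T lex, which treats the input
as NUL-terminated strings, does not).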


Michael O'Leary wrote:
>
> Are there any versions of lex and/or yacc that are capable of
> accepting double-byte character streams as input?
[If it's Unicode, there aren't any double-byte characters. But yes, in
general, once you get lex to find your tokens, yacc can parse the result.
And lex does handle 0 bytes albeit with a modest performance penalty.
-John]
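
As a sketch of that division of labor, a yacc grammar along the
following lines would group the per-character tokens into runs (token
names match the flex sketch above; this is an illustration, not a
tested parser):

%token LATIN1_ALPHA LATIN1_PUNCT
%token JAPANESE_HIRAGANA JAPANESE_KATAKANA JAPANESE_KANJI
%%
text          : /* empty */
              | text latin_run          /* one Latin-1 substring */
              | text japanese_run       /* one Japanese substring */
              ;
latin_run     : latin_char
              | latin_run latin_char
              ;
japanese_run  : japanese_char
              | japanese_run japanese_char
              ;
latin_char    : LATIN1_ALPHA | LATIN1_PUNCT ;
japanese_char : JAPANESE_HIRAGANA | JAPANESE_KATAKANA | JAPANESE_KANJI ;

yacc will report shift/reduce conflicts for this grammar (at the end
of a run it can either extend the run or reduce it), but its default
resolution in favor of shifting is exactly what makes each run
maximal. On the speed question: one reduction per character is a lot
of parser activity, so if that proves too slow, an alternative is to
collect whole runs in the lexer itself with patterns like
(\x00[A-Za-z])+ and hand yacc one token per substring.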

