Tokens across two input buffers

cherico@bonbon.net (cherico)
21 Sep 2004 22:21:30 -0400

From comp.compilers


From:	cherico@bonbon.net (cherico)
Newsgroups:	comp.compilers
Date:	21 Sep 2004 22:21:30 -0400
Organization:	http://groups.google.com
Keywords:	lex, question
Posted-Date:	21 Sep 2004 22:21:30 EDT

I am using flex to detect UTF-8 encoded letters. Because the input
comes from a socket, I call yy_switch_to_buffer() every time new data
arrives on the socket descriptor.

But sometimes a UTF-8 token is split into two pieces across two
consecutive buffers, since a socket read can end at any byte boundary.
This produces incorrect results.

I tried to put the "incomplete" characters back into the input stream
in the <<EOF>> rule (using yyless). But these characters were output
before the <<EOF>> rule ran.

Is there any way to solve this problem?
[Of course. Rather than using yy_switch_to_buffer, define a version
of YY_INPUT to get the data from the socket. -John]
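
A minimal sketch of the YY_INPUT approach suggested above, assuming a
blocking socket. The helper name read_for_lexer and the variable
sock_fd are hypothetical and not from the original post. YY_INPUT and
its three arguments (buf, result, max_size) are flex's own. The type of
result varies between flex versions, so the assignment may need a cast.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: block until at least one byte is available on fd
 * and copy up to max_size bytes into buf.  Returns the number of bytes
 * read, or 0 on end of input or on a read error (0 is what flex treats
 * as YY_NULL, i.e. end of file). */
static int read_for_lexer(int fd, char *buf, size_t max_size)
{
    ssize_t n;

    do {
        n = read(fd, buf, max_size);
    } while (n < 0 && errno == EINTR);   /* retry if interrupted by a signal */

    return n > 0 ? (int)n : 0;
}

/* In the definitions section of the .l file (inside %{ ... %}), with
 * sock_fd an assumed global holding the connected socket descriptor:
 *
 *   extern int sock_fd;
 *   #define YY_INPUT(buf, result, max_size) \
 *       (result) = read_for_lexer(sock_fd, (buf), (max_size))
 */
```

With this in place there is a single input buffer. When the scanner
reaches the end of the buffered data in the middle of a match, it keeps
the unmatched bytes and calls YY_INPUT for more, so a UTF-8 sequence
that arrives in two socket reads is still matched as one token, and the
per-read yy_switch_to_buffer() calls are no longer needed.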
