Tokens across two input buffers (cherico)
21 Sep 2004 22:21:30 -0400

          From comp.compilers


From: (cherico)
Newsgroups: comp.compilers
Date: 21 Sep 2004 22:21:30 -0400
Keywords: lex, question
Posted-Date: 21 Sep 2004 22:21:30 EDT

I am using flex to detect UTF-8 encoded letters. Because the input comes
from a socket, I call yy_switch_to_buffer() every time new data arrives
on the socket descriptor.

But sometimes a UTF-8 token is split into two pieces across two
consecutive buffers, because a socket read can return any number of
bytes. This produces incorrect results.
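The split happens because a UTF-8 lead byte announces how many bytes the character occupies (2 to 4), and a socket read can stop right after the lead byte or between continuation bytes. The C function below is a minimal sketch that assumes the buffer-switching approach is kept. Its name, utf8_incomplete_tail, is hypothetical. It counts how many trailing bytes of a chunk belong to an unfinished sequence, so the caller can hold them back and prepend them to the next chunk instead of passing them to the scanner.

```c
#include <stddef.h>

/* Hypothetical helper: returns how many bytes at the end of buf[0..len)
   form an incomplete UTF-8 sequence, or 0 if the buffer ends on a
   character boundary (malformed input is also reported as 0). */
size_t utf8_incomplete_tail(const unsigned char *buf, size_t len)
{
    size_t i = len;
    size_t back = 0;

    /* Walk back over at most 3 continuation bytes (10xxxxxx). */
    while (i > 0 && back < 3 && (buf[i - 1] & 0xC0) == 0x80) {
        i--;
        back++;
    }
    if (i == 0)
        return 0;               /* no lead byte in view: leave it to the scanner */

    unsigned char lead = buf[i - 1];
    size_t need;                /* total length announced by the lead byte */
    if (lead < 0x80)
        return 0;               /* ASCII byte: nothing pending */
    else if ((lead & 0xE0) == 0xC0)
        need = 2;
    else if ((lead & 0xF0) == 0xE0)
        need = 3;
    else if ((lead & 0xF8) == 0xF0)
        need = 4;
    else
        return 0;               /* invalid lead byte */

    size_t have = back + 1;     /* lead byte plus continuation bytes seen */
    return (have < need) ? have : 0;
}
```

For example, for the three bytes `F0 9F 98` (the first three bytes of a four-byte character) it returns 3. Malformed sequences are deliberately reported as 0 so that the scanner's own rules can reject them.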

I tried to push the "incomplete" bytes back onto the input stream in the
<<EOF>> rule (using yyless()). But these characters had already been
output before the <<EOF>> rule ran.

Is there any way to solve this problem?
[Of course. Rather than using yy_switch_to_buffer, define your own
version of YY_INPUT that reads the data from the socket. -John]
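This suggestion works because flex calls YY_INPUT whenever its buffer runs dry. If a token is only partly matched at that point, flex moves the partial text to the front of its buffer and reads more after it. As a result, a UTF-8 character split across two read() calls is still matched as one token. The sketch below is minimal and assumes a connected stream socket held in a descriptor named sockfd and a helper named socket_input. Both names are hypothetical and are not part of flex.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper for a flex YY_INPUT macro: fills buf with up to
   max_size bytes from descriptor fd and returns the count, or 0
   (flex's YY_NULL) at end of stream or on a read error. */
size_t socket_input(int fd, char *buf, size_t max_size)
{
    ssize_t n;
    do {
        n = read(fd, buf, max_size);
    } while (n < 0 && errno == EINTR);  /* retry if interrupted by a signal */
    return (n > 0) ? (size_t) n : 0;
}

/* In the definitions section of the .l file (sockfd is hypothetical):
 *
 *   %{
 *   extern int sockfd;
 *   #define YY_INPUT(buf, result, max_size) \
 *       ((result) = socket_input(sockfd, (buf), (max_size)))
 *   %}
 */
```

A short read is fine here, because flex only treats a zero result as end of input. With this arrangement the <<EOF>> rule fires once, when the peer closes the connection, rather than at the end of every chunk.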
