Related articles |
---|
Tokens across two input buffers cherico@bonbon.net (2004-09-21) |
From: | cherico@bonbon.net (cherico) |
Newsgroups: | comp.compilers |
Date: | 21 Sep 2004 22:21:30 -0400 |
Organization: | http://groups.google.com |
Keywords: | lex, question |
Posted-Date: | 21 Sep 2004 22:21:30 EDT |
I am using flex to recognize UTF-8 encoded letters. Because the input
comes from a socket, I call yy_switch_to_buffer() every time new data
arrives on the socket descriptor.
But sometimes a UTF-8 token is split into two pieces across two
consecutive buffers, since a socket read can return at any byte
boundary. This produces incorrect results.
I tried to put the incomplete characters back into the input stream in
the <<EOF>> rule (using yyless). But these characters had already been
output before the <<EOF>> rule ran.
Is there any way to solve this problem?
[Of course. Rather than using yy_switch_to_buffer, define a version
of YY_INPUT to get the data from the socket. -John]
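
A minimal sketch of what the moderator's suggestion might look like, assuming a blocking socket. The names scanner_fd and socket_input are hypothetical and not part of flex; only the YY_INPUT(buf, result, max_size) macro interface comes from flex itself. When the scanner runs out of data in the middle of a match, it keeps the partial token text in its own buffer and invokes YY_INPUT again, so a UTF-8 sequence split across two socket reads is reassembled before any rule fires.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical: the connected socket descriptor, set by the
   application before it calls yylex(). Assumed to be blocking. */
static int scanner_fd = -1;

/* Hypothetical helper: copy up to max_size bytes from scanner_fd
   into buf. Returns the number of bytes read, or 0 on end of
   input or error, which flex treats as end of file (YY_NULL). */
static int socket_input(char *buf, int max_size)
{
    ssize_t n;

    assert(max_size > 0);
    do {
        n = read(scanner_fd, buf, (size_t)max_size);
    } while (n < 0 && errno == EINTR);  /* retry if interrupted by a signal */

    return n > 0 ? (int)n : 0;
}

/* This part goes in the definitions section of the .l file,
   inside %{ ... %}, so that it replaces flex's default YY_INPUT. */
#undef YY_INPUT
#define YY_INPUT(buf, result, max_size) \
    ((result) = socket_input((buf), (max_size)))
```

With this in place the scanner is started once with yylex() and there is no need for yy_switch_to_buffer() or for pushing bytes back in the <<EOF>> rule. A non-blocking socket would need extra handling, because returning 0 when no data is available yet would be taken as end of file.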