Re: Input-driven lexical scanner

Chris F Clark <>
19 Dec 2000 17:03:58 -0500

          From comp.compilers

From: Chris F Clark <>
Newsgroups: comp.compilers
Date: 19 Dec 2000 17:03:58 -0500
Organization: The World Public Access UNIX, Brookline, MA
References: 00-12-078
Keywords: lex
Posted-Date: 19 Dec 2000 17:03:58 EST

Olaf wrote:
> An application I'm writing needs the opposite way: the scanner routine
> gets called with an input buffer as argument (variable size - may even
> contain only a single byte). It has to process this, and on completion
> of a token, call a processing routine with that token as argument. It
> is not possible to block or wait for input in any way.
> Is there any scanner generator which is able to do this? I've
> experimented with re2c, so that its YYFILL macro completely resets the
> state and returns to the caller with a special value meaning "I need
> more data", then the caller can fill the buffer and restart the
> scanner. Problems are, (a) resetting is not as efficient as I'd like,
> and (b) it doesn't work reliably; perhaps my buffer management is
> subtly wrong or I don't completely understand the real meaning of
> YYCURSOR and YYMARKER. (E.g. is it right that YYMARKER <= YYCURSOR?
> This is nowhere documented.)
> Other than that, I think re2c is already the right tool: extremely
> lightweight, reentrant (necessary!), target language is C (not C++).

Were it not for your need for C output, I would have recommended
Yacc++. Its model is exactly the order you describe: input drives
lexer, lexer drives parser. However, I suspect that it is not as
lightweight as re2c (and it doesn't intend to be).

Is it possible that your problems occur when your input buffer does
not end on a token boundary and your YYFILL macro resets the state
while a partial token is still pending? (Note: I am speculating from
general knowledge, not from any insight into re2c.)

Our lexer would get somewhat lost if you *incorrectly* reset the
buffer in mid-token. A lexer object *can* return mid-token and wait
for the input object to provide it with more input (and it usually
does, since most buffers don't end on token boundaries). However, the
lexer needs to keep certain information from the previous buffer
across that boundary, and going into the buffer and trashing that
information would lose that state.
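
To make the point about keeping state across buffer boundaries
concrete, here is a minimal hand-written sketch in C of an
input-driven ("push") lexer. It is not re2c output and not Yacc++
code; every name in it (push_lexer, lexer_feed, lexer_finish,
token_sink, collect_token) is hypothetical, and the token rules
(identifiers, numbers, single-character punctuation) are assumed only
for illustration. The lexer state and the partial token live in a
struct owned by the caller, so a call may return in mid-token and the
next call picks up where it left off.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch; none of these names come from re2c or Yacc++. */
enum lex_state { LS_IDLE, LS_IDENT, LS_NUMBER };

typedef void (*token_fn)(const char *text, size_t len, void *user);

struct push_lexer {
    enum lex_state state;  /* survives across lexer_feed() calls */
    char tok[64];          /* partial token carried over a buffer boundary */
    size_t tok_len;
    token_fn emit;         /* processing routine called per complete token */
    void *user;
};

static int is_alpha(unsigned char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}
static int is_digit(unsigned char c) { return c >= '0' && c <= '9'; }
static int is_space(unsigned char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

void lexer_init(struct push_lexer *lx, token_fn emit, void *user)
{
    lx->state = LS_IDLE;
    lx->tok_len = 0;
    lx->emit = emit;
    lx->user = user;
}

/* Hand the pending token (if any) to the callback and go idle. */
static void flush(struct push_lexer *lx)
{
    if (lx->tok_len > 0)
        lx->emit(lx->tok, lx->tok_len, lx->user);
    lx->tok_len = 0;
    lx->state = LS_IDLE;
}

/* Overlong tokens are silently truncated in this sketch. */
static void append(struct push_lexer *lx, char c)
{
    if (lx->tok_len < sizeof lx->tok)
        lx->tok[lx->tok_len++] = c;
}

/* Consume one buffer of any size (even one byte). Never blocks. */
void lexer_feed(struct push_lexer *lx, const char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)buf[i];

        /* Does this byte continue the token that is pending? */
        if (lx->state == LS_IDENT && (is_alpha(c) || is_digit(c))) {
            append(lx, (char)c);
            continue;
        }
        if (lx->state == LS_NUMBER && is_digit(c)) {
            append(lx, (char)c);
            continue;
        }

        /* No: the pending token (if any) is complete. */
        flush(lx);
        if (is_alpha(c)) {
            lx->state = LS_IDENT;
            append(lx, (char)c);
        } else if (is_digit(c)) {
            lx->state = LS_NUMBER;
            append(lx, (char)c);
        } else if (!is_space(c)) {
            append(lx, (char)c);  /* single-character token */
            flush(lx);
        }
    }
    /* Returning here in mid-token is fine: state and tok are kept. */
}

/* Call once at end of input to emit a token still pending. */
void lexer_finish(struct push_lexer *lx) { flush(lx); }

/* Example processing routine: joins tokens as "tok|tok|...". */
struct token_sink { char out[256]; size_t len; };

void collect_token(const char *text, size_t len, void *user)
{
    struct token_sink *s = (struct token_sink *)user;
    if (s->len + len + 2 > sizeof s->out)
        return;  /* no room left: drop the token */
    memcpy(s->out + s->len, text, len);
    s->len += len;
    s->out[s->len++] = '|';
    s->out[s->len] = '\0';
}
```

Feeding the input in arbitrary pieces, for example "fo", then "o 4",
then "2+", followed by lexer_finish(), delivers the same three tokens
(foo, 42, +) as feeding "foo 42+" in one call. Resetting state or
tok_len between those calls is the kind of mid-token reset that
would lose the pending "fo" or "4".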

Hope this helps,

Chris Clark Internet :
Compiler Resources, Inc. Web Site :
3 Proctor Street voice : (508) 435-5016
Hopkinton, MA 01748 USA fax : (508) 435-4847 (24 hours)
