From: Chris Lattner <sabre@nondot.org>
Newsgroups: comp.compilers
Date: 24 Mar 2002 15:30:11 -0500
Organization: University of Illinois at Urbana-Champaign
References: 02-03-162 02-03-165
Keywords: lex
Posted-Date: 24 Mar 2002 15:30:11 EST
Zack Weinberg <zackw@panix.com> wrote:
> You can combine these approaches to mitigate the problems with either.
> When you encounter a sentinel character, check the input pointer against
> the buffer; if it's at the end, absorb more input and restart, otherwise
> proceed to process the character normally. This works well as long as
> the sentinel character appears only rarely in normal input (for source
> code, ascii 0 is a good choice).
Ok, that makes sense. I like the idea of having a fast normal case,
yet allow extreme cases with a performance penalty. Thanks!
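A minimal sketch of the combined approach Zack describes, written in Python for readability. The class `SentinelLexer`, the helper `is_ident`, and the default chunk size are hypothetical illustrations, not code from this thread. The buffer always holds valid data followed by one NUL sentinel. The fast path compares each byte against NUL. Only on a NUL does it check the position against the end of the buffer, and only at the real end does it refill. The refill keeps any unfinished token, so an extreme token costs extra copying instead of overflowing a fixed buffer. Python bounds-checks every index anyway, so this shows the control flow rather than the speedup, which only materializes in a C-style implementation.

```python
import io

SENTINEL = 0  # ASCII NUL: marks the end of valid data in the buffer


def is_ident(b):
    """True for ASCII letters, digits and underscore."""
    return b == ord("_") or bytes([b]).isalnum()


class SentinelLexer:
    """Toy identifier scanner over a buffer that always ends in SENTINEL."""

    def __init__(self, stream, chunk=4096):
        self.stream = stream
        self.chunk = chunk
        self.buf = bytearray([SENTINEL])  # empty buffer: first peek triggers a refill
        self.end = 0    # index of the sentinel that follows the valid data
        self.pos = 0    # next byte to examine
        self.start = 0  # first byte of the token currently being scanned

    def _refill(self):
        """Slow path: keep the partial token, append new input, re-place the sentinel.

        Returns the number of new bytes read (0 means end of input).
        """
        data = self.stream.read(self.chunk)
        keep = self.buf[self.start:self.end]  # unfinished token survives the refill
        self.pos -= self.start
        self.start = 0
        self.buf = keep + data
        self.end = len(self.buf)
        self.buf.append(SENTINEL)
        return len(data)

    def _peek(self):
        """Return the byte at pos without consuming it, or None at end of input."""
        while True:
            b = self.buf[self.pos]
            if b != SENTINEL:
                return b              # fast path: one comparison per byte
            if self.pos != self.end:
                return b              # a genuine NUL inside the data
            if self._refill() == 0:   # sentinel at end of buffer: absorb more input
                return None

    def identifiers(self):
        """Yield each maximal run of identifier bytes as a bytes object."""
        while True:
            self.start = self.pos     # nothing before pos must survive a refill
            b = self._peek()
            if b is None:
                return
            if not is_ident(b):
                self.pos += 1         # skip whitespace, punctuation and NULs
                continue
            while (c := self._peek()) is not None and is_ident(c):
                self.pos += 1
            yield bytes(self.buf[self.start:self.pos])
```

For example, `list(SentinelLexer(io.BytesIO(b"foo bar_1 (baz)"), chunk=4).identifiers())` gives `[b"foo", b"bar_1", b"baz"]` even though two of the tokens straddle refills. Because `_refill` copies the partial token on every refill, a token spanning many chunks costs time roughly quadratic in its length. A C version would usually grow the buffer geometrically instead.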
> Ridiculously huge tokens can show up in machine-generated code.
> The example I'm familiar with is the C++ name mangling scheme used
> by old (pre-3.0) GCC. This regularly produced symbols which overran
> input buffers in various assemblers. I don't remember just how big
> they got, but more than 16K would not surprise me.
> [16K identifiers? Really? Name mangling usually adds only one character
> per argument to the original name. -John]
In C++ in particular, templates can cause a huge blowup in the size of
the mangled name, due to default type arguments and nesting of
templates. The pre-3.0 scheme was a straightforward one that could
generate _enormous_ identifiers. Thankfully, GCC 3.0 now uses the IA-64
C++ ABI mangling scheme, which compresses mangled names by replacing
repeated components with short back-references.
-Chris