Re: Buffered input for a lexer?

Chris Lattner <sabre@nondot.org>
24 Mar 2002 15:30:11 -0500

From comp.compilers

From:	Chris Lattner <sabre@nondot.org>
Newsgroups:	comp.compilers
Date:	24 Mar 2002 15:30:11 -0500
Organization:	University of Illinois at Urbana-Champaign
References:	02-03-162 02-03-165
Keywords:	lex
Posted-Date:	24 Mar 2002 15:30:11 EST

Zack Weinberg <zackw@panix.com> wrote:
> You can combine these approaches to mitigate the problems with either.
> When you encounter a sentinel character, check the input pointer against
> the buffer; if it's at the end, absorb more input and restart, otherwise
> proceed to process the character normally. This works well as long as
> the sentinel character appears only rarely in normal input (for source
> code, ascii 0 is a good choice).

Ok, that makes sense. I like the idea of a fast normal case that still
handles the extreme cases, just with a performance penalty. Thanks!
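
For concreteness, here is a minimal sketch in C of the combined scheme
as I understand it. It is an illustration under assumptions, not code
from any real lexer: the Reader type, the reader_* functions, the tiny
BUF_SIZE, and the in-memory "source" standing in for a file are all
made up for the example. The buffer always has a NUL stored one past
the last valid byte, so the hot path is a single compare; the pointer
check against the end of the buffer only runs when a NUL is seen.

```c
#include <stddef.h>
#include <string.h>

enum { BUF_SIZE = 8 };  /* deliberately tiny so refills happen often */

/* Hypothetical reader: pulls chunks from an in-memory source into a
   small buffer terminated by a NUL sentinel. */
typedef struct {
    const char *src;        /* stand-in for a file or pipe */
    size_t src_len, src_pos;
    char buf[BUF_SIZE + 1]; /* +1 leaves room for the sentinel */
    char *cur;              /* next character to hand out */
    char *end;              /* where the sentinel lives */
} Reader;

static void reader_init(Reader *r, const char *src, size_t len) {
    r->src = src;
    r->src_len = len;
    r->src_pos = 0;
    r->cur = r->end = r->buf;
    *r->end = '\0';         /* empty buffer: sentinel at the start */
}

/* Load the next chunk; returns the number of bytes loaded (0 at EOF). */
static size_t reader_refill(Reader *r) {
    size_t n = r->src_len - r->src_pos;
    if (n > BUF_SIZE) n = BUF_SIZE;
    memcpy(r->buf, r->src + r->src_pos, n);
    r->src_pos += n;
    r->cur = r->buf;
    r->end = r->buf + n;
    *r->end = '\0';         /* re-plant the sentinel after the data */
    return n;
}

/* Returns the next byte (0..255), or -1 at end of input. */
static int reader_next(Reader *r) {
    for (;;) {
        char c = *r->cur;
        if (c != '\0') {            /* fast path: ordinary character */
            r->cur++;
            return (unsigned char)c;
        }
        if (r->cur != r->end) {     /* a real NUL inside the input */
            r->cur++;
            return 0;
        }
        if (reader_refill(r) == 0)  /* sentinel hit: absorb more input */
            return -1;
    }
}

/* Copy everything that is left into out; returns the byte count. */
static size_t reader_drain(Reader *r, char *out, size_t cap) {
    size_t n = 0;
    int c;
    while (n < cap && (c = reader_next(r)) != -1)
        out[n++] = (char)c;
    return n;
}
```

Feeding it the 13 bytes "ab\0cdefghijkl" should hand back all 13,
embedded NUL included, with one refill in the middle. Note that this
sketch reads one character at a time, so it sidesteps the question of
tokens that straddle a refill; a lexer that keeps a token-start pointer
into the buffer would have to carry the partial token over when it
refills.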

> Ridiculously huge tokens can show up in machine-generated code.
> The example I'm familiar with is the C++ name mangling scheme used
> by old (pre-3.0) GCC. This regularly produced symbols which overran
> input buffers in various assemblers. I don't remember just how big
> they got, but more than 16K would not surprise me.

> [16K identifiers? Really? Name mangling usually adds only one character
> per argument to the original name. -John]

In C++ in particular, templates can cause a huge blowup in the size of
the mangled name, because of default type arguments and nested
templates. The pre-3.0 scheme was a straightforward encoding that could
generate _enormous_ identifiers. Thankfully, GCC 3.0 now uses the IA64
mangling scheme, which compresses the mangled names.
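
To get a feel for how quickly this can grow, here is a toy model in C.
It is NOT the old GCC scheme (the names, the constants, and the
function toy_mangled_len are invented for illustration). It assumes a
hypothetical template Wrap<T, A = Alloc<T>> whose uncompressed mangled
name spells out T twice, once directly and once inside the default
argument, plus 9 characters for "Wrap" and "Alloc".

```c
#include <stddef.h>

/* Toy model only: length of an uncompressed mangled name for
   Wrap<Wrap<...<int>...>> nested `depth` levels deep, where each level
   repeats its argument twice (once as T, once inside the default
   Alloc<T>) and adds 9 characters of its own. */
static size_t toy_mangled_len(unsigned depth) {
    size_t len = 1;                 /* innermost type, e.g. "i" for int */
    for (unsigned i = 0; i < depth; i++)
        len = 2 * len + 9;          /* T + Alloc<T> + "Wrap" + "Alloc" */
    return len;
}
```

In this model the length is 10 * 2^depth - 9, so it doubles with every
level of nesting: 11 characters at depth 1, 20471 at depth 11, which is
already past 16K. Real library templates often have more than one
defaulted argument, so the repetition per level can be worse than the
factor of two assumed here. A scheme that can refer back to a component
it has already emitted avoids spelling out T the second time.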

-Chris
