Re: Buffered input for a lexer?

Chris Lattner <sabre@nondot.org>
24 Mar 2002 15:30:11 -0500

          From comp.compilers


From: Chris Lattner <sabre@nondot.org>
Newsgroups: comp.compilers
Date: 24 Mar 2002 15:30:11 -0500
Organization: University of Illinois at Urbana-Champaign
References: 02-03-162 02-03-165
Keywords: lex
Posted-Date: 24 Mar 2002 15:30:11 EST

Zack Weinberg <zackw@panix.com> wrote:
> You can combine these approaches to mitigate the problems with either.
> When you encounter a sentinel character, check the input pointer against
> the buffer; if it's at the end, absorb more input and restart, otherwise
> proceed to process the character normally. This works well as long as
> the sentinel character appears only rarely in normal input (for source
> code, ascii 0 is a good choice).


OK, that makes sense. I like the idea of a fast normal case that still
handles the extreme cases, at a performance penalty. Thanks!
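
For concreteness, here is a rough sketch of how I read the combined
approach. Everything in it (the Input struct, input_refill, input_next,
longest_token, the doubling growth policy) is made up for illustration
and is not from Zack's message or any real lexer. It assumes NUL as the
sentinel, slides a partial token to the front of the buffer on refill,
and grows the buffer only when a single token fills it.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SENTINEL '\0'

/* Hypothetical input source; all names here are illustrative. */
typedef struct {
    FILE  *fp;
    char  *buf;  /* cap + 1 bytes; buf[len] always holds SENTINEL */
    size_t cap;
    size_t len;  /* number of valid bytes in buf */
    size_t pos;  /* index of the next byte to read */
    size_t tok;  /* start of the token currently being scanned */
} Input;

static int input_open(Input *in, FILE *fp, size_t cap) {
    assert(cap > 0);
    in->fp = fp; in->cap = cap;
    in->len = in->pos = in->tok = 0;
    in->buf = malloc(cap + 1);
    if (!in->buf) return -1;
    in->buf[0] = SENTINEL;
    return 0;
}

/* Slow path: keep the partial token, grow the buffer if the token fills
 * it, then read more. Returns 0 at end of input; a read error or a
 * failed realloc is also treated as end of input in this sketch. */
static size_t input_refill(Input *in) {
    size_t keep = in->len - in->tok;
    memmove(in->buf, in->buf + in->tok, keep);
    in->pos -= in->tok;
    in->tok = 0;
    in->len = keep;
    if (in->len == in->cap) {              /* one token fills the buffer */
        char *p = realloc(in->buf, in->cap * 2 + 1);
        if (!p) return 0;
        in->buf = p;
        in->cap *= 2;
    }
    size_t n = fread(in->buf + in->len, 1, in->cap - in->len, in->fp);
    in->len += n;
    in->buf[in->len] = SENTINEL;
    return n;
}

/* Returns the next byte, or EOF. The common case is one compare. */
static int input_next(Input *in) {
    for (;;) {
        unsigned char c = (unsigned char)in->buf[in->pos];
        if (c != SENTINEL) { in->pos++; return c; }     /* fast path */
        if (in->pos < in->len) { in->pos++; return c; } /* real NUL in input */
        if (input_refill(in) == 0) return EOF;          /* end of buffer */
    }
}

/* Demo scanner: length of the longest run of non-whitespace bytes. */
static size_t longest_token(FILE *fp, size_t cap) {
    Input in;
    size_t best = 0;
    if (input_open(&in, fp, cap) != 0) return 0;
    for (;;) {
        int c;
        /* Skip whitespace; nothing needs to survive a refill here. */
        do { in.tok = in.pos; c = input_next(&in); }
        while (c != EOF && isspace(c));
        if (c == EOF) break;
        /* in.tok now indexes the first byte of the token. */
        do { c = input_next(&in); } while (c != EOF && !isspace(c));
        size_t end = (c == EOF) ? in.pos : in.pos - 1;
        if (end - in.tok > best) best = end - in.tok;
    }
    free(in.buf);
    return best;
}
```

Calling longest_token(fp, 4096) on an open file exercises all three
branches of input_next. The extra end-of-buffer check only runs when a
NUL is actually seen, so ordinary source text stays on the fast path,
and a 16K mangled name just costs a few reallocs.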


> Ridiculously huge tokens can show up in machine-generated code.
> The example I'm familiar with is the C++ name mangling scheme used
> by old (pre-3.0) GCC. This regularly produced symbols which overran
> input buffers in various assemblers. I don't remember just how big
> they got, but more than 16K would not surprise me.


> [16K identifiers? Really? Name mangling usually adds only one character
> per argument to the original name. -John]


In C++ in particular, templates can cause a huge blowup in the size of
the mangled name, due to default type arguments and nesting of
templates. The pre-3.0 mangling was a straightforward scheme that
could generate _enormous_ identifiers. Thankfully, 3.0 now uses the
IA64 C++ ABI mangling scheme, which compresses the mangled names.


-Chris

