Re: Lookahead vs. Scanner Feedback

bliss@sp64.csrd.uiuc.edu (Brian Bliss)
Wed, 8 Jan 92 17:55:13 GMT

From comp.compilers

Newsgroups:	comp.compilers
From:	bliss@sp64.csrd.uiuc.edu (Brian Bliss)
Keywords:	parse, C
Organization:	UIUC Center for Supercomputing Research and Development
References:	92-01-032
Date:	Wed, 8 Jan 92 17:55:13 GMT

In article 92-01-032, smk@dcs.edinburgh.ac.uk writes:
|> [Reusing a typedef name] shouldn't be a problem, because this is not really
|> an ambiguous occurrence. You can deal with that by having a production
|>
|> any_ident : ident | type_ident;
|>
|> and using any_ident for the identifier in a declarator (and several other
|> places). This should be possible without introducing any ambiguities.
|>
|> But for some parts of the C syntax this is not so easy, for labels you
|> probably have to expand the any_ident production to allow programs like
|>
|> typedef int foo;
|> main ()
|> { foo: ;
|> }
|>
|> because otherwise there is a shift-reduce conflict
|> (reduce type_ident to any_ident for labels, shift for declarations).
[It's not impossible, but it's tricky and messy to get right. -John]

O.K. I haven't gotten out the grammar and done the actual table construction
(read: disclaimer), but declarations ARE the one place where you do need
separate tokens for ident and type_ident. Anywhere else, the
any_ident -> ident | type_ident rule works fine. (On labels, for instance, the
: in the lookahead stream resolves the ambiguity. I have also successfully
used the above productions to allow a typedef name to also be a tag name.)
Consider the code fragment:

typedef int z;
main() {
long z;
}

Is z being redeclared as a local variable in main(), or are you just
specifying an empty declaration of type long int? The answer
depends upon which token the lexical analyzer returns when z is
encountered for the second time. The ANSI C grammar in the back of K&R II
is not ambiguous: it assumes that the lexer resolves the ambiguity, not
the parser.
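
For reference, the ANSI rules themselves settle this particular fragment: a
type specifier such as long cannot be combined with a typedef name, so z has
to be read as the identifier being declared. The following is a minimal
sketch for checking that with a conforming compiler. The function name
inner_z_is_a_long is made up for illustration and is not from the original
fragment.

```c
#include <assert.h>

typedef int z;                /* z is a typedef name at file scope */

/* Hypothetical helper: returns 1 if the inner z is an object of type long. */
int inner_z_is_a_long(void)
{
    long z = 0;               /* redeclares z as a local variable of type long */

    z += 1;                   /* compiles only if z names an object here */
    assert(z == 1);
    return sizeof z == sizeof(long);
}
```

That the function compiles at all is the real check: inside the block, z can
be assigned to, so it no longer names a type there.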

The fix to this problem is much easier than I first thought: just use
lex's right-context operator (/) to look ahead in the input
stream for one of [,{;] (preceded by optional whitespace) when an
identifier is encountered. In cases that match, always return the IDENT
token; in cases that don't, look up the name and return TYPE_NAME if the
identifier is a typedef name, and IDENT otherwise.
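
The decision rule described above can be sketched in plain C, standing in
for what the lex trailing-context pattern would do. Everything named here is
an assumption made for illustration: the classify_identifier function, the
token enum, and the fixed typedef_names table, which replaces the scoped
symbol table a real compiler would consult.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

enum token { IDENT, TYPE_NAME };

/* Hypothetical typedef table; a real lexer would query the symbol table. */
static const char *typedef_names[] = { "z", "foo" };

static int is_typedef_name(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof typedef_names / sizeof typedef_names[0]; i++)
        if (strcmp(name, typedef_names[i]) == 0)
            return 1;
    return 0;
}

/*
 * name: the identifier just scanned.
 * rest: the input that immediately follows it.
 * Skips whitespace, then returns IDENT if the next character is one of
 * , { ; and otherwise falls back to the typedef lookup.
 */
enum token classify_identifier(const char *name, const char *rest)
{
    assert(name != NULL && rest != NULL);

    while (isspace((unsigned char)*rest))
        rest++;
    if (*rest != '\0' && strchr(",{;", *rest) != NULL)
        return IDENT;
    return is_typedef_name(name) ? TYPE_NAME : IDENT;
}
```

With this rule, the second z in "long z;" is followed by ; and comes back as
IDENT, while the z in "z x;" is followed by another identifier and comes back
as TYPE_NAME. A label such as "foo: ;" still yields TYPE_NAME, which is the
case the any_ident production is meant to absorb.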

As for my original statement

>One place where every yacc/lex based C compiler I know of is broken

I knew Sun's cc was broken, and so was every C compiler I had worked on;
I couldn't figure out a way to fix the problem easily, and over-generalized :-)

bb