Re: Parsing C#-like generics

"Ben L. Titzer" <ben.titzer@gmail.com>
Wed, 13 Jul 2011 10:19:32 -0700 (PDT)

          From comp.compilers

From: "Ben L. Titzer" <ben.titzer@gmail.com>
Newsgroups: comp.compilers
Date: Wed, 13 Jul 2011 10:19:32 -0700 (PDT)
Organization: Compilers Central
References: 11-07-019
Keywords: parse, syntax
Posted-Date: 17 Jul 2011 09:47:05 EDT

On Jul 11, 11:22 am, "Harold Aptroot" <harold.aptr...@gmail.com> wrote:
> I'm having some trouble parsing generics when mixed with comparisons. The
> way I try to do it, there is an ambiguity between LessThan and a "list of
> types between angle brackets".
> For example, x<x>(x<x) should be syntactically OK, and it should be parsed
> to a function call x with a type parameter list < x > and a single argument
> which is the expression x<x (ok not really, I threw in semantics here to
> make it clearer, the actual result should just be an AST).
> My parser generator (GOLD parsing system) complains about a shift-reduce
> error, and the parser it produces doesn't want to parse any expression with
> a LessThan in it because it believes that to be a incomplete type list
> (lacking a closing > )
>
> I know it is actually inherently ambiguous, because t<t2>(t3) could mean
> two things:
> - LessThan(t, BiggerThan(t2, t3)
> - invoke t<t2> with argument t3
> In that case I want to pick option two.
> For t<t2>t3 I want to pick option one, not report "missing ( "
>
> Can this be done with an LALR parser at all? If so, how?




One trick I've used in the past is to lex the '<' that introduces a
type parameter list as part of the identifier:


"foo" would lex as a single IDENT token.
"foo<" would lex as a single PARAMETERIZED_IDENT token.
"foo <" would lex as IDENT followed by LESS_THAN.


You can then use the IDENT and PARAMETERIZED_IDENT tokens in various
places in the grammar, with PARAMETERIZED_IDENT being followed by a
type list and a '>' token.
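
As a rough illustration, here is a minimal sketch of such a lexer in
Python. The token names IDENT, PARAMETERIZED_IDENT and LESS_THAN are
the ones used above; GREATER_THAN, LPAREN, RPAREN, COMMA, the
identifier pattern and the tokenize function are assumptions made for
the example, not part of any particular lexer or parser generator. It
relies on the regex alternation trying PARAMETERIZED_IDENT before
IDENT.

```python
import re

# Order matters: PARAMETERIZED_IDENT must be tried before IDENT so that an
# identifier immediately followed by '<' becomes one token.
# Everything except the first three token names is an assumption.
TOKEN_SPEC = [
    ("PARAMETERIZED_IDENT", r"[A-Za-z_][A-Za-z0-9_]*<"),
    ("IDENT",               r"[A-Za-z_][A-Za-z0-9_]*"),
    ("LESS_THAN",           r"<"),
    ("GREATER_THAN",        r">"),
    ("LPAREN",              r"\("),
    ("RPAREN",              r"\)"),
    ("COMMA",               r","),
    ("WS",                  r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (kind, text) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

With this sketch, tokenize("foo <") yields an IDENT followed by a
LESS_THAN, while tokenize("foo<") yields a single PARAMETERIZED_IDENT
whose text includes the '<'.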


This then requires any use of the '<' operator that follows an
identifier to have intervening whitespace. It also requires that any
parameterization of an identifier not have intervening whitespace, so
"a <b" is a comparison while "a<b" starts a type list. I think it's a
decent tradeoff if you are defining the language yourself, but it
won't work for languages with more complex rules for resolving the
ambiguity.
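
To make the tradeoff concrete, here is a small sketch of a parser
consuming such tokens. It is a hand-written recursive-descent parser
in Python rather than an LALR grammar, purely to keep the example
self-contained. The tuple-shaped AST, the Parser class, the parse
function and the toy expression grammar are all hypothetical, and the
token lists are written out by hand as a lexer following the rule
above would presumably produce them.

```python
class Parser:
    """Toy grammar (an assumption, for illustration only):
         expr    := primary (('<' | '>') primary)*
         primary := '(' expr ')' | name [ '(' args ')' ]
         name    := IDENT | PARAMETERIZED_IDENT types '>'
    """

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else "EOF"

    def expect(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, found {self.peek()}")
        text = self.tokens[self.pos][1]
        self.pos += 1
        return text

    def expr(self):
        left = self.primary()
        while self.peek() in ("LESS_THAN", "GREATER_THAN"):
            op = self.expect(self.peek())
            left = (op, left, self.primary())
        return left

    def primary(self):
        if self.peek() == "LPAREN":
            self.expect("LPAREN")
            node = self.expr()
            self.expect("RPAREN")
            return node
        node = self.name()
        if self.peek() == "LPAREN":
            node = ("call", node, self.args())
        return node

    def name(self):
        # A PARAMETERIZED_IDENT commits us to a type list closed by '>'.
        if self.peek() == "PARAMETERIZED_IDENT":
            ident = self.expect("PARAMETERIZED_IDENT")[:-1]  # drop trailing '<'
            types = [self.name()]
            while self.peek() == "COMMA":
                self.expect("COMMA")
                types.append(self.name())
            self.expect("GREATER_THAN")
            return ("generic", ident, types)
        return ("name", self.expect("IDENT"))

    def args(self):
        self.expect("LPAREN")
        items = []
        if self.peek() != "RPAREN":
            items.append(self.expr())
            while self.peek() == "COMMA":
                self.expect("COMMA")
                items.append(self.expr())
        self.expect("RPAREN")
        return items


def parse(tokens):
    parser = Parser(tokens)
    node = parser.expr()
    if parser.peek() != "EOF":
        raise SyntaxError(f"trailing {parser.peek()}")
    return node


# Assumed token stream for: x<x>(x <x)   (space before the comparison)
spaced = [("PARAMETERIZED_IDENT", "x<"), ("IDENT", "x"), ("GREATER_THAN", ">"),
          ("LPAREN", "("), ("IDENT", "x"), ("LESS_THAN", "<"), ("IDENT", "x"),
          ("RPAREN", ")")]

# Assumed token stream for: x<x>(x<x)    (no space, so the inner "x<" is
# taken as the start of a type list)
unspaced = [("PARAMETERIZED_IDENT", "x<"), ("IDENT", "x"), ("GREATER_THAN", ">"),
            ("LPAREN", "("), ("PARAMETERIZED_IDENT", "x<"), ("IDENT", "x"),
            ("RPAREN", ")")]
```

In this sketch parse(spaced) gives a call of the generic x<x> with the
single argument x < x, while parse(unspaced) raises a SyntaxError
because the inner type list never sees its '>'. That is the cost of
the scheme: the comparison inside the call has to be written with a
space before the '<'.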


