Re: Parsing C#-like generics

"Ben L. Titzer"
Wed, 13 Jul 2011 10:19:32 -0700 (PDT)

          From comp.compilers

Related articles
Parsing C#-like generics (Harold Aptroot) (2011-07-11)
Re: Parsing C#-like generics (Hans-Peter Diettrich) (2011-07-12)
Re: Parsing C#-like generics (BGB) (2011-07-12)
Re: Parsing C#-like generics (Ben L. Titzer) (2011-07-13)
Re: Parsing C#-like generics (BGB) (2011-07-14)

From: "Ben L. Titzer" <>
Newsgroups: comp.compilers
Date: Wed, 13 Jul 2011 10:19:32 -0700 (PDT)
Organization: Compilers Central
References: 11-07-019
Keywords: parse, syntax
Posted-Date: 17 Jul 2011 09:47:05 EDT

On Jul 11, 11:22 am, "Harold Aptroot" wrote:
> I'm having some trouble parsing generics when mixed with comparisons. The
> way I try to do it, there is an ambiguity between LessThan and a "list of
> types between angle brackets".
> For example, x<x>(x<x) should be syntactically OK, and it should be parsed
> to a function call x with a type parameter list < x > and a single argument
> which is the expression x<x (ok not really, I threw in semantics here to
> make it clearer, the actual result should just be an AST).
> My parser generator (GOLD parsing system) complains about a shift-reduce
> error, and the parser it produces doesn't want to parse any expression with
> a LessThan in it because it believes that to be an incomplete type list
> (lacking a closing > )
> I know it is actually inherently ambiguous, because t<t2>(t3) could mean
> two things:
> - LessThan(t, BiggerThan(t2, t3))
> - invoke t<t2> with argument t3
> In that case I want to pick option two.
> For t<t2>t3 I want to pick option one, not report "missing ( "
> Can this be done with an LALR parser at all? If so, how?

One trick I've used in the past is to lex the '<' that introduces a
type parameter list as part of the identifier:

"foo" would lex as a single IDENT token.
"foo<" would lex as a single PARAMETERIZED_IDENT token.
"foo <" would lex as IDENT followed by LESS_THAN.
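
As a rough illustration, here is a minimal Python sketch of such a
lexer. Only the token names IDENT, PARAMETERIZED_IDENT and LESS_THAN
come from the description above; the identifier pattern, the other
token names and the tokenize() helper are assumptions made for the
example, and operators such as "<=" are not handled.

```python
import re

# Order matters: PARAMETERIZED_IDENT is tried before IDENT, so an
# identifier immediately followed by '<' becomes a single token.
# The patterns and every token name other than IDENT,
# PARAMETERIZED_IDENT and LESS_THAN are assumptions for this sketch.
TOKEN_SPEC = [
    ("PARAMETERIZED_IDENT", r"[A-Za-z_][A-Za-z_0-9]*<"),
    ("IDENT", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("LESS_THAN", r"<"),
    ("GREATER_THAN", r">"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("COMMA", r","),
    ("WS", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (kind, text) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


if __name__ == "__main__":
    for sample in ["foo", "foo<", "foo <", "x<x>(x < x)"]:
        print(repr(sample), "->", tokenize(sample))
```

Note that with this scheme the comparison inside the call has to be
written "x < x"; written as "x<x" it would lex as the start of another
type parameter list.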

You can then use the IDENT and PARAMETERIZED_IDENT tokens in various
places in the grammar, with PARAMETERIZED_IDENT being followed by a
type list and a '>' token.
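
To show how the two tokens might be consumed, here is a small
hand-written recursive-descent sketch in Python. It is not an LALR
grammar and not GOLD input; the same productions could presumably be
handed to a parser generator instead. The toy grammar (single-argument
calls, flat type lists, '<' as the only operator), the tuple-shaped AST
and all function names are assumptions for illustration. Tokens are
(kind, text) pairs written out by hand.

```python
# Toy grammar assumed for this sketch:
#   expr      := primary (LESS_THAN primary)*
#   primary   := name [LPAREN expr RPAREN]
#   name      := IDENT | PARAMETERIZED_IDENT type_list GREATER_THAN
#   type_list := IDENT (COMMA IDENT)*


def peek(tokens, pos):
    """Kind of the token at pos, or 'EOF' past the end."""
    return tokens[pos][0] if pos < len(tokens) else "EOF"


def expect(tokens, pos, kind):
    """Consume a token of the given kind or raise SyntaxError."""
    if peek(tokens, pos) != kind:
        raise SyntaxError(f"expected {kind} at token {pos}, found {peek(tokens, pos)}")
    return tokens[pos][1], pos + 1


def parse_type_list(tokens, pos):
    name, pos = expect(tokens, pos, "IDENT")
    types = [name]
    while peek(tokens, pos) == "COMMA":
        name, pos = expect(tokens, pos + 1, "IDENT")
        types.append(name)
    return types, pos


def parse_name(tokens, pos):
    if peek(tokens, pos) == "PARAMETERIZED_IDENT":
        text, pos = expect(tokens, pos, "PARAMETERIZED_IDENT")
        types, pos = parse_type_list(tokens, pos)
        _, pos = expect(tokens, pos, "GREATER_THAN")
        # Drop the trailing '<' that the lexer glued onto the identifier.
        return ("generic", text[:-1], types), pos
    text, pos = expect(tokens, pos, "IDENT")
    return ("name", text), pos


def parse_primary(tokens, pos):
    node, pos = parse_name(tokens, pos)
    if peek(tokens, pos) == "LPAREN":
        arg, pos = parse_expr(tokens, pos + 1)
        _, pos = expect(tokens, pos, "RPAREN")
        node = ("call", node, arg)
    return node, pos


def parse_expr(tokens, pos=0):
    node, pos = parse_primary(tokens, pos)
    while peek(tokens, pos) == "LESS_THAN":
        right, pos = parse_primary(tokens, pos + 1)
        node = ("less_than", node, right)
    return node, pos


if __name__ == "__main__":
    # Hand-written token stream for: x<x>(x < x)
    tokens = [
        ("PARAMETERIZED_IDENT", "x<"), ("IDENT", "x"), ("GREATER_THAN", ">"),
        ("LPAREN", "("), ("IDENT", "x"), ("LESS_THAN", "<"), ("IDENT", "x"),
        ("RPAREN", ")"),
    ]
    print(parse_expr(tokens)[0])
```

Because the decision between "type list" and "comparison" has already
been made by the lexer, the parser never has to choose between the two
readings of '<'.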

This requires that any use of the '<' operator following an
identifier have intervening whitespace, and that any parameterization
of an identifier have none. For instance, the original x<x>(x<x)
would have to be written x<x>(x < x). I think it's a decent tradeoff
if you are defining the language yourself, but it won't work for
languages with more complex rules for resolving the ambiguity.
