|Parsing C#-like generics email@example.com (Harold Aptroot) (2011-07-11)|
|Re: Parsing C#-like generics DrDiettrich1@aol.com (Hans-Peter Diettrich) (2011-07-12)|
|Re: Parsing C#-like generics firstname.lastname@example.org (BGB) (2011-07-12)|
|Re: Parsing C#-like generics email@example.com (Ben L. Titzer) (2011-07-13)|
|Re: Parsing C#-like generics firstname.lastname@example.org (BGB) (2011-07-14)|
|From:||"Ben L. Titzer" <email@example.com>|
|Date:||Wed, 13 Jul 2011 10:19:32 -0700 (PDT)|
|Posted-Date:||17 Jul 2011 09:47:05 EDT|
On Jul 11, 11:22 am, "Harold Aptroot" <harold.aptr...@gmail.com> wrote:
> I'm having some trouble parsing generics when mixed with comparisons. The
> way I try to do it, there is an ambiguity between LessThan and a "list of
> types between angle brackets".
> For example, x<x>(x<x) should be syntactically OK, and it should be parsed
> to a function call x with a type parameter list < x > and a single argument
> which is the expression x<x (ok not really, I threw in semantics here to
> make it clearer, the actual result should just be an AST).
> My parser generator (GOLD parsing system) complains about a shift-reduce
> error, and the parser it produces doesn't want to parse any expression with
> a LessThan in it because it believes that to be an incomplete type list
> (lacking a closing > )
> I know it is actually inherently ambiguous, because t<t2>(t3) could mean
> two things:
> - LessThan(t, BiggerThan(t2, t3))
> - invoke t<t2> with argument t3
> In that case I want to pick option two.
> For t<t2>t3 I want to pick option one, not report "missing ( "
> Can this be done with an LALR parser at all? If so, how?
One trick I've used in the past is to lex the '<' that introduces a
type parameter list as part of the identifier:
"foo" would lex as a single IDENT token.
"foo<" would lex as a single PARAMETERIZED_IDENT token.
"foo <" would lex as IDENT followed by LESS_THAN
You can then use the IDENT and PARAMETERIZED_IDENT tokens in various
places in the grammar, with PARAMETERIZED_IDENT being followed by a
type list and a '>' token.
This then requires any use of the '<' operator that follows an
identifier to have intervening whitespace. It also requires that any
parameterization of an identifier not have intervening whitespace. I
think it's a decent tradeoff if you are defining the language
yourself, but it won't work for languages with more complex rules for
resolving the ambiguity.
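
As a rough illustration of the lexing rule above, here is a minimal
sketch in Python. The token names IDENT, PARAMETERIZED_IDENT and
LESS_THAN are the ones used above; the other token names, the
identifier pattern and the tokenize() helper are assumptions made up
for this example, not part of any particular lexer generator.

```python
import re

# IDENT, PARAMETERIZED_IDENT and LESS_THAN are the token names from the post.
# Everything else here (other token names, tokenize) is hypothetical.
TOKEN_SPEC = [
    # Must come before IDENT so that "foo<" wins over "foo" followed by "<".
    ("PARAMETERIZED_IDENT", r"[A-Za-z_][A-Za-z0-9_]*<"),
    ("IDENT",               r"[A-Za-z_][A-Za-z0-9_]*"),
    ("LESS_THAN",           r"<"),
    ("GREATER_THAN",        r">"),
    ("LPAREN",              r"\("),
    ("RPAREN",              r"\)"),
    ("COMMA",               r","),
    ("SKIP",                r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (token_name, lexeme) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


if __name__ == "__main__":
    for src in ["foo", "foo<", "foo <", "x<x>(x < x)"]:
        print(repr(src), tokenize(src))
```

Note that with this rule the original example has to be written as
x<x>(x < x): without the space, the inner "x<" would lex as a
PARAMETERIZED_IDENT and the parser would then expect a type list.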