Related articles |
---|
Parsing C#-like generics harold.aptroot@gmail.com (Harold Aptroot) (2011-07-11) |
Re: Parsing C#-like generics DrDiettrich1@aol.com (Hans-Peter Diettrich) (2011-07-12) |
Re: Parsing C#-like generics cr88192@hotmail.com (BGB) (2011-07-12) |
Re: Parsing C#-like generics ben.titzer@gmail.com (Ben L. Titzer) (2011-07-13) |
Re: Parsing C#-like generics cr88192@hotmail.com (BGB) (2011-07-14) |
From: | "Ben L. Titzer" <ben.titzer@gmail.com> |
Newsgroups: | comp.compilers |
Date: | Wed, 13 Jul 2011 10:19:32 -0700 (PDT) |
Organization: | Compilers Central |
References: | 11-07-019 |
Keywords: | parse, syntax |
Posted-Date: | 17 Jul 2011 09:47:05 EDT |
On Jul 11, 11:22 am, "Harold Aptroot" <harold.aptr...@gmail.com> wrote:
> I'm having some trouble parsing generics when mixed with comparisons. The
> way I try to do it, there is an ambiguity between LessThan and a "list of
> types between angle brackets".
> For example, x<x>(x<x) should be syntactically OK, and it should be parsed
> to a function call x with a type parameter list < x > and a single argument
> which is the expression x<x (ok not really, I threw in semantics here to
> make it clearer, the actual result should just be an AST).
> My parser generator (GOLD parsing system) complains about a shift-reduce
> conflict, and the parser it produces doesn't want to parse any expression with
> a LessThan in it because it believes that to be an incomplete type list
> (lacking a closing > )
>
> I know it is actually inherently ambiguous, because t<t2>(t3) could mean
> two things:
> - LessThan(t, BiggerThan(t2, t3))
> - invoke t<t2> with argument t3
> In that case I want to pick option two.
> For t<t2>t3 I want to pick option one, not report "missing ( "
>
> Can this be done with an LALR parser at all? If so, how?
One trick I've used in the past is to lex the '<' that introduces a
type parameter list as part of the identifier:
"foo" would lex as a single IDENT token.
and
"foo<" would lex as a single PARAMETERIZED_IDENT token.
and
"foo <" would lex as IDENT followed by LESS_THAN
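A minimal sketch of such a lexer in Python follows. It assumes ASCII identifiers and only a handful of punctuation tokens. IDENT, PARAMETERIZED_IDENT and LESS_THAN are the names used above. GREATER_THAN, LPAREN, RPAREN and COMMA are hypothetical additions for illustration. The whole trick is rule order: the pattern for an identifier immediately followed by '<' is tried before the plain identifier pattern, so whitespace alone decides which token comes out.

```python
import re
from typing import List, Tuple

# Order matters: Python's re tries alternatives left to right, so the
# "identifier glued to '<'" rule must come before the plain IDENT rule.
TOKEN_SPEC = [
    ("WS", r"\s+"),
    ("PARAMETERIZED_IDENT", r"[A-Za-z_][A-Za-z0-9_]*<"),  # foo<  (no space)
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),                 # foo
    ("LESS_THAN", r"<"),                                  # a '<' on its own
    ("GREATER_THAN", r">"),   # hypothetical extra tokens for the example
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("COMMA", r","),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def lex(src: str) -> List[Tuple[str, str]]:
    """Return (kind, text) pairs; whitespace is dropped after it has
    already influenced which identifier rule matched."""
    tokens = []
    pos = 0
    while pos < len(src):
        m = MASTER.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at {pos}")
        kind = m.lastgroup
        if kind != "WS":
            text = m.group()
            if kind == "PARAMETERIZED_IDENT":
                text = text[:-1]  # keep only the name; the '<' is implied by the kind
            tokens.append((kind, text))
        pos = m.end()
    return tokens
```

Under this rule the argument in the question's example has to be written `x < x` (or at least `x <x`). Written as `x<x`, it would lex as a PARAMETERIZED_IDENT.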
You can then use the IDENT and PARAMETERIZED_IDENT tokens in various
places in the grammar, with PARAMETERIZED_IDENT being followed by a
type list and a '>' token.
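The post describes this in terms of an LALR grammar. The sketch below instead uses a small hand-written recursive-descent parser as a stand-in, to show why the conflict disappears once the lexer has decided. After PARAMETERIZED_IDENT the parser is unambiguously inside a type list, so a '>' there can only close it. After a plain IDENT, a '<' can only be a comparison. The grammar, the AST class names and the extra token kinds (GREATER_THAN, LPAREN, RPAREN, COMMA) are hypothetical. The input is a list of (kind, text) tokens as a lexer like the one described above would produce.

```python
from dataclasses import dataclass
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, text), e.g. ("IDENT", "foo")

# Hypothetical grammar:
#   expr      := primary ((LESS_THAN | GREATER_THAN) primary)*   (left-assoc)
#   primary   := LPAREN expr RPAREN | (IDENT | generic) [args]
#   generic   := PARAMETERIZED_IDENT type (COMMA type)* GREATER_THAN
#   type      := IDENT | generic
#   args      := LPAREN [expr (COMMA expr)*] RPAREN

@dataclass
class Name:
    name: str

@dataclass
class Generic:
    name: str
    type_args: list

@dataclass
class Call:
    callee: object
    args: list

@dataclass
class Compare:
    op: str
    left: object
    right: object


class Parser:
    def __init__(self, tokens: List[Token]):
        self.toks = tokens
        self.i = 0

    def peek(self) -> str:
        return self.toks[self.i][0] if self.i < len(self.toks) else "EOF"

    def expect(self, kind: str) -> str:
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, got {self.peek()}")
        text = self.toks[self.i][1]
        self.i += 1
        return text

    def parse(self):
        e = self.expr()
        if self.peek() != "EOF":
            raise SyntaxError(f"unexpected trailing token {self.peek()}")
        return e

    def expr(self):
        left = self.primary()
        while self.peek() in ("LESS_THAN", "GREATER_THAN"):
            op = self.expect(self.peek())
            left = Compare(op, left, self.primary())
        return left

    def primary(self):
        if self.peek() == "LPAREN":
            self.expect("LPAREN")
            e = self.expr()
            self.expect("RPAREN")
            return e
        if self.peek() == "PARAMETERIZED_IDENT":
            callee = self.generic()
        else:
            callee = Name(self.expect("IDENT"))
        return Call(callee, self.args()) if self.peek() == "LPAREN" else callee

    def generic(self):
        # The '<' is part of PARAMETERIZED_IDENT; only the closing '>' is separate.
        name = self.expect("PARAMETERIZED_IDENT")
        types = [self.type_()]
        while self.peek() == "COMMA":
            self.expect("COMMA")
            types.append(self.type_())
        self.expect("GREATER_THAN")
        return Generic(name, types)

    def type_(self):
        if self.peek() == "PARAMETERIZED_IDENT":
            return self.generic()
        return Name(self.expect("IDENT"))

    def args(self):
        self.expect("LPAREN")
        out = []
        if self.peek() != "RPAREN":
            out.append(self.expr())
            while self.peek() == "COMMA":
                self.expect("COMMA")
                out.append(self.expr())
        self.expect("RPAREN")
        return out
```

With these tokens, `x<x>(x < x)` parses as a call of the generic `x<x>` with one comparison argument. `t < t2 > t3` parses as a chain of comparisons. Note that this sketch makes comparisons left-associative, which is a choice of the example and not something the question specified.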
This then requires any use of the '<' operator that follows an
identifier to have intervening whitespace. It also requires that any
parameterization of an identifier not have intervening whitespace. I
think it's a decent tradeoff if you are defining the language
yourself, but won't work for languages with more complex rules for
resolving the ambiguity.