Re: Parsing C#-like generics

BGB <cr88192@hotmail.com>
Thu, 14 Jul 2011 13:13:50 -0700

From comp.compilers

Related articles
Parsing C#-like generics harold.aptroot@gmail.com (Harold Aptroot) (2011-07-11)
Re: Parsing C#-like generics DrDiettrich1@aol.com (Hans-Peter Diettrich) (2011-07-12)
Re: Parsing C#-like generics cr88192@hotmail.com (BGB) (2011-07-12)
Re: Parsing C#-like generics ben.titzer@gmail.com (Ben L. Titzer) (2011-07-13)
Re: Parsing C#-like generics cr88192@hotmail.com (BGB) (2011-07-14)


From:	BGB <cr88192@hotmail.com>
Newsgroups:	comp.compilers
Date:	Thu, 14 Jul 2011 13:13:50 -0700
Organization:	albasani.net
References:	11-07-019 11-07-021
Keywords:	parse, syntax
Posted-Date:	17 Jul 2011 09:47:18 EDT

On 7/12/2011 5:25 AM, Hans-Peter Diettrich wrote:
> Harold Aptroot schrieb:

<snip>

> IMO you should better separate declarations from code (statements,
> expressions). Then the parser will "know" from that context, that a
> declaration can contain <x> type lists, but not x<y expressions.
>
> Above example should parse better as
> x<x>{x<x}
> where the C style braces around statement blocks allow for better
> disambiguation of the < token.

the problem though is that often there are good reasons to allow these
types of things to appear in related contexts.

for example, if using a C-like declaration syntax, but without the aid
of having all types and declarations known up front, one will have to
deal with the potential ambiguity during parsing as to whether they are
dealing with one type of expression or another, and potentially need to
use some level of back-tracking to work this out.
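
as a rough illustration of this kind of back-tracking (a minimal sketch, not any particular compiler's parser): the code below first tries to read a statement as a declaration of the form "Type name;", where the type may carry a <...> argument list, and on failure resets the token position and treats the statement as an expression. the names Parser, try_declaration and parse are made up for this example, and the expression side is deliberately left as a flat token list.

```python
import re

def tokenize(src):
    # Identifiers, integers, and a handful of punctuation tokens.
    return re.findall(r"[A-Za-z_]\w*|\d+|[<>,;=()+\-*/]", src)

class Parser:
    def __init__(self, tokens):
        self.toks = tokens
        self.pos = 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def accept(self, tok):
        if self.peek() == tok:
            self.pos += 1
            return True
        return False

    def ident(self):
        tok = self.peek()
        if tok is not None and (tok[0].isalpha() or tok[0] == "_"):
            self.pos += 1
            return tok
        return None

    def parse_type(self):
        # Type := Ident [ '<' Type { ',' Type } '>' ]
        name = self.ident()
        if name is None:
            return None
        args = []
        if self.accept("<"):
            while True:
                arg = self.parse_type()
                if arg is None:
                    return None
                args.append(arg)
                if not self.accept(","):
                    break
            if not self.accept(">"):
                return None
        return ("type", name, args)

    def try_declaration(self):
        start = self.pos                 # remember where we began
        ty = self.parse_type()
        name = self.ident() if ty else None
        if ty and name and self.accept(";"):
            return ("decl", ty, name)
        self.pos = start                 # back-track: not a declaration
        return None

    def parse_statement(self):
        decl = self.try_declaration()
        if decl is not None:
            return decl
        # Fallback: collect the tokens up to ';' as an unparsed expression.
        start = self.pos
        while self.peek() not in (";", None):
            self.pos += 1
        expr = self.toks[start:self.pos]
        if not self.accept(";"):
            raise SyntaxError("expected ';'")
        return ("expr", expr)

def parse(src):
    return Parser(tokenize(src)).parse_statement()
```

with this ordering, "List<Foo> xs;" comes back as a declaration and "a < b;" as an expression, but "a < b > c;" is also accepted as a declaration, which is exactly the sort of ambiguity under discussion.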

part of this issue is because, in statement context, one of 3 different
major elements may appear:
a declaration;
a plain statement;
an expression.

given each may appear and it may not be possible to know up-front which
is present, one will have to tread carefully WRT avoiding ambiguities
between them, as an otherwise innocent seeming piece of syntax may lead
to potential misparsing elsewhere in the language. allowing too many
potential cases of misparses may frustrate programmers with otherwise
valid seeming code stepping on syntactic edge cases and being parsed as
something unintended.

better then, IMO, is to try to treat declarations, statements, and
expressions effectively as a unified whole (basically, a giant
expression tower which also includes statements and declarations as part
of its lower-end, essentially as precedence levels below the comma
operator). as well, one can try to avoid introducing syntactic
ambiguities wherever possible.
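
a minimal sketch of what such a tower could look like as a recursive-descent parser (this is not my actual parser; the class name Tower, the tiny operator set, and the "two identifiers in a row means a declaration" test are all assumptions made up for the example). each level either recognizes its own form or hands off to the next-higher level, with declarations and keyword statements sitting below the comma operator:

```python
import re

KEYWORDS = {"return"}
PUNCT = {"=", "+", ",", ";", ")"}

def tokenize(src):
    return re.findall(r"[A-Za-z_]\w*|\d+|[=+,;()]", src)

def is_ident(tok):
    return (tok is not None and tok not in KEYWORDS
            and (tok[0].isalpha() or tok[0] == "_"))

class Tower:
    def __init__(self, src):
        self.toks = tokenize(src)
        self.pos = 0

    def peek(self, k=0):
        i = self.pos + k
        return self.toks[i] if i < len(self.toks) else None

    def next(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse(self):
        node = self.parse_decl()
        if self.next() != ";":
            raise SyntaxError("expected ';'")
        return node

    # Level 0 (lowest): declarations, assumed here to look like "Type name".
    def parse_decl(self):
        if is_ident(self.peek()) and is_ident(self.peek(1)):
            ty, name = self.next(), self.next()
            init = None
            if self.peek() == "=":
                self.next()
                init = self.parse_assign()
            return ("decl", ty, name, init)
        return self.parse_stmt()

    # Level 1: keyword statements.
    def parse_stmt(self):
        if self.peek() == "return":
            self.next()
            return ("return", self.parse_comma())
        return self.parse_comma()

    # Level 2: comma operator (left-associative).
    def parse_comma(self):
        left = self.parse_assign()
        while self.peek() == ",":
            self.next()
            left = (",", left, self.parse_assign())
        return left

    # Level 3: assignment (right-associative).
    def parse_assign(self):
        left = self.parse_add()
        if self.peek() == "=":
            self.next()
            return ("=", left, self.parse_assign())
        return left

    # Level 4: addition (left-associative).
    def parse_add(self):
        left = self.parse_primary()
        while self.peek() == "+":
            self.next()
            left = ("+", left, self.parse_primary())
        return left

    # Top of the tower: names, numbers, parenthesized expressions.
    def parse_primary(self):
        tok = self.next()
        if tok == "(":
            inner = self.parse_comma()
            if self.next() != ")":
                raise SyntaxError("expected ')'")
            return inner
        if tok is None or tok in PUNCT:
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok
```

the point is only the shape: the entry point is the lowest level, so a declaration, a statement, and a bare expression all go through the same chain of calls.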

doing this may also allow, in some cases, a much more compact syntax,
as extra typing which would otherwise be required to disambiguate the
syntax can be left out.

consider as a contrived example:
Foo foo(x)fun(x)new Foo(x);(x);

(never mind that its meaning may not be entirely obvious; code like
this may be written in my own language, which otherwise has a mostly
C-family style syntax.)

if the above were translated into a purer ActionScript style syntax, it
would look more like:
function foo(x):Foo { return(function(x){return(new Foo(x));}(x)); }

but, in my language, a few of these constructions can be left off (the
latter is valid syntax in my language as well, and is more-or-less
equivalent). all this is left as a matter of style.

side note: "foo(x)fun(x)new Foo(x);(x);", although only trivially
different (it drops the leading "Foo" return type), is not valid in my
language, as now the parser has no idea that it is looking at a
function declaration.

however, as a tradeoff, I ended up having to omit C/Java style casts,
and ended up using a slightly nasty-looking syntax for attributes, each
because they created ambiguities with other parts of the syntax.

"x=(int)y;" is not valid, and would need to be written as "x=y as! int;"
("as" and "as!" are both casts, but differ as to how they handle cast
failures).

similarly, "$[foo]" or "$[foo(bar)]" is the syntax for attributes,
mostly because initially I was using C#-style "[foo(bar)]" attributes,
but these clashed in an annoying way with the current array syntax, and
the originally planned disambiguation rules would have been a little
nasty. unambiguous parsing would depend on subsequent syntax for
disambiguation, and I prefer to have it possible to know within a few
tokens which syntactic form is present, rather than potentially parsing
a large chunk of code only to discover that the wrong path had been
followed.
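
to make the "within a few tokens" point concrete, here is a small sketch comparing the two situations. it is only an illustration: the function names are invented, and the rule used for the bracket-only case (a bracketed group followed by an identifier is taken to be an attribute) is a hypothetical stand-in, not the disambiguation rule I had originally planned. each function also reports how many tokens it had to look at.

```python
import re

def tokenize(src):
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", src)

def classify_prefixed(tokens):
    """With '$[' for attributes, two tokens are always enough."""
    if tokens[:2] == ["$", "["]:
        return "attribute", 2
    if tokens[:1] == ["["]:
        return "array", 1
    return "other", 1

def classify_bracket_only(tokens):
    """With plain '[...]' for both, scan to the matching ']' and then
    look at what follows (hypothetical rule, see the lead-in)."""
    if tokens[:1] != ["["]:
        return "other", 1
    depth = 0
    for i, tok in enumerate(tokens):
        if tok == "[":
            depth += 1
        elif tok == "]":
            depth -= 1
            if depth == 0:
                following = tokens[i + 1] if i + 1 < len(tokens) else None
                inspected = i + 2 if following is not None else i + 1
                looks_like_decl = following is not None and (
                    following[0].isalpha() or following[0] == "_")
                return ("attribute" if looks_like_decl else "array"), inspected
    raise SyntaxError("unbalanced '['")
```

for "$[foo(bar)] int x;" the prefixed form is decided after 2 tokens, whereas the bracket-only form of the same input, "[foo(bar)] int x;", needs 7 tokens here, and that count grows with whatever is inside the brackets.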

note that "@foo(bar)" probably would also have worked, but "$[...]" was
what I decided on.

as well as other "weird" syntax:
"[1,2,3]SB" for a 3-element signed byte array, mostly as I lacked any
good way to put it in prefix position ("#SB[1,2,3]" wouldn't have worked
for other reasons);
"[1,2,3]:sbyte" is equivalent to the above;
...

this is a major downside though:
the more features one tries to allow through a compact syntax, the more
hair tends to appear, and it risks leading to constructions that are
just plain nasty looking.

it is also made more difficult if one avoids depending on prior
declarations as context (frequently used for disambiguation in C and C++
syntax), which IMO has a number of drawbacks (creates dependency issues,
can slow down the parser, ...).

also preferably avoided are contextual semantic dependencies, where a
given expression may have very different semantics depending on the
context in which it is used. this can complicate the compiler and
potentially also confuse the user.

with a plainer syntax, say, plain JavaScript style, one will not have so
many of these issues, as pretty much everything in statement context is
either a plain expression, or uses a keyword to indicate what it is (the
'function' or 'var' keywords disambiguate these sorts of things). there
are merits to this route as well, as having most things indicated
explicitly via keywords makes the parser a good deal simpler.
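
as a sketch of why the keyword route is simpler: the first token alone picks the statement form, and anything that does not start with a known keyword is an expression. the keyword table and function names below are just an illustrative subset made up for this example, not a full JavaScript grammar.

```python
import re

# Illustrative subset of statement-introducing keywords.
STATEMENT_KEYWORDS = {
    "var": "variable declaration",
    "function": "function declaration",
    "if": "if statement",
    "while": "while statement",
    "return": "return statement",
}

def first_token(src):
    # An identifier-like word, or else a single non-space character.
    m = re.match(r"\s*([A-Za-z_$][\w$]*|\S)", src)
    return m.group(1) if m else None

def classify_statement(src):
    tok = first_token(src)
    if tok is None:
        return "empty"
    # One table lookup on one token; no back-tracking is needed.
    return STATEMENT_KEYWORDS.get(tok, "expression statement")
```

here "var x = 1;" and "function foo(x) { return x; }" are classified from their first token, and "foo(x);" falls through to an expression statement.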

ActionScript goes and adds a few things to the basic JavaScript style
syntax, notably the use of modifiers and explicit types, but most of
these are relatively straightforward (since the modifiers are themselves
keywords, and several other special cases are introduced mostly via the
introduction of additional keywords into certain contexts, ...).

or such...
