RE: Java Comment-Preserving Grammar

Quinn Tyler Jackson <quinn-j@shaw.ca>
30 May 2004 13:16:52 -0400

          From comp.compilers

From: Quinn Tyler Jackson <quinn-j@shaw.ca>
Newsgroups: comp.compilers
Date: 30 May 2004 13:16:52 -0400
Organization: Compilers Central
References: 04-05-075
Keywords: Java, parse
Posted-Date: 30 May 2004 13:16:51 EDT

Matthew Herrmann said:


> Is there an available Java grammar which preserves comments in an
> output AST tree? I know there are plenty of java grammars out there,
> but all ignore comments. I'm happy to use any tool providing I can
> gain access to the comments in the tree.
>
> My end goal is to parse commented-out extensions to the java language:
>
> class /*#immutable*/ Blah {
>
> public /*#input*/ int x;
> public /*#result*/ int y;
>
> }


From a response I received to a similar question years back (when I
was considering a C++ pretty-printer grammar), I gathered that
retaining comments in a parse with standard tools is done through ad
hockery in the lexer that attaches comments to other tokens. Most
agreed that retaining whitespace is almost always ugly.
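
To make the "attach comments to other tokens" idea concrete, here is a
rough sketch in Java of what such a lexer might look like. It is not
taken from any particular tool: the class name CommentAttachingLexer,
the Token type, and the choice to hang each comment on the *next*
token are all assumptions made for illustration. It also ignores
string and character literals, so a "/*" inside a literal would be
misread, and numbers come out one character at a time.

```java
import java.util.ArrayList;
import java.util.List;

public class CommentAttachingLexer {

    /** A token plus the comments that appeared immediately before it. */
    public static final class Token {
        public final String text;
        public final List<String> leadingComments;

        Token(String text, List<String> leadingComments) {
            this.text = text;
            this.leadingComments = leadingComments;
        }
    }

    /** Splits src into tokens, attaching each comment to the token that follows it. */
    public static List<Token> tokenize(String src) {
        List<Token> tokens = new ArrayList<>();
        List<String> pending = new ArrayList<>(); // comments seen since the last token
        int i = 0;
        int n = src.length();
        while (i < n) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace is dropped; only comments are retained
            } else if (src.startsWith("/*", i)) {
                int end = src.indexOf("*/", i + 2);
                int stop = (end < 0) ? n : end + 2; // unterminated comment runs to EOF
                pending.add(src.substring(i, stop));
                i = stop;
            } else if (src.startsWith("//", i)) {
                int end = src.indexOf('\n', i);
                int stop = (end < 0) ? n : end;
                pending.add(src.substring(i, stop));
                i = stop;
            } else {
                int start = i;
                if (Character.isJavaIdentifierStart(c)) {
                    while (i < n && Character.isJavaIdentifierPart(src.charAt(i))) {
                        i++;
                    }
                } else {
                    i++; // anything else is treated as a one-character token
                }
                tokens.add(new Token(src.substring(start, i), pending));
                pending = new ArrayList<>();
            }
        }
        if (!pending.isEmpty()) {
            // Comments with no following token hang on an empty end-of-input token.
            tokens.add(new Token("", pending));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String src = "class /*#immutable*/ Blah { public /*#input*/ int x; }";
        for (Token t : tokenize(src)) {
            System.out.println(t.text + " " + t.leadingComments);
        }
    }
}
```

With the quoted example, the /*#immutable*/ comment ends up on the
Blah token and /*#input*/ on the int token, so a parser built on top
of this can read the annotations off the tokens without the grammar
itself mentioning comments. Whether a comment should attach to the
following or the preceding token is exactly the sort of ad hoc
decision mentioned above.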


That said, because $-grammars are adaptive at run-time, whitespace
must be explicitly denoted within the grammar using the ## (required
whitespace) and #? (optional whitespace) operators. These operators, in
turn, map to a production __ws. Typically, the first thing in the __ws
rule is #notree, which effectively strips whitespace from the tree and
speeds up parsing a bit -- but there's nothing about __ws that
requires the #notree directive to be used, and thus, parse trees by
default include nodes for whitespace.


There are a few problems, however:


1. I've written a C++, C#, and even a partial Perl (ugh) grammar in
A-BNF (note the hyphen -- not ABNF), but have not yet gotten around to
writing a Java grammar. It wouldn't be too difficult -- I just
haven't found enough interesting language features in Java to warrant
YAJG.


2. There just hasn't been enough interest in Type N (for N < 2)
parsing for me to pursue trying to convince people to move over to the
LPM parsing engine. ;-) I have been continually improving the thing,
and it's still being worked on, but for the most part, it's been
shelved to outsiders. I am preparing a paper on "Efficient
Context-Sensitive Parsing" for refereed review, but that won't help
you with your case. Most likely, the paper will discuss strategies for
parsing whitespace cleanly.


3. A Type-0 capable parser may be too powerful for your needs. You
could avoid Type 0 power by not using both predicates and tries -- but
the temptation is always there to use both in the same grammar, and
the moment both occur in the same grammar, the Halting Problem (HP)
might crop up in some unexpected ways.


If you introduce some production called WS into your grammar, and
don't "ignore" whitespace in your lexical definition, you have to be
careful that the grammar doesn't become littered with the things. Because
$-grammars require the beasts, I've become pretty good at knowing
where and where not to put them (due to much practice), but even so,
many people have complained that they're tricky to use at first
because other formats don't require them.


Good luck.


--
Quinn

