From: Quinn Tyler Jackson <quinn-j@shaw.ca>
Newsgroups: comp.compilers
Date: 21 Jun 2004 23:39:44 -0400
Organization: Compilers Central
References: 04-06-071
Keywords: Java, parse
Posted-Date: 21 Jun 2004 23:39:44 EDT

Chris F Clark said:
> One of the few cases you need whitespace at parsing time is in C
> preprocessing (if you implement it in the parsing grammar), where in a
> #define, whitespace present or absent between the identifier being
> defined and a following parenthesis determines whether the identifier
> is a parameterized macro (and the parenthesis begins an argument list)
> or not (and the parenthesis is part of the expansion). However, even
> in this case, the problem can be solved lexically by returning two
> different token sequences for "id(" and "id (".
>
> Note, it was this specific whitespace problem that prompted the
> "ignore" extension in Yacc++, which specifically allows one to omit
> whitespace from all parts of the grammar where it isn't important for
> the parsing (and not just the lexing) phase, but to include it where
> it was important. The same problem prompted Quinn Tyler Jackson to a
> different solution in meta-S.
Ah, whitespace.
Yes, Meta-S grammars ($-grammars) do indeed take a different approach to
whitespace.
Whitespace rules, like any other rule in a $-grammar, can change in
mid-parse. The parsing engine behind Meta-S doesn't really have a notion of
static "terminals" and "non-terminals" in the traditional sense, and
whitespace is just another production in the grammar, albeit one with a
reserved name (namely: __ws). Whitespace nodes can be dropped from the tree
during a parse, the same way any production's nodes can be, through the use
of the #notree directive. But #notree is not quite the same as ignore:
ignore'd tokens typically never get beyond a traditional parser's lexical
analysis phase, whereas #notree productions are still fully productions,
just productions that don't adorn the parse tree with artifacts.
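The difference can be sketched in a few lines of Python. This is a toy only, not how the Meta-S engine works: the rule table, the `Node` class, and the `match` and `parse_id_list` functions are all hypothetical names invented for illustration. The point it tries to show is that a notree rule still has to match and consume input like any other rule; it just leaves no node behind.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    text: str
    children: list = field(default_factory=list)

# Hypothetical toy rule table: rule name -> (regex, notree flag).
RULES = {
    "id": (re.compile(r"[a-z]+"), False),
    "__ws": (re.compile(r"\s+"), True),  # matched like any rule, but marked notree
}

def match(rule, text, pos, parent):
    """Try rule at pos. Return the new position on success, else None.
    A successful rule always consumes input; it adds a node only if not notree."""
    regex, notree = RULES[rule]
    m = regex.match(text, pos)
    if m is None:
        return None
    if not notree:
        parent.children.append(Node(rule, m.group()))
    return m.end()

def parse_id_list(text):
    """Parse the toy rule: id_list ::= id (__ws id)*"""
    root = Node("id_list", text)
    pos = match("id", text, 0, root)
    while pos is not None and pos < len(text):
        ws_end = match("__ws", text, pos, root)
        if ws_end is None:
            return None  # whitespace is required here, even though it leaves no node
        pos = match("id", text, ws_end, root)
    return root if pos == len(text) else None

tree = parse_id_list("a  b c")
print([child.text for child in tree.children])
```

In this sketch the whitespace between identifiers takes part in the parse (the input "a1" is rejected because __ws fails to match), yet the resulting tree holds only the three id nodes.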
Since the inclusion of too many __ws and [__ws] statements within a grammar
can be quite unsightly, I introduced the abbreviations ## and #? for those,
resulting in rules such as:
foo ::= id #? "(" #? expr #? ")"
In a similar vein, there is another notion that turned out to be useful ...
that of the "keyword terminator":
__kw ::= '[a-zA-Z0-9_]';
return_statement ::= "return" #@ expr #? ";";
In the above, #@ expands to ^__kw ("not __kw"). This allows for:
return(10);
return 10;
return a;
but not for:
return10;
returna;
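As a rough illustration only (this is Python, not Meta-S), the effect of #@ can be approximated with a regular-expression negative lookahead. The names `KW`, `RETURN_STMT`, and `accepts` are hypothetical, the `[^;]+` placeholder for expr is a shortcut rather than a real expression grammar, and the sketch assumes that #@ behaves as a pure lookahead followed by optional whitespace.

```python
import re

# Hypothetical stand-in for __kw: characters that would continue a keyword.
KW = r"[a-zA-Z0-9_]"

# "return" #@ expr #? ";" approximated as a regular expression:
#   (?!KW)  plays the role of #@ (the next character must not be a __kw character)
#   \s*     plays the role of optional whitespace
#   [^;]+   is a crude placeholder for expr
RETURN_STMT = re.compile(rf"return(?!{KW})\s*[^;]+\s*;")

def accepts(text: str) -> bool:
    """Return True if the whole string matches the sketched return_statement."""
    return RETURN_STMT.fullmatch(text) is not None

for sample in ["return(10);", "return 10;", "return a;", "return10;", "returna;"]:
    print(sample, accepts(sample))
```

Under those assumptions the first three samples are accepted and the last two are rejected, because the character right after "return" is a keyword character.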
Several people have suggested that I could probably have deduced the ##, #?,
and #@ operators and inserted them quietly during generation, but this is not
always possible, so I left the requirement for explicit whitespace operators
in A-BNF.
To see why whitespace operators cannot always be deduced, remember that
productions can change in a $-grammar during a parse:
expr ::= /* some dynamic rule that might be either '[a-z]+' or '(' expr ')'
or arith_expr */;
ret_expr ::= return expr ";"
In the above example, at any given point in the parse, depending on the
current definition of expr, ret_expr would need to be one of the following:
return #? expr; // if expr = '(' expr ')';
return ## expr; // if expr = '[a-z]+';
return #@ #? expr; // if expr = arith_expr;
Rather than focus on ways to deduce these kinds of things at run-time, I
decided it was safer to require explicit whitespace tokens in all cases.
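A small Python sketch of the problem, under heavy simplification: it covers only two of the possible definitions of expr and only the ## and #? operators, and it uses regular expressions in place of real productions. `EXPR_DEFS`, `OPERATORS`, and `accepts` are hypothetical names. It is meant to show that neither operator, fixed once at generation time, is right for both definitions.

```python
import re

# Two of the possible definitions of expr from the example, heavily simplified.
EXPR_DEFS = {
    "word": r"[a-z]+",       # expr = '[a-z]+'
    "paren": r"\([a-z]+\)",  # expr = '(' expr ')', flattened to one level
}

# Regex approximations of two whitespace operators.
OPERATORS = {
    "##": r"\s+",  # mandatory whitespace (__ws)
    "#?": r"\s*",  # optional whitespace ([__ws])
}

def accepts(op: str, expr_def: str, text: str) -> bool:
    """Match text against: return <op> expr #? ";" for the chosen expr definition."""
    pattern = rf"return{OPERATORS[op]}{EXPR_DEFS[expr_def]}\s*;"
    return re.fullmatch(pattern, text) is not None

# "#?" suits the parenthesized form but is too permissive for words.
print(accepts("#?", "paren", "return(a);"))  # True, as wanted
print(accepts("#?", "word", "returna;"))     # True, but should be rejected

# "##" suits words but is too strict for the parenthesized form.
print(accepts("##", "word", "return a;"))    # True, as wanted
print(accepts("##", "paren", "return(a);"))  # False, but should be accepted
```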
--
Quinn