Re: parser-generators for RRPGs

parrt@ecn.purdue.edu (Terence J Parr)
Fri, 29 Jan 1993 17:55:40 GMT

          From comp.compilers

Related articles
parser-generators for RRPGs dyck@cs.sfu.ca (Michael Dyck) (1993-01-28)
Re: parser-generators for RRPGs parrt@ecn.purdue.edu (1993-01-29)
| List of all articles for this month |

Newsgroups: comp.compilers
From: parrt@ecn.purdue.edu (Terence J Parr)
Keywords: parse, tools
Organization: Compilers Central
References: 93-01-206
Date: Fri, 29 Jan 1993 17:55:40 GMT

Michael Dyck <dyck@cs.sfu.ca> writes:


> if-statement = "IF" condition "THEN" statements
> { "ELSIF" condition "THEN" statements }
> [ "ELSE" statements ]
> "END"
>
> It seems like most grammars presented for human consumption are RRPGs, but
> most grammars presented for machine consumption are CFGs. What
> parser-generators are available that accept some form of RRPG?


ANTLR, the parser generator in PCCTS (Purdue Compiler Construction Tool
Set), supports the following EBNF constructs:


Make a subrule:
( alternative 1 | alternative 2 | ... | alternative n )


Make an optional subrule:
{ alternative 1 | alternative 2 | ... | alternative n }


Make a zero-or-more looping subrule:
( alternative 1 | alternative 2 | ... | alternative n )*


Make a one-or-more looping subrule:
( alternative 1 | alternative 2 | ... | alternative n )+


Subrules are treated like ordinary grammar symbols and hence can be nested
arbitrarily.


In a related subject, the next release, 1.07, will allow arbitrary regular
expressions of tokens to be used to predict many productions which require
arbitrary lookahead. e.g. (...)+ loops as prefixes tend to cause
problems:


a : (A)* X
    | (A|B)* Y
    ;


This is a major problem for LL-based parsers (assuming one cannot left
factor). However, using the semantic predicate mechanism also supported
by PCCTS, one can do the following:


a : << predict(A* X) >>? (A)* X
    | << predict([AB]* Y) >>? (A|B)* Y
    ;


PCCTS constructs a DFA for each parsing decision which has these predict
"functions". Then, at runtime, the DFA spins until it matches (or doesn't
match) one of the predicting regular expressions. Upon a successful
match, it knows which production to predict. NOTE that it is a predicting
regular expression of TOKENS not characters. There's lots of theory and
stuff behind the power and usefulness of this technique which I'll leave
out. This feature works in my 1.07 alpha version (in house).


> [I've seen occasional references to something called eyacc, which appeared
> to be an EBNF version of yacc. Is it not the case that RRPGs can be
> mechanically translated to CFGs, in which case a preprocessor for a normal
> parser generator might make sense? -John]


Actually, for recursive-descent parser generators like ANTLR, the (...)*
looping constructs etc... are best left in that form rather than
converting to BNF (although this is easy to do and I built one with
PCCTS). For example,


a : WORD ( COMMA WORD )*
    ;


would be mapped to something like:


a()
{
        MATCH(WORD);
while ( LA(1)==COMMA ) {
MATCH(COMMA);
MATCH(WORD);
}
}


where LA(1) is the next token of lookahead. This is much more efficient
than performing tail (LL) or left recursion (LR).


Terence
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.