Re: Generalized operator-precedence parsing

David Chase <David.Chase@naturalbridge.com>
17 Jul 2001 23:21:54 -0400

          From comp.compilers

Related articles
Generalized operator-precedence parsing joachim_d@gmx.de (Joachim Durchholz) (2001-06-28)
Generalized operator-precedence parsing joachim_d@gmx.de (Joachim Durchholz) (2001-07-06)
Re: Generalized operator-precedence parsing gtoomey@usa.net (Gregory Toomey) (2001-07-17)
Re: Generalized operator-precedence parsing David.Chase@naturalbridge.com (David Chase) (2001-07-17)
Re: Generalized operator-precedence parsing dmitry@elros.cbb-automation.de (2001-07-17)
Re: Generalized operator-precedence parsing peter@abbnm.com (2001-07-17)
Re: Generalized operator-precedence parsing slimick@jcs.upb.pitt.edu (2001-07-18)
Re: Generalized operator-precedence parsing mickunas@cs.uiuc.edu (Dennis Mickunas) (2001-07-23)
| List of all articles for this month |

From: David Chase <David.Chase@naturalbridge.com>
Newsgroups: comp.compilers
Date: 17 Jul 2001 23:21:54 -0400
Organization: Compilers Central
References: 01-07-053
Keywords: parse
Posted-Date: 17 Jul 2001 23:21:54 EDT



> Does anybody know about generalizations of operator-precedence parsing?
> Any information (preferrably available on the Internet) would be
> appreciated.


I saw your first post, meant to reply, but never quite got there. I'm
not 100% sure what you mean by "generalization". I thought your
syntax was a nice way to talk about prefix, postfix, infix, and
"surround-fix" expressions. I wrote an operator-precedence parser
(for FP84, in BCPL) that allowed the on-the-fly injection of new
operators, but your syntax for adding new operators is nicer.


I did find the (15 year old) source code listing for an OPP for
Fortran expressions. Sort of an interesting historical artifact,
though it isn't the finished product (in particular, it still has the
"all undeclared identifiers are undimensioned integers" stub in it).


Anyhow, I think your guesses about what to do with "surround-fix" are
roughly correct. The precedence of a closing parentheses is really
only interesting when there is an opening parentheses to match with
it, and at that point, it is as low as necessary to drive the enclosed
expressions to their evaluated form. There's obvious other
constraints that apply -- obviously, "[ ... )" (right here is where I
hit control-E to move to the end-of-line in Emacs, only I was
composing mail in Eudora. If I could disable that key, I would.) is
ill-formed, and should be rejected.


The generalization, something along the lines of


    IF exp THEN exp ELSE exp


I think has the the THEN first playing the role of ")", but afterwards
the IF ... THEN acts as an opening parentheses to the trailing ELSE,
followed finally by the expression. If you let users define such
syntax, you have to consider whether you let them also say "IF exp
THEN exp ..." (you can imagine this in a language with "no value" as
an outcome) and then ensure that


    IF exp THEN IF exp THEN exp ELSE exp


parses "properly", which (as I recall) is


    (IF exp THEN (IF exp THEN exp ELSE exp ) )


but I think this would work that way if you simply followed the
treatment of "IF ... THEN" as an opening parentheses for a trailing
ELSE -- it would bind to the closer one.


(Yes, you could have a real party with an operator precedence parser.)


I think you also have to be a little careful. I am not a parsing
expert, but what I have heard is that OPPs can accept larger-than-
intended languages. I don't recall the example, and (given that
parsing atrocities that we've seen since I didn't learn enough about
parsing) it's not clear that anyone cares. I do have one opinion that
I think is worth listening to, which is that C (and hence, C++, and
Java) have too many prefix operators. In particular, casting should
not be a prefix operator, it should be some sort of a suffix operator,
so that you can read an expression from left to right and not worry
about how the parentheses group or what the relative precedence of
casting and other things is.


I worry a bit, also, that you might find yourself wanting to (ahem)
associate productions and what-not with your operator specifications,
so that you might say something like


. + . = plus(.,.)


so then you'll want to start identifying those non-terminals


<expr_1> + <expr_2> := plus(expr_1,expr_2)


and then you'll probably want to start attaching other information to
them, like whether they are expressions or statements, along the lines
of


WHILE <expr_1> DO <statement_1> OD := choose_your_semantic_poison


so that you can forbid (assuming you think this is a good
idea, which I don't always) things like


    x := 1 + DO expression UNTIL done OD


(It would make more sense to forbid this if it is a zero-trip loop).


David Chase


--
David.Chase@NaturalBridge.com


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.