Generalized operator-precedence parsing

"Joachim Durchholz" <joachim_d@gmx.de>
28 Jun 2001 23:41:31 -0400

          From comp.compilers

Related articles
Generalized operator-precedence parsing joachim_d@gmx.de (Joachim Durchholz) (2001-06-28)
Generalized operator-precedence parsing joachim_d@gmx.de (Joachim Durchholz) (2001-07-06)
Re: Generalized operator-precedence parsing gtoomey@usa.net (Gregory Toomey) (2001-07-17)
Re: Generalized operator-precedence parsing David.Chase@naturalbridge.com (David Chase) (2001-07-17)
Re: Generalized operator-precedence parsing dmitry@elros.cbb-automation.de (2001-07-17)
Re: Generalized operator-precedence parsing peter@abbnm.com (2001-07-17)
Re: Generalized operator-precedence parsing slimick@jcs.upb.pitt.edu (2001-07-18)
[1 later articles]
| List of all articles for this month |

From: "Joachim Durchholz" <joachim_d@gmx.de>
Newsgroups: comp.compilers
Date: 28 Jun 2001 23:41:31 -0400
Organization: Compilers Central
Keywords: parse, question
Posted-Date: 28 Jun 2001 23:41:31 EDT

Hi all,


I've been thinking about a generalized approach to operator-precedence
parsing.


The basic idea is that an operator consists of a sequence of operator
tokens and operand places. Indicating operand places by dots, an
operator name might look like this:


Standard infix operator:
    . + .
Prefix operators:
    - .
    sin .
The most basic "circumfix" operator:
    ( . )
A somewhat more complicated circumfix:
    quadratic solution ( . , . , . )
Postfix operators:
    . !
    . ^
    . *
Smalltalk-style operators (e.g. substring search):
    find . in .


I'd like to see any pointers to literature (preferrably online) with
algorithms, experience reports, limits of the approach.


I already know some restrictions. For example, the grammar will become
ambiguous if a prefix and a postfix operator live at the same
precedence level, or if operator tokens are allowed to live in
operators of more than one precedence level. There's also the "core"
of an operator (everything except any initial or trailing operand
place). The operator core is its "circumfix" part, it's prefix if it
has a trailing operand, postfix with an initial operand, and infix if
it has an initial and a trailing operand.


Other things aren't so clear at all.
I'm pretty sure that a prefix and a right-associative infix operator on
the same precedence level generate an ambiguity, but I'm not sure why
(and I tend to confuse the association of fixity and associativeness).
I don't know what precedence to associate with the operand places within
the core of the operator, or how much sense precedence makes at all in
this context. The "inside" of a parenthese has lowest precedence, but is
it a good idea to make this a general rule?


Nesting Smalltalk-style operators will require lots of parentheses. Is
this avoidable? Or, more precisely: is there a way to reduce the number
of required parenthese without encouraging unreadable code? (I suspect
there isn't, but I may be wrong.)


Regards,
Joachim


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.