Re: Generalized operator-precedence parsing

dmitry@elros.cbb-automation.de (Dmitry Kazakov)
17 Jul 2001 23:22:58 -0400

          From comp.compilers

Related articles
Generalized operator-precedence parsing joachim_d@gmx.de (Joachim Durchholz) (2001-06-28)
Generalized operator-precedence parsing joachim_d@gmx.de (Joachim Durchholz) (2001-07-06)
Re: Generalized operator-precedence parsing gtoomey@usa.net (Gregory Toomey) (2001-07-17)
Re: Generalized operator-precedence parsing David.Chase@naturalbridge.com (David Chase) (2001-07-17)
Re: Generalized operator-precedence parsing dmitry@elros.cbb-automation.de (2001-07-17)
Re: Generalized operator-precedence parsing peter@abbnm.com (2001-07-17)
Re: Generalized operator-precedence parsing slimick@jcs.upb.pitt.edu (2001-07-18)
Re: Generalized operator-precedence parsing mickunas@cs.uiuc.edu (Dennis Mickunas) (2001-07-23)
| List of all articles for this month |

From: dmitry@elros.cbb-automation.de (Dmitry Kazakov)
Newsgroups: comp.compilers
Date: 17 Jul 2001 23:22:58 -0400
Organization: Compilers Central
References: 01-06-064
Keywords: parse
Posted-Date: 17 Jul 2001 23:22:58 EDT

On 28 Jun 2001 23:41:31 -0400, "Joachim Durchholz" <joachim_d@gmx.de>
wrote:


>I've been thinking about a generalized approach to operator-precedence
>parsing.
>
>The basic idea is that an operator consists of a sequence of operator
>tokens and operand places. Indicating operand places by dots, an
>operator name might look like this:
>
>Standard infix operator:
> . + .
>Prefix operators:
> - .
> sin .
>The most basic "circumfix" operator:
> ( . )
>A somewhat more complicated circumfix:
> quadratic solution ( . , . , . )
>Postfix operators:
> . !
> . ^
> . *
>Smalltalk-style operators (e.g. substring search):
> find . in .


A generalized expression may look like:


<prefix> <operand> <postfix> <infix> <prefix> <operand> ...


where <prefix> is a list of any prefix operators and left braces,
<infix> is either an infix operation or a comma, <postfix> is a list
of any postfix operators and right braces. A generalized expression
can be parsed using the standard twin stack algorithm.


The conventional concept of operator precedence tells that each
operator has a priority controlling the operator's association with
its operands. It can be extended by splitting the single operator's
priority into two parts - the left and the right priorities. The left
priority controls the operator association with the left operand. The
right priority does it for the right operand. In the expression a*b+c,
the result will be (a*b)+c if the left priority of the + is not
greater than the right priority of the *. The advantages of the
extended concept:


(o) It is very easy to have left or right associated operators. For
example, if + has the left priority lower than the right one,
a+b+c will be associated as a+(b+c).


(o) Assignment operator. In C the assignment operator has a lower
priority than +. Therefore, a=b+c means a=(b+c). But a+b=c+d has a
meaningless interpretation as (a+b)=(c+d). Assigning to =
two priorities: very high left and very low right we could achieve a
more natural result: a+(b=(c+d)).


(o) Unary operator association can be more flexible. For example,
-++a-- can be interpreted as -(++(a)--) or even as ++(-(a--)).


There are three general classes of the operator context:


(1) Prefix context. A prefix operator (unary plus) or a left brace may
appear in this context. The parser remains in this context until there
are prefix things recognized. Then it tries to recognize an operand
(either a literal or an identifier). Then it switches to 3


(2) Infix context. A binary operator (+, * and so on), a comma or a
left brace of an indexing operator may be here. The parser recognizes
an infix thing and switches to 1.


(3) Postfix context. A postfix operator (++ in the C) or a right
brace is recognized in this context. The parser remains in this
context until there is a postfix thing. Then it switches to 2.


There are the following kinds of generalized operators:


(1) Prefix operators are recognizedr in the prefix context. They are
unary plus and minus, preincrement and predecrement (in C). Usually,
the left priority of a prefix operator is low. The right one is high.
Therefore, a-++b-c means a-(++b)-c. Although sometimes the right
priority is lower than the left priority of an infix operator. For
example, in -a**b, that should be treated as -(a**b).


(2) Infix operators are most of operators, like plus, minus and
others. Normally, the left and right priorities of an infix operator
are near to each other (for operator association see above). Some
operators may have unbalanced priorities, like the assignment
operator. There could be two kinds of infix operators:


(o) Normal infix operators.
(o) Meta-infix operators that cannot appear outside of braces. In
other things they does not differ from particular operators. An
example of a meta-infix operator is => (keyword parameter association)
in Ada.


(3) Postfix operators are recognized in the postfix context. They are
postincrement and postdecrement operators in C. Usually, the left
priority is high, but lower than the right priorities of prefix
operators. The right operator's priority should be slightly lower than
the left one, but higher than the right priorities of the infix
operators. Under these conditions a-++b--+c will mean
a-(((--(++b))++)--)+c.


(4) Left braces are recognized in prefix and infix contexts.They are
subdivided into two classes:


(o) Simple left braces appearing in the prefix context. They have no
priorities because their order cannot be violated. They are left
grouping and left aggregate braces. For example, ( in (a+b)*c and { in
{1,2,3}.
(o) Index braces appear in the infix context. They are left braces of
function calls and array index. For example, [ in a+b[2]. Here the
index has two operands b and 2. Their left priority is usually high.
Otherwise, a+b[2] were interpreted as (a+b)[2]. The right priority is
absent and assumed to be infinite.


(5) Commas have no priorities because their order cannot be violated.
They appear in the infix context and are subdivided into four classes:


(o) Simple commas like `,' in foo (a,2,b).
(o) Comma-brace separates a parameter from a sublist of parameters
and looks like a comma followed by a left brace. For instance, in
Ada's extension aggregates, the keyword `with' is a comma-brace. So
the extension aggregate (a with 1,2,3) means (a,(1,2,3)). Where "a" is
a value of the base type of the extended type and (1,2,3) is the
aggregate of the base type extension. A comma-brace can only appear
within a parameter list.
(o) Brace-comma closes a parameter sublist opened by a comma-brace and
separates the next parameter of the parent parameter list.
(o) Brace-comma-brace closes a parameter sublist opened by a
comma-brace and opens a new sublist.


(6) Right braces appear in the postfix context. They close a parameter
list opened by a left braces as well as all its sublists.


Instead of priorities braces and commas should have the group and the
type. They are used to separate different classes of braces like ()
and [] which should have different types. They may have the same group
to share the same comma.


Regards,
Dmitry Kazakov


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.