|Do I need to invent a new type of parser? firstname.lastname@example.org (Jeffery Alan Keasler) (1992-04-20)|
|Re: Do I need to invent a new type of parser? Jan.Rekers@cwi.nl (1992-04-21)|
|Re: Do I need to invent a new type of parser? email@example.com (1992-04-22)|
|Re: Do I need to invent a new type of parser? firstname.lastname@example.org (1992-04-26)|
|Re: Do I need to invent a new type of parser? email@example.com (1992-04-28)|
|Re: Do I need to invent a new type of parser? firstname.lastname@example.org (1992-04-29)|
|Re: Do I need to invent a new type of parser? email@example.com (1992-05-04)|
|[2 later articles]|
|From:||Jeffery Alan Keasler <firstname.lastname@example.org>|
|Date:||Mon, 20 Apr 1992 17:21:16 GMT|
I am currently designing a new language but I can't proceed with my
work until I improve my parser. Below, I discuss a few of my problems.
If anyone has seen some relevant literature, email would be most
For simplicity sake, let's say that we have a language with only one
class/type/mode, namely 'integer'. Also, we have only one operator,
namely the '+' operator. Let us assume that the '+' operator is
overloaded for prefix, infix, and postfix operations.
I would like to be able to parse
a + b c +
without any semicolons or grouping symbols. I would like the parser to
see this expression as one of the following, depending on the
E1 ( ( a + b ) c + ) - if infix is of highest precedence
or all operators are of equal precedence
E2 ( a ( + b c ) + ) - if prefix is of highest precedence
E3 ( a + ( b c + ) ) - if postfix is of highest precedence
I believe that the parser would have the most difficulty parsing E3
because, initially, the parser believes that it's found the E1 grouping.
Only after it scans further does the parser realize that it must dismantle
E1 and build E3.
Next example; if we have three productions as follows (note the addition
of the tokens '(' and ')' to our language):
( A + B + C ) - the tokens '(', ')', and '+' are terminals.
A, B, and C are nonterminals.
D + E
( F )
and "( A + B + C )" is of higher or equal precedence to "D + E" then one
of several things can occur when parsing the phrase "( g + h + i + j )"
(assuming we shift on shift/reduce conflicts):
E4 ( g + h + ( i + j ) ) - If "( A + B + C )" is strictly right associative
E5 ( g + ( h + i ) + j ) - If "( A + B + C )" is middle associative
E6 ( ( g + h ) + i + j ) - If "( A + B + C )" is strictly left associative
If the production "D + E" is of higher precedence, the result would be:
E7 ( ( ( g + h ) + i ) + j ) - production "( F )" and "D + E"
In general, my problem can be stated as follows; input to the parser
comes as a stream of characters. This stream can be grouped into tokens.
These tokens can be grouped into productions which have the form:
e1 t1 e2 t2 ...
where e represents a group of nonterminals and t represents a group of
terminals. An associativity ranking is assigned to each expression group.
These productions can be overloaded. Overloading occurs when the token
groups remain identical in two or more productions, while the expression
groups change in some unique way. i.e. varying the length of expression
groups, changing the class/type/mode of expressions, etc.
I am especially interested in methods of handling operators which have
greater than dyadic associativity. I would also like to know about any
good papers available concerning efficient methods of handling arbitrarily
long lookahead under heavy overloading of operators.
[I'd think that you ought to be able to tweak old-fashioned operator
precedence to handle this. But it does look like a language design approach
that will rival Intercal in readability and ease of use. -John]
Return to the
Search the comp.compilers archives again.