Mon, 20 Apr 1992 17:21:16 GMT

Related articles |
---|

Do I need to invent a new type of parser? jeffk@ecst.csuchico.edu (Jeffery Alan Keasler) (1992-04-20) |

Re: Do I need to invent a new type of parser? Jan.Rekers@cwi.nl (1992-04-21) |

Re: Do I need to invent a new type of parser? jeffk@ecst.csuchico.edu (1992-04-22) |

Re: Do I need to invent a new type of parser? stephen@estragon.uchicago.edu (1992-04-26) |

Re: Do I need to invent a new type of parser? hagerman@ece.cmu.edu (1992-04-28) |

Re: Do I need to invent a new type of parser? stephen@estragon.uchicago.edu (1992-04-29) |

Re: Do I need to invent a new type of parser? anton@mips.complang.tuwien.ac.at (1992-05-04) |

[2 later articles] |

Newsgroups: | comp.compilers |

From: | Jeffery Alan Keasler <jeffk@ecst.csuchico.edu> |

Keywords: | parse, question |

Organization: | Compilers Central |

Date: | Mon, 20 Apr 1992 17:21:16 GMT |

I am currently designing a new language but I can't proceed with my

work until I improve my parser. Below, I discuss a few of my problems.

If anyone has seen some relevant literature, email would be most

appreciated.

The problem...

For simplicity sake, let's say that we have a language with only one

class/type/mode, namely 'integer'. Also, we have only one operator,

namely the '+' operator. Let us assume that the '+' operator is

overloaded for prefix, infix, and postfix operations.

I would like to be able to parse

a + b c +

without any semicolons or grouping symbols. I would like the parser to

see this expression as one of the following, depending on the

circumstances:

E1 ( ( a + b ) c + ) - if infix is of highest precedence

or all operators are of equal precedence

E2 ( a ( + b c ) + ) - if prefix is of highest precedence

E3 ( a + ( b c + ) ) - if postfix is of highest precedence

I believe that the parser would have the most difficulty parsing E3

because, initially, the parser believes that it's found the E1 grouping.

Only after it scans further does the parser realize that it must dismantle

E1 and build E3.

Next example; if we have three productions as follows (note the addition

of the tokens '(' and ')' to our language):

( A + B + C ) - the tokens '(', ')', and '+' are terminals.

A, B, and C are nonterminals.

D + E

( F )

and "( A + B + C )" is of higher or equal precedence to "D + E" then one

of several things can occur when parsing the phrase "( g + h + i + j )"

(assuming we shift on shift/reduce conflicts):

E4 ( g + h + ( i + j ) ) - If "( A + B + C )" is strictly right associative

E5 ( g + ( h + i ) + j ) - If "( A + B + C )" is middle associative

E6 ( ( g + h ) + i + j ) - If "( A + B + C )" is strictly left associative

If the production "D + E" is of higher precedence, the result would be:

E7 ( ( ( g + h ) + i ) + j ) - production "( F )" and "D + E"

In general, my problem can be stated as follows; input to the parser

comes as a stream of characters. This stream can be grouped into tokens.

These tokens can be grouped into productions which have the form:

e1 t1 e2 t2 ...

where e represents a group of nonterminals and t represents a group of

terminals. An associativity ranking is assigned to each expression group.

These productions can be overloaded. Overloading occurs when the token

groups remain identical in two or more productions, while the expression

groups change in some unique way. i.e. varying the length of expression

groups, changing the class/type/mode of expressions, etc.

I am especially interested in methods of handling operators which have

greater than dyadic associativity. I would also like to know about any

good papers available concerning efficient methods of handling arbitrarily

long lookahead under heavy overloading of operators.

[I'd think that you ought to be able to tweak old-fashioned operator

precedence to handle this. But it does look like a language design approach

that will rival Intercal in readability and ease of use. -John]

--

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.