2 Jul 2003 00:38:49 -0400

Related articles |
---|

FIRST function computation problem. mefrill@yandex.ru (2003-06-22) |

Re: FIRST function computation problem. derkgwen@HotPOP.com (Derk Gwen) (2003-07-02) |

Re: FIRST function computation problem. torbenm@diku.dk (2003-07-02) |

Re: FIRST function computation problem. haberg@matematik.su.se (2003-07-02) |

Re: FIRST function computation problem. soenke.kannapinn@wincor-nixdorf.com (=?Windows-1252?Q?S=F6nke_Kannapinn?=) (2003-07-02) |

From: | torbenm@diku.dk (Torben Ęgidius Mogensen) |

Newsgroups: | comp.compilers |

Date: | 2 Jul 2003 00:38:49 -0400 |

Organization: | Department of Computer Science, University of Copenhagen |

References: | 03-06-103 |

Keywords: | parse |

Posted-Date: | 02 Jul 2003 00:38:49 EDT |

mefrill@yandex.ru (Vladimir) writes:

*> Hi,*

*>*

*> I have a problem with FIRST function for 1 lookahead symbol evalation.*

*> To lookup such the symbols for some nonterminal A I am doning follows:*

*>*

*> 1. go through all rules that left part is A. Here I do following:*

*> a) If I have rule A --> aB, where a is terminal. I just add this*

*> terminal to lookahead set for A.*

*> b) If I have rule A --> e, where e is epsilon, I add e to lookahead*

*> set for A.*

*> c) If I have rule A --> B C ... where B - nonterminal, I at first*

*> compute the lookahead set for B, if B's lookahead set contains*

*> epsilon, check C and etc. I found that recursive function here is the*

*> best. But I do not know how to work with rules like follows: A --> B,*

*> B --> C A D... To compute lookahead set for A I have to at first*

*> compute lookaheads for B, but if lookaheads for C contain epsilon I*

*> need to know the same about A to understand do checking of D or*

*> not...*

*>*

*> Please, help to decide this problem and there may be the good FIRTS*

*> computaion algorithm?*

In "The Dragon Book" and many other compiler books, the FIRST sets

include epsilon symbol. This has, however, a different meaning than

the other symbols in the FIRST sets: A terminal symbol in the FIRST

set of a nonterminal (or, more generally, a sequence of grammar

symbols) means that a string derived from the nonterminal (symbol

sequence) can start with that symbol. Epsilon, on the other hand.

Means that the nonterminal (symbol sequence) can derive the empty

string. Additionally, there is a type mismatch since epsilon isn't a

terminal symbol.

I much prefer the more modern approach of separating the FIRST

function into two functions: A "Nullable" function, that for each

nonterminal or sequence of grammar symbols determines if this can

derive the empty string, and a "First" function that maps a

nonterminal (or symbol sequence) to the set of terminal symbols that

can start strings derived from the nonterminal (or sequence).

You first find Nullable, which is solvable as a set of Horn-clause

constraints and then you find First, which is a set of set equations,

each of which may depend on the results of the Nullable calculation.

This approach is taken in, e.g., Andrew Appels "Modern Compiler

Implementation in <L>" (where <L> can be "ML", "Java" or "C").

You can find details of implementation in these books too.

Torben Mogensen

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.