Re: FIRST function computation problem.

torbenm@diku.dk (Torben Ęgidius Mogensen)
2 Jul 2003 00:38:49 -0400

          From comp.compilers

Related articles
FIRST function computation problem. mefrill@yandex.ru (2003-06-22)
Re: FIRST function computation problem. derkgwen@HotPOP.com (Derk Gwen) (2003-07-02)
Re: FIRST function computation problem. torbenm@diku.dk (2003-07-02)
Re: FIRST function computation problem. haberg@matematik.su.se (2003-07-02)
Re: FIRST function computation problem. soenke.kannapinn@wincor-nixdorf.com (=?Windows-1252?Q?S=F6nke_Kannapinn?=) (2003-07-02)
| List of all articles for this month |

From: torbenm@diku.dk (Torben Ęgidius Mogensen)
Newsgroups: comp.compilers
Date: 2 Jul 2003 00:38:49 -0400
Organization: Department of Computer Science, University of Copenhagen
References: 03-06-103
Keywords: parse
Posted-Date: 02 Jul 2003 00:38:49 EDT

mefrill@yandex.ru (Vladimir) writes:


> Hi,
>
> I have a problem with FIRST function for 1 lookahead symbol evalation.
> To lookup such the symbols for some nonterminal A I am doning follows:
>
> 1. go through all rules that left part is A. Here I do following:
> a) If I have rule A --> aB, where a is terminal. I just add this
> terminal to lookahead set for A.
> b) If I have rule A --> e, where e is epsilon, I add e to lookahead
> set for A.
> c) If I have rule A --> B C ... where B - nonterminal, I at first
> compute the lookahead set for B, if B's lookahead set contains
> epsilon, check C and etc. I found that recursive function here is the
> best. But I do not know how to work with rules like follows: A --> B,
> B --> C A D... To compute lookahead set for A I have to at first
> compute lookaheads for B, but if lookaheads for C contain epsilon I
> need to know the same about A to understand do checking of D or
> not...
>
> Please, help to decide this problem and there may be the good FIRTS
> computaion algorithm?


In "The Dragon Book" and many other compiler books, the FIRST sets
include epsilon symbol. This has, however, a different meaning than
the other symbols in the FIRST sets: A terminal symbol in the FIRST
set of a nonterminal (or, more generally, a sequence of grammar
symbols) means that a string derived from the nonterminal (symbol
sequence) can start with that symbol. Epsilon, on the other hand.
Means that the nonterminal (symbol sequence) can derive the empty
string. Additionally, there is a type mismatch since epsilon isn't a
terminal symbol.


I much prefer the more modern approach of separating the FIRST
function into two functions: A "Nullable" function, that for each
nonterminal or sequence of grammar symbols determines if this can
derive the empty string, and a "First" function that maps a
nonterminal (or symbol sequence) to the set of terminal symbols that
can start strings derived from the nonterminal (or sequence).


You first find Nullable, which is solvable as a set of Horn-clause
constraints and then you find First, which is a set of set equations,
each of which may depend on the results of the Nullable calculation.


This approach is taken in, e.g., Andrew Appels "Modern Compiler
Implementation in <L>" (where <L> can be "ML", "Java" or "C").


You can find details of implementation in these books too.


Torben Mogensen


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.