Re: computing the first set (concretely)

Hans-Peter Diettrich <DrDiettrich@compuserve.de>12 Mar 2006 13:55:10 -0500

From comp.compilers

Related articles
computing the first set (concretely) aegis@mad.scientist.com (aegis) (2006-03-11)
Re: computing the first set (concretely) DrDiettrich@compuserve.de (Hans-Peter Diettrich) (2006-03-12)
Re: computing the first set (concretely) wyrmwif@tsoft.org (SM Ryan) (2006-03-12)
| List of all articles for this month |

 From: Hans-Peter Diettrich Newsgroups: comp.compilers Date: 12 Mar 2006 13:55:10 -0500 Organization: Compilers Central References: 06-03-031 Keywords: parse Posted-Date: 12 Mar 2006 13:55:10 EST

aegis wrote:

> I was thinking along the lines of a two dimensional matrix by where
> the rows are indexed by a particular nonterminal to find all terminals
> for the given production.
>
> It would look something like:
>
> int table[X][Y] = { { ... } ,
> { expr, '+', digit },
> { 0, 1, 2, 3, 4, 5 },

{ '0', '1', '2', '3', '4', '5' },

> { ... }
> };
>
> and where my specification would look like:
>
> ...
> expr: expr '+' digit ;
> digit: 0 | 1 | 2 | 3 | 4 | 5 ;
The same here.

You see that you'll have to store both terminals and nonterminals, and
possibly meta-operators, in the same array. IMO a tree structure were
more appropriate, with dedicated node types for sequences and
alternatives etc.

For the data representation I can see two somewhat equivalent solutions:

1) Use a common data structure (or base class) for the representation of
both terminals and nonterminals. Then you can use arrays of pointers to
that base structure, for all your representations.

2) Create a common enum for all elements, and use arrays of that enum.

The pointer solution doesn't require the construction of the enum(s),
but it might be harder to debug, depending on your debugger. You can use
it to create the enums, for use in a subsequent program, if you like.

> so then I can compute the set of terminals that would appear first
> when applying some production. (my non terminals could be represented
> with enumerated constants produced by enum)

For the representation of sets I found two solutions, with either bit
arrays (based on enums), or with lists of references (pointers or
indices). Here again lists are easier to handle, since they don't
require the construction of enums, and they even may require less memory
than fixed size bit sets. The lists here must not necessarily be linked
lists, dynamically reallocatable arrays will do the same job quite well.

> I'm looking for feedback from people who have implemented a FIRST()
> operator. Is there a general way of doing it? Am I on the wrong path?

You'll find models in every parser generator, and in many books about
languages and compilers. I'd use an object oriented approach, for best
extensibility whenever you hit the limits of your design and
implementation.

DoDi

Post a followup to this message