
Related articles:
  tips for writing regular expression interpreter/compiler? derekhaas@gmail.com (Derek Haas) (2005-11-26)
  Re: tips for writing regular expression interpreter/compiler? jburgy@gmail.com (2005-11-30)
  Re: tips for writing regular expression interpreter/compiler? rsc@swtch.com (Russ Cox) (2005-12-02)
  Re: tips for writing regular expression interpreter/compiler? jeffrey.kenton@comcast.net (Jeff Kenton) (2005-12-02)
  Re: tips for writing regular expression interpreter/compiler? mefrill@yandex.ru (mefrill) (2005-12-02)
  Re: tips for writing regular expression interpreter/compiler? markwh04@yahoo.com (2005-12-23)
  Re: tips for writing regular expression interpreter/compiler? markwh04@yahoo.com (2005-12-23)
  Re: tips for writing regular expression interpreter/compiler? rsc@swtch.com (Russ Cox) (2005-12-23)

From: markwh04@yahoo.com
Newsgroups: comp.compilers
Date: 23 Dec 2005 18:02:29 -0500
Organization: http://groups.google.com
References: 05-11-119
Keywords: lex

Derek Haas wrote:

> I am trying to implement a small subset of the regular expression
> languages found in perl/python/et cetera (maybe just Kleene closure
> and alternation for now).

The demo software for RE->FA conversion should still be in the
comp.compilers archive. Along with each demo (each about 400-500 lines
of code) are texts that give a general description of the algorithms
used to make the conversions.

The dfa and nfa programs perform the conversions above. The rex
program is grep, souped up to handle general boolean combinations
(A intersect B, A - B) and interleave products (e.g. x^y = xy | yx;
x^y^z = x(y^z) | y(z^x) | z(x^y)).
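To make the interleave product concrete, here is a minimal sketch of
what ^ means when restricted to plain words (a hypothetical helper of
my own, not code from the rex demo):

```python
def interleave(x, y):
    """All interleavings of the words x and y: the ^ product
    restricted to single strings."""
    if not x:
        return {y}
    if not y:
        return {x}
    # Either word may contribute its first symbol; recurse on the rest.
    return ({x[0] + r for r in interleave(x[1:], y)} |
            {y[0] + r for r in interleave(x, y[1:])})
```

For instance, interleave("x", "y") gives {"xy", "yx"}, matching the
identity x^y = xy | yx quoted above; nesting it reproduces the
three-way expansion as all six orderings of x, y, z.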

The simplest approach is just to represent a regular expression as you
would any operator expression (with the operators being A*, A+, A?,
A|B, AB, and constants 0, 1 and S for character sets S), and to
perform operations by doing algebraic computations on these
expressions. That's how nfa, dfa and rex work; rex does its
computations on a need-to-use basis while reading through the input,
which is the same strategy most grep implementations take.
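As a sketch of what "represent it as an operator expression" means,
here is one possible encoding as nested tuples (the constructor names
are my own illustration, not the demos' actual data structures):

```python
# Regular expressions as plain operator trees (nested tuples);
# the constants 0 and 1 would be ('0',) and ('1',).
def sym(s):    return ('sym', s)     # a character set S, named symbolically
def alt(a, b): return ('alt', a, b)  # A | B
def cat(a, b): return ('cat', a, b)  # A B
def star(a):   return ('star', a)    # A*

# (a|b)* c as an expression tree
e = cat(star(alt(sym('a'), sym('b'))), sym('c'))
```

Algebraic computations then become ordinary tree manipulations on
these tuples.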

What nfa does is reduce an expression to normal form:

    0, 1, S                       normal form
    A*      -> 1 | A+             = normal form
    A+      -> A A*               -> ...
    A?      -> 1 | A              = normal form
    A | B                         = normal form
    0A      -> 0                  = normal form
    1A      -> A                  -> ...
    xA                            = normal form
    A* B    -> B | A+ B           = normal form
    A+ B    -> A (A* B)           -> ...
    A? B    -> B | A B            = normal form
    (A|B) C -> AC | BC            = normal form
    (AB) C  -> A (BC)             -> ...
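These reduction rules transcribe almost literally into code. A sketch,
using a tuple encoding of my own (('0',), ('1',), ('sym', x),
('alt', A, B), ('cat', A, B), ('star', A), ('plus', A), ('opt', A)),
where step applies one rewrite to the head of an expression and
returns None when the head is already in normal form:

```python
def step(a):
    """One head reduction per the rule table; None means
    0, 1, S, xA or A|B: already head-normal."""
    k = a[0]
    if k == 'star': return ('alt', ('1',), ('plus', a[1]))            # A* -> 1 | A+
    if k == 'plus': return ('cat', a[1], ('star', a[1]))              # A+ -> A A*
    if k == 'opt':  return ('alt', ('1',), a[1])                      # A? -> 1 | A
    if k == 'cat':
        h, b = a[1], a[2]
        hk = h[0]
        if hk == '0':    return ('0',)                                # 0A -> 0
        if hk == '1':    return b                                     # 1A -> A
        if hk == 'star': return ('alt', b, ('cat', ('plus', h[1]), b))       # A*B
        if hk == 'plus': return ('cat', h[1], ('cat', ('star', h[1]), b))    # A+B
        if hk == 'opt':  return ('alt', b, ('cat', h[1], b))                 # A?B
        if hk == 'alt':  return ('alt', ('cat', h[1], b), ('cat', h[2], b))  # (A|B)C
        if hk == 'cat':  return ('cat', h[1], ('cat', h[2], b))              # (AB)C
    return None
```

Iterating step (on the head, then recursively inside alternations)
drives any expression toward a union of terms of the form 1 or xB.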

Denoting the normal form of an expression A by NF(A), it then proceeds
to compute sets by the following:

    <0> = {};  <1> = {1};  <S> = {S 1}
    <A | B> = <A> union <B>
    <xA> = {xA}
    <A> = <NF(A)>, if A is not in normal form.

In the process of finding these sets, repeats may occur (e.g. <1+> =
<1 1*> = <1*> = <1 | 1+> = {1} union <1+>), which are tracked and
absorbed as they occur.

Each expression A produces a set <A> whose elements are either 1 or
other expressions of the form xB. This yields the rules: (A is final)
<-> (1 is in <A>); (x: A -> B) <-> (xB is in <A>).

The starting expression is equated to the start state; the processing
is run on it, and then on any of the other expressions spawned from
it, until the full automaton is determined.
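Putting the pieces together, here is a minimal end-to-end sketch in
Python (my own transcription of the description above, not the actual
nfa demo): terms computes <A> over nested-tuple expressions, with an
'active' set absorbing repeats, and build_nfa explores every
expression spawned from the start state.

```python
from collections import deque

# Expressions as nested tuples: ('0',), ('1',), ('sym', x),
# ('alt', A, B), ('cat', A, B), ('star', A), ('plus', A), ('opt', A).

def terms(a, active=frozenset()):
    """<A>: contains ('1',) iff A accepts the empty word, and a pair
    (x, B) for each transition x: A -> B.  Re-entering an expression
    still being expanded contributes nothing new (the equation
    <A> = X | <A> solves to <A> = X), which absorbs repeats."""
    if a in active:
        return frozenset()
    active = active | {a}
    k = a[0]
    if k == '0':    return frozenset()
    if k == '1':    return frozenset({('1',)})
    if k == 'sym':  return frozenset({(a[1], ('1',))})
    if k == 'alt':  return terms(a[1], active) | terms(a[2], active)
    if k == 'star': return frozenset({('1',)}) | terms(('plus', a[1]), active)
    if k == 'plus': return terms(('cat', a[1], ('star', a[1])), active)
    if k == 'opt':  return frozenset({('1',)}) | terms(a[1], active)
    h, b = a[1], a[2]                         # k == 'cat': reduce the head
    hk = h[0]
    if hk == '0':    return frozenset()
    if hk == '1':    return terms(b, active)                            # 1A -> A
    if hk == 'sym':  return frozenset({(h[1], b)})                      # xA
    if hk == 'alt':  return (terms(('cat', h[1], b), active) |
                             terms(('cat', h[2], b), active))           # (A|B)C
    if hk == 'star': return (terms(b, active) |
                             terms(('cat', ('plus', h[1]), b), active)) # A*B
    if hk == 'plus': return terms(('cat', h[1],
                                   ('cat', ('star', h[1]), b)), active) # A+B
    if hk == 'opt':  return (terms(b, active) |
                             terms(('cat', h[1], b), active))           # A?B
    return terms(('cat', h[1], ('cat', h[2], b)), active)               # (AB)C

def build_nfa(start):
    """States are the expressions themselves; explore until closed."""
    finals, trans = set(), {}
    todo = deque([start])
    while todo:
        a = todo.popleft()
        if a in trans:
            continue
        trans[a] = []
        for t in terms(a):
            if t == ('1',):
                finals.add(a)                 # 1 in <A>  <->  A is final
            else:
                x, b = t                      # (x, B) in <A>  <->  x: A -> B
                trans[a].append((x, b))
                todo.append(b)
    return finals, trans

def matches(start, word):
    finals, trans = build_nfa(start)
    states = {start}
    for x in word:
        states = {b for a in states for (y, b) in trans[a] if y == x}
    return bool(states & finals)
```

For example, with start = (a|b)* abb, matches accepts "abb" and
"aabb" and rejects "ab". The absorption also handles expressions like
(a*)*, where naive expansion would loop forever.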

For nfa and dfa, the GREP subset syntax is not used for S; S may only
be a symbolic name. But both support non-recursive definitions. (If
you allow recursive definitions, then you no longer have regular
expressions but context-free expressions.)
