Related articles:
tips for writing regular expression interpreter/compiler? derekhaas@gmail.com (Derek Haas) (2005-11-26)
Re: tips for writing regular expression interpreter/compiler? jburgy@gmail.com (2005-11-30)
Re: tips for writing regular expression interpreter/compiler? rsc@swtch.com (Russ Cox) (2005-12-02)
Re: tips for writing regular expression interpreter/compiler? jeffrey.kenton@comcast.net (Jeff Kenton) (2005-12-02)
Re: tips for writing regular expression interpreter/compiler? mefrill@yandex.ru (mefrill) (2005-12-02)
Re: tips for writing regular expression interpreter/compiler? markwh04@yahoo.com (2005-12-23)
Re: tips for writing regular expression interpreter/compiler? markwh04@yahoo.com (2005-12-23)
Re: tips for writing regular expression interpreter/compiler? rsc@swtch.com (Russ Cox) (2005-12-23)
From: markwh04@yahoo.com
Newsgroups: comp.compilers
Date: 23 Dec 2005 18:02:29 -0500
Organization: http://groups.google.com
References: 05-11-119
Keywords: lex
Posted-Date: 23 Dec 2005 18:02:29 EST
Derek Haas wrote:
> I am trying to implement a small subset of the regular expression
> languages found in perl/python/et cetera (maybe just Kleene closure
> and alternation for now).
There should still be the demo software for RE->FA conversion in the
comp.compilers archive. Each demo is about 400-500 lines of code, and
each comes with a text giving a general description of the algorithms
used to make the conversions.
The dfa and nfa programs perform the conversions just described. The
rex program is grep, souped up to handle general boolean combinations
(A intersect B, A - B) and interleave products (e.g. x^y^z = x(y^z) |
y(z^x) | z(x^y); x^y = xy | yx).
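To make the interleave product concrete, here is a tiny Python sketch
(the function name is my own, not anything in rex) that enumerates the
interleavings of two plain strings; on single symbols it reduces to the
x^y = xy | yx identity above:

    def interleave(x, y):
        # All ways of merging x and y while preserving the internal
        # order of each, e.g. interleave("a", "b") == {"ab", "ba"}.
        if not x: return {y}
        if not y: return {x}
        return ({x[0] + r for r in interleave(x[1:], y)} |
                {y[0] + r for r in interleave(x, y[1:])})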
The simplest approach: just represent a regular expression as you would
any operator expression (with the operators being A*, A+, A?, A|B, AB,
and constants 0, 1, and S for a character set S), and perform
operations by doing algebraic computations on these expressions.
That's how nfa, dfa and rex work; rex does its computations on a
need-to-use basis while reading through the input, which is the same
strategy most grep implementations take.
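In Python, such a representation might look like the following (a
sketch of my own for illustration; the class names are my assumptions,
not anything from the archive programs):

    from dataclasses import dataclass

    class Re: pass                      # base class for expression nodes

    @dataclass(frozen=True)
    class Zero(Re): pass                # 0: matches nothing
    @dataclass(frozen=True)
    class One(Re): pass                 # 1: matches only the empty string
    @dataclass(frozen=True)
    class Sym(Re): s: str               # S: a symbol / character-set name
    @dataclass(frozen=True)
    class Alt(Re): a: Re; b: Re         # A | B
    @dataclass(frozen=True)
    class Cat(Re): a: Re; b: Re         # A B
    @dataclass(frozen=True)
    class Star(Re): a: Re               # A*
    @dataclass(frozen=True)
    class Plus(Re): a: Re               # A+
    @dataclass(frozen=True)
    class Opt(Re): a: Re                # A?

The frozen dataclasses make expressions hashable, which matters once
repeats have to be tracked below.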
What nfa does is reduce an expression to a normal form:
0, 1, S normal form
A* -> 1 | A+ = normal form
A+ -> A A* -> ...
A? -> 1 | A = normal form
A | B = normal form
0A -> 0 = normal form
1A -> A -> ...
xA = normal form
A* B -> B | A+ B = normal form
A+ B -> A (A* B) -> ...
A? B -> B | A B = normal form
(A|B) C -> AC | BC = normal form
(AB) C -> A (BC) -> ...
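Those rules transcribe almost directly. Continuing the Python sketch
above (again my own illustration, not the nfa program itself), nf
rewrites the head of an expression until one of the normal forms
listed appears:

    def nf(e):
        # Already normal: 0, 1, S, and A | B.
        if isinstance(e, (Zero, One, Sym, Alt)):
            return e
        if isinstance(e, Star):                  # A* -> 1 | A+
            return Alt(One(), Plus(e.a))
        if isinstance(e, Plus):                  # A+ -> A A* -> ...
            return nf(Cat(e.a, Star(e.a)))
        if isinstance(e, Opt):                   # A? -> 1 | A
            return Alt(One(), e.a)
        a, b = e.a, e.b                          # e is a concatenation A B
        if isinstance(a, Zero): return Zero()    # 0 B -> 0
        if isinstance(a, One):  return nf(b)     # 1 B -> B -> ...
        if isinstance(a, Sym):  return e         # x B: a monomial, normal
        if isinstance(a, Star):                  # A* B -> B | A+ B
            return Alt(b, Cat(Plus(a.a), b))
        if isinstance(a, Plus):                  # A+ B -> A (A* B) -> ...
            return nf(Cat(a.a, Cat(Star(a.a), b)))
        if isinstance(a, Opt):                   # A? B -> B | A B
            return Alt(b, Cat(a.a, b))
        if isinstance(a, Alt):                   # (A|B) C -> AC | BC
            return Alt(Cat(a.a, b), Cat(a.b, b))
        return nf(Cat(a.a, Cat(a.b, b)))         # (A B) C -> A (B C) -> ...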
Denoting the normal form of an expression A by NF(A), what it then
proceeds to do is compute sets by the following:
<0> = {}; <1> = {1}; <S> = {S 1}
<A | B> = <A> union <B>
<xA> = {xA}
<A> = <NF(A)>, if A is not in normal form.
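Continuing the sketch, the set <A> might be computed as follows (the
seen argument implements the absorption of repeats described next):

    def lin(e, seen=frozenset()):
        # Compute <e>: a set whose members are One() or monomials Cat(Sym, B).
        if e in seen:                   # a repeat: absorb it, nothing new
            return set()
        seen = seen | {e}
        if isinstance(e, Zero):         # <0> = {}
            return set()
        if isinstance(e, One):          # <1> = {1}
            return {e}
        if isinstance(e, Sym):          # <S> = {S 1}
            return {Cat(e, One())}
        if isinstance(e, Alt):          # <A | B> = <A> union <B>
            return lin(e.a, seen) | lin(e.b, seen)
        if isinstance(e, Cat) and isinstance(e.a, Sym):
            return {e}                  # <xA> = {xA}
        return lin(nf(e), seen)         # <A> = <NF(A)> otherwise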
In the process of finding these sets, repeats may occur (e.g. <1+> = <1
1*> = <1*> = <1 | 1+> = {1} union <1+>), which are tracked and absorbed
as they occur.
Each expression A produces a set <A> whose elements are either 1 or
other expressions of the form xB. This yields the rules (A final) <->
(1 is in <A>) and (x: A -> B) <-> (xB is in <A>).
The starting expression is equated to the start state; the processing
is run on its set, and then on any of the other expressions spawned
from it, until the full automaton is determined.
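Tying the sketch together, a worklist loop over the spawned
expressions then yields the whole automaton:

    def build_nfa(start):
        finals, trans = set(), set()    # trans holds (state, symbol, state)
        states, todo = {start}, [start]
        while todo:
            a = todo.pop()
            for m in lin(a):
                if isinstance(m, One):  # 1 in <A>  <->  A is a final state
                    finals.add(a)
                else:                   # xB in <A>  <->  transition x: A -> B
                    x, b = m.a.s, m.b
                    trans.add((a, x, b))
                    if b not in states: # a newly spawned expression
                        states.add(b)
                        todo.append(b)
        return states, trans, finals

For example, build_nfa(Star(Alt(Sym('a'), Sym('b')))) produces a
single state that is final and loops to itself on a and b.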
For nfa and dfa, the GREP subset syntax is not used for S; S may only
be a symbolic name. But both programs support non-recursive
definitions. (If you allow recursive definitions, then you no longer
have regular expressions but context-free expressions.)