Mon, 30 Dec 2019 16:34:14 -0800 (PST)

Related articles |
---|

Algebraic RegEx toolbase with a grep superset clone. rockbrentwood@gmail.com (2019-12-30) |

From: | rockbrentwood@gmail.com |

Newsgroups: | comp.compilers |

Date: | Mon, 30 Dec 2019 16:34:14 -0800 (PST) |

Organization: | Compilers Central |

Injection-Info: | gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="53472"; mail-complaints-to="abuse@iecc.com" |

Keywords: | tools, lex |

Posted-Date: | 30 Dec 2019 20:12:12 EST |

Regular expressions and finite state automata using Kleene algebraic

techniques.

DFA: converts extended regular expressions to minimal deterministic finite

state automata

NFA: converts regular expressions to small non-deterministic finite state

automata

FSC: a "finite state classifier" infers a classification delimited by mutually

disjoint regular expressions from a set of exemplars

REX: a GREP-like utility that works with extended regular expressions.

The "automata" are listed in algebraic form as a system of equations -

equivalent to their formulation as a regular grammar.

In addition to the standard Kleene algebraic operations ("|" for union, "*"

for 0,1,2+ iterations, "+" for 1,2+ iterations, "?" for 0,1 iterations),

extended regular expressions include "&" for intersection, "-" for relative

compliment and "^" for interleave. Formerly archived in the comp.compilers

archive

Formerly in the comp.compilers archive in the 1990's, it's on GitHub here

now:

https://github.com/RockBrentwood/RegEx

The programs were written in ANSI-C (and have been updated to C99). The

programs are small: at present, 154 lines (fsc.c), 365 lines (nfa.c), 483

lines (dfa.c), 615 lines (rex.c). The build is set up for POSIX-based systems.

There are legacy hooks in rex.c to handle wildcard expansions for DOS/Windows,

but they've not been recently tested.

The underlying algebraic framework for regular expressions has been extended

up the Chomsky hierarchy to type 2 - context-free expressions. Two

mathematical formulations

* one, based on the fixed-point operator, descending from the earlier 1970's

Grusky/McWhirter/Yntema, expanded by Leiss/Esik/Kozen in the 2000's and

2010's;

* the other described here and elsewhere in the 1990's, whose underlying

theory - an Eilenberg-Moore category - was published in 2008

were proven equivalent in 2018.

In the future, the RegEx library will be expanded to embody context-free

expressions.

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.