Re: Regexps from DFA

Philip Lijnzaad <lijnzaad@ebi.ac.uk>
7 Feb 1997 23:36:24 -0500

          From comp.compilers

Related articles
Regexps from DFA gvmt@csa.iisc.ernet.in (G Venkatesha Murthy) (1997-02-02)
Re: Regexps from DFA anton@a0.complang.tuwien.ac.at (1997-02-03)
Re: Regexps from DFA dimock@deas.harvard.edu (1997-02-03)
Re: Regexps from DFA clark@quarry.zk3.dec.com (1997-02-07)
Re: Regexps from DFA mslamm@pluto.mscc.huji.ac.il (1997-02-07)
Re: Regexps from DFA lijnzaad@ebi.ac.uk (Philip Lijnzaad) (1997-02-07)
| List of all articles for this month |

From: Philip Lijnzaad <lijnzaad@ebi.ac.uk>
Newsgroups: comp.compilers
Date: 7 Feb 1997 23:36:24 -0500
Organization: EMBL Outstation Hinxton - European Bioinformatics Institute, UK
References: 97-02-020 97-02-028
Keywords: lex, DFA

anton@a0.complang.tuwien.ac.at (Anton Ertl) writes:


> Perhaps what you are searching for would be accomplished by a tool
> like this: Given a set of example strings that are in your language
> ("in"), and a set of strings that are not in your language ("out"),
> return the simplest RE (according to some metric, e.g., number of RE
> operators) for a language that is a superset of the "in" set, and
> disjoint from the "out" set.


A related problem turns up in the efficient 'syntax-colouring'
(font-lock, hilit) in the emacs editor. Simon Marshall has written a
simple yet very nifty function that turns an RE consisting of the "OR"
of a list of words into a simplified and ('by the look of it') minimal
RE that recoginizes the same strings:


For example:


  (make-regexp '("abort" "abs" "accept" "access" "array" "begin" "body" "case"
                                "constant" "declare" "delay" "delta" "digits" "else" "elsif"
                                "entry" "exception" "exit" "function" "generic" "goto" "if"
                                "others" "limited" "loop" "mod" "new" "null" "out" "subtype"
                                "package" "pragma" "private" "procedure" "raise" "range"
                                "record" "rem" "renames" "return" "reverse" "select"
                                "separate" "task" "terminate" "then" "type" "when" "while"
                                "with" "xor"))
yields:


          "a(b(ort|s)|cce(pt|ss)|rray)|b(egin|ody)|c(ase|onstant)|d(e(clare|\
          l(ay|ta))|igits)|e(ls(e|if)|ntry|x(ception|it))|function|g(eneric|\
          oto)|if|l(imited|oop)|mod|n(ew|ull)|o(thers|ut)|p(ackage|r(agma|ivate|\
          ocedure))|r(a(ise|nge)|e(cord|m|names|turn|verse))|s(e(lect|parate)|\
          ubtype)|t(ask|erminate|hen|ype)|w(h(en|ile)|ith)|xor"


(lots of backslashes deleted to make it look like a lex-style RE). You
can find the code in make-regexp.el, which is part of the LCD (emacs
lisp archive). See
archive.cis.ohio-state.edu:/pub/gnu/emacs/elisp-archive; the code is
well documented.


Oh and BTW, one of the previous respondents was thinking about such a
facility for ls (1). The expansion of filenames (globbing), however,
is traditionally the task of the shell. Bash (GNU's Bourne Again
SHell) has the capablities the author was looking for (although it
can't do full RE's).




                                                                                                                                      Philip


--
finger lijnzaad@ebi.ac.uk for PGP public key
fingerprint: E1 03 BF 80 94 61 B6 FC 50 3D 1F 64 40 75 FB 53
--


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.