Related articles |
---|
Regexps from DFA gvmt@csa.iisc.ernet.in (G Venkatesha Murthy) (1997-02-02) |
Re: Regexps from DFA anton@a0.complang.tuwien.ac.at (1997-02-03) |
Re: Regexps from DFA dimock@deas.harvard.edu (1997-02-03) |
Re: Regexps from DFA clark@quarry.zk3.dec.com (1997-02-07) |
Re: Regexps from DFA mslamm@pluto.mscc.huji.ac.il (1997-02-07) |
Re: Regexps from DFA lijnzaad@ebi.ac.uk (Philip Lijnzaad) (1997-02-07) |
From: | Philip Lijnzaad <lijnzaad@ebi.ac.uk> |
Newsgroups: | comp.compilers |
Date: | 7 Feb 1997 23:36:24 -0500 |
Organization: | EMBL Outstation Hinxton - European Bioinformatics Institute, UK |
References: | 97-02-020 97-02-028 |
Keywords: | lex, DFA |
anton@a0.complang.tuwien.ac.at (Anton Ertl) writes:
> Perhaps what you are searching for would be accomplished by a tool
> like this: Given a set of example strings that are in your language
> ("in"), and a set of strings that are not in your language ("out"),
> return the simplest RE (according to some metric, e.g., number of RE
> operators) for a language that is a superset of the "in" set, and
> disjoint from the "out" set.
A related problem turns up in the efficient 'syntax-colouring'
(font-lock, hilit) in the emacs editor. Simon Marshall has written a
simple yet very nifty function that turns an RE consisting of the "OR"
of a list of words into a simplified and ('by the look of it') minimal
RE that recoginizes the same strings:
For example:
(make-regexp '("abort" "abs" "accept" "access" "array" "begin" "body" "case"
"constant" "declare" "delay" "delta" "digits" "else" "elsif"
"entry" "exception" "exit" "function" "generic" "goto" "if"
"others" "limited" "loop" "mod" "new" "null" "out" "subtype"
"package" "pragma" "private" "procedure" "raise" "range"
"record" "rem" "renames" "return" "reverse" "select"
"separate" "task" "terminate" "then" "type" "when" "while"
"with" "xor"))
yields:
"a(b(ort|s)|cce(pt|ss)|rray)|b(egin|ody)|c(ase|onstant)|d(e(clare|\
l(ay|ta))|igits)|e(ls(e|if)|ntry|x(ception|it))|function|g(eneric|\
oto)|if|l(imited|oop)|mod|n(ew|ull)|o(thers|ut)|p(ackage|r(agma|ivate|\
ocedure))|r(a(ise|nge)|e(cord|m|names|turn|verse))|s(e(lect|parate)|\
ubtype)|t(ask|erminate|hen|ype)|w(h(en|ile)|ith)|xor"
(lots of backslashes deleted to make it look like a lex-style RE). You
can find the code in make-regexp.el, which is part of the LCD (emacs
lisp archive). See
archive.cis.ohio-state.edu:/pub/gnu/emacs/elisp-archive; the code is
well documented.
Oh and BTW, one of the previous respondents was thinking about such a
facility for ls (1). The expansion of filenames (globbing), however,
is traditionally the task of the shell. Bash (GNU's Bourne Again
SHell) has the capablities the author was looking for (although it
can't do full RE's).
Philip
--
finger lijnzaad@ebi.ac.uk for PGP public key
fingerprint: E1 03 BF 80 94 61 B6 FC 50 3D 1F 64 40 75 FB 53
--
Return to the
comp.compilers page.
Search the
comp.compilers archives again.