Re: Implementing set theory operations similar to SQL's select and update

George Neuner <gneuner2@comcast.net>
25 Aug 2004 14:53:51 -0400

          From comp.compilers

Related articles
Implementing set theory operations similar to SQL's select and update barry_j_kelly@hotmail.com (2004-08-23)
Re: Implementing set theory operations similar to SQL's select and upd gneuner2@comcast.net (George Neuner) (2004-08-25)
Re: Implementing set theory operations similar to SQL's select and upd dmclean@stsci.edu (Donald F. McLean) (2004-08-25)
Re: Implementing set theory operations similar to SQL's select and upd kwheinri@bsr2.uwaterloo.ca (Kenn Heinrich) (2004-09-03)
| List of all articles for this month |

From: George Neuner <gneuner2@comcast.net>
Newsgroups: comp.compilers
Date: 25 Aug 2004 14:53:51 -0400
Organization: Compilers Central
References: 04-08-117
Keywords: analysis
Posted-Date: 25 Aug 2004 14:53:51 EDT

On 23 Aug 2004 12:07:59 -0400, barry_j_kelly@hotmail.com (Barry Kelly)
wrote:


>I'm writing an engine designed to implement behaviour which may be
>customized along multiple dimensions.
>
>1. My problem is, in essence, implementing a set-based language,
>although written through a GUI and probably using predicate expression
>evaluation trees rather than machine code for implementation (although
>that may be very slow, and that's what this post is about).
>
>I have a sequential list of multiple (1000+), overlapping set
>definitions. I want to know, quickly enough to be usable for a
>developer using a GUI front end, does the set they are defining
>completely encompass a previously defined set? Is it only a partial
>subset? What is that subset? Can that subset be converted easily into
>its own representation rather than as the product of set difference
>and/or intersection?
>
>Does anybody know of concrete papers, books, articles dealing with
>concrete implementation details and algorithms that I should be
>tracking down rather than the (thus far unsuccessful) general searches
>for "set theory"?
>


It's not clear from your problem description whether there is an
actual database or whether the rules define abstract sets. It really
doesn't matter - in order to evaluate the sets for equality you need
to have some kind of concrete representation for them.


The rulebase will be easier to evaluate if each rule stands alone as a
logical predicate. If this is not true for the human readable
representation (very likely 8-), you should internally reduce rule
chains to a single rule that describes the result directly. If
partial chain application is allowed, you should produce a separate
direct rule for each possible partial chain. In application, you
identify and apply the most specific rule.


Regarding the coverage problem - you will definitely want to avoid
creating a cartesian product on scads of dimensions ... instead you'll
want to order evaluation of the predicates in such a way as to produce
the smallest possible result set at each step. If each rule stands
alone, the evaluation order will not affect the result. Identify and
compute intersections before unions or joins to keep the intermediate
results as small as possible.


Using the database analogy, you don't need to actually create the
intermediate table at each step ... you only need to compute the
columns and estimate the number of rows (ie. a virtual table). At
each step, you want to identify and perform the operation that will
produce the table with the least number of result rows. You continue
until either a single table remains or all the remaining intermediate
tables are empty (a null query). The final table, when filled in with
data, is your optimized query.


For practical matters, you should look for papers on database query
optimization, solving equation and logic predicate systems, etc. You
should also read up on dynamic programming techniques if you aren't
familiar with them. I'm sorry I don't have any good cites to give you
- my work has taken me in other directions and I don't follow research
in these fields very closely any more.


AFA implementation - these kinds of problems are best solved in logic
or functional languages (or in languages that easily accomodate those
techniques). I personally like Lisp, but ML, Prolog or one of their
close variants would also work well. This is not the kind of problem
I would like to tackle in C++, Java or the language du jour.


George


Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.