Re: finding all dominator trees

Chris F Clark <cfc@shell01.TheWorld.com>
6 Dec 2006 08:58:33 -0500

From comp.compilers

Related articles
[10 earlier articles]
Re: finding all dominator trees amichail@gmail.com (Amir Michail) (2006-12-03)
Re: finding all dominator trees diablovision@yahoo.com (2006-12-04)
Re: finding all dominator trees amichail@gmail.com (Amir Michail) (2006-12-05)
Re: finding all dominator trees cfc@shell01.TheWorld.com (Chris F Clark) (2006-12-05)
Re: finding all dominator trees martin@gkc.org.uk (Martin Ward) (2006-12-05)
Re: finding all dominator trees mailbox@dmitry-kazakov.de (Dmitry A. Kazakov) (2006-12-06)
*Re: finding all dominator trees cfc@shell01.TheWorld.com (Chris F Clark)* (2006-12-06)**

| List of all articles for this month |

From:	Chris F Clark <cfc@shell01.TheWorld.com>
Newsgroups:	comp.compilers
Date:	6 Dec 2006 08:58:33 -0500
Organization:	The World Public Access UNIX, Brookline, MA
References:	06-11-096 06-11-117 06-11-125 06-11-131 06-12-025
Keywords:	analysis
Posted-Date:	06 Dec 2006 08:58:33 EST

First, my apologies to the group for so many posts on this topic,
which is actually only marginally connected to compilers, despite
being about dominators. If people ask, I will take this to private
email. However, given that graph problems are generally useful in
compilers, I will follow this a bit further.

Reading the web page, I realize that part of the reason for computing
dominators in your case is to find the sub-cycles (and that's often
true, often one cdomputes dominators to find cycles). Thus, an
algorithm that assumes you know the cycles is not very useful.

However, let's try to distnguish what is different about your case
from typical compiler users of the dominator algorithm. In the typical
compiler case, one often (perhaps even generally) has a basic linear
flow, that is there is some root "entrypoint" (or small set of roots)
and some small set of exits and the flow is from the entries to the
exits. Most flow is not bidirectional.

I think in a social networking case, one has a lot more bidirectional
flow. If I am your friend, you are my friend (and vice-versa). There
are exceptions, but it is common enough an occurance to alter the
basic model. This is particularly true of a social network that grows
from a group. People in the group become listed in the social network
to exploit their contacts. Thus, most people in the group are
connected, well at least for the intial (founding) group. Thus, the
whole graph is likely to be one big cycle (strongly-connected-
component), in that if there is a path from me to you, there is likely
a path from you back to me and most likely that path can be found by
simply reversing the edges that got from me to you in the first place.

Now, in the cases like facebook there will be certain (many)
exceptions to the rule, which I am an example of. I have a facebook
entry but I am connected to no one and I don't think anyone is
connected to me, because I did not join as part of a group of friends.
Moreover, if links get established, they may be more likely one-way
links, at least at first. So, in some (many) social networking
databases, there will be outlier vertexes, which are either disjoint
from the network, or sources (point into the network, but are not
pointed to) or sinks (the reverse). Note, because you are trying to
filter out "false" accounts, such vertexes may be particularly
interesting to identify. Any disjoint, source, or sink vertex is
likely to be a false account.

However, for the most part of the network, we can exclude such
isolated vertexes. In fact, I suspect that the body of the network
consists of one or more large cycles, and that the bulk of the network
is in fact one large cycle. Indeed, I'm pretty certain, I've ready a
Scientific American article (on how computer viruses spread) where the
author "proved" just such a point for nodes on the internet and their
connections. So, for simplicity, let's assume that any interesting
vertexs is part of this one large cycle. If the assumption is wrong,
one can repeat the algorithm starting at some vertex that was omitted
from the set already covered, and continue until all vertexes have
been covered.

At the same time, I think the bi-directional nature of most links will
also spoil use of a traditional dominator algorithm. Almost every
edge has a corresponding back-edge. I suspect, you are more likely to
see 3 element cycles of the form a<->b<->c, than a->b->c->a. That
does not mean their are not cutpoints or articulation points in the
graph, vertexes, which if removed the graph would be come disjoint.
It just means that I think the dominator algorithm isn't the best way
to find them, as there may be alternate paths from sub-clusters to
each other, but a small set of vertexes which if removed would cause
the graph to fall apart. In other words, just becasue a vertex isn't
a dominator to a sub-cluster of the graph, doesn't mean it isn't one
of the key vertexes which if removed would cause the graph to fall
apart. Again, I think such vertexes are called articulation points
and their are algorithms for finideng them, ande they would be more
useful starting points.

Note, if you still want to find dominators, I think that if you start
with any node in your graph, and find the dominators from that point,
while at the same time finding cycles, you will collect the info you
need (in roughly one pass over the graph).

Speculation: because of the special properties of social networking
graphs, I might be tempted to break (remove) back-edges caused by
bi-directional flows (e.g. a<->b) as I traverse an edge in either
direction. I can't tell you exactly how that will effect the
algorithm, but intuitively, it feels right. In fact, if easy to do, I
would process bi-directional edges before uni-directional ones. I
would even be willing to make a special pass to identify
bi-directional edges (and mark them) before running the normal
algorithm.

Hope this helps,
-Chris

*****************************************************************************
Chris Clark Internet : compres@world.std.com
Compiler Resources, Inc. Web Site : http://world.std.com/~compres
23 Bailey Rd voice : (508) 435-5016
Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)

Post a followup to this message

Return to the comp.compilers page.
Search the comp.compilers archives again.

Re: finding all dominator trees

Chris F Clark <cfc@shell01.TheWorld.com>6 Dec 2006 08:58:33 -0500

Chris F Clark <cfc@shell01.TheWorld.com>
6 Dec 2006 08:58:33 -0500