6 Dec 2006 08:58:33 -0500

Related articles |
---|

[10 earlier articles] |

Re: finding all dominator trees amichail@gmail.com (Amir Michail) (2006-12-03) |

Re: finding all dominator trees diablovision@yahoo.com (2006-12-04) |

Re: finding all dominator trees amichail@gmail.com (Amir Michail) (2006-12-05) |

Re: finding all dominator trees cfc@shell01.TheWorld.com (Chris F Clark) (2006-12-05) |

Re: finding all dominator trees martin@gkc.org.uk (Martin Ward) (2006-12-05) |

Re: finding all dominator trees mailbox@dmitry-kazakov.de (Dmitry A. Kazakov) (2006-12-06) |

Re: finding all dominator trees cfc@shell01.TheWorld.com (Chris F Clark) (2006-12-06) |

From: | Chris F Clark <cfc@shell01.TheWorld.com> |

Newsgroups: | comp.compilers |

Date: | 6 Dec 2006 08:58:33 -0500 |

Organization: | The World Public Access UNIX, Brookline, MA |

References: | 06-11-096 06-11-117 06-11-125 06-11-131 06-12-025 |

Keywords: | analysis |

First, my apologies to the group for so many posts on this topic,

which is actually only marginally connected to compilers, despite

being about dominators. If people ask, I will take this to private

email. However, given that graph problems are generally useful in

compilers, I will follow this a bit further.

Reading the web page, I realize that part of the reason for computing

dominators in your case is to find the sub-cycles (and that's often

true, often one cdomputes dominators to find cycles). Thus, an

algorithm that assumes you know the cycles is not very useful.

However, let's try to distnguish what is different about your case

from typical compiler users of the dominator algorithm. In the typical

compiler case, one often (perhaps even generally) has a basic linear

flow, that is there is some root "entrypoint" (or small set of roots)

and some small set of exits and the flow is from the entries to the

exits. Most flow is not bidirectional.

I think in a social networking case, one has a lot more bidirectional

flow. If I am your friend, you are my friend (and vice-versa). There

are exceptions, but it is common enough an occurance to alter the

basic model. This is particularly true of a social network that grows

from a group. People in the group become listed in the social network

to exploit their contacts. Thus, most people in the group are

connected, well at least for the intial (founding) group. Thus, the

whole graph is likely to be one big cycle (strongly-connected-

component), in that if there is a path from me to you, there is likely

a path from you back to me and most likely that path can be found by

simply reversing the edges that got from me to you in the first place.

Now, in the cases like facebook there will be certain (many)

exceptions to the rule, which I am an example of. I have a facebook

entry but I am connected to no one and I don't think anyone is

connected to me, because I did not join as part of a group of friends.

Moreover, if links get established, they may be more likely one-way

links, at least at first. So, in some (many) social networking

databases, there will be outlier vertexes, which are either disjoint

from the network, or sources (point into the network, but are not

pointed to) or sinks (the reverse). Note, because you are trying to

filter out "false" accounts, such vertexes may be particularly

interesting to identify. Any disjoint, source, or sink vertex is

likely to be a false account.

However, for the most part of the network, we can exclude such

isolated vertexes. In fact, I suspect that the body of the network

consists of one or more large cycles, and that the bulk of the network

is in fact one large cycle. Indeed, I'm pretty certain, I've ready a

Scientific American article (on how computer viruses spread) where the

author "proved" just such a point for nodes on the internet and their

connections. So, for simplicity, let's assume that any interesting

vertexs is part of this one large cycle. If the assumption is wrong,

one can repeat the algorithm starting at some vertex that was omitted

from the set already covered, and continue until all vertexes have

been covered.

At the same time, I think the bi-directional nature of most links will

also spoil use of a traditional dominator algorithm. Almost every

edge has a corresponding back-edge. I suspect, you are more likely to

see 3 element cycles of the form a<->b<->c, than a->b->c->a. That

does not mean their are not cutpoints or articulation points in the

graph, vertexes, which if removed the graph would be come disjoint.

It just means that I think the dominator algorithm isn't the best way

to find them, as there may be alternate paths from sub-clusters to

each other, but a small set of vertexes which if removed would cause

the graph to fall apart. In other words, just becasue a vertex isn't

a dominator to a sub-cluster of the graph, doesn't mean it isn't one

of the key vertexes which if removed would cause the graph to fall

apart. Again, I think such vertexes are called articulation points

and their are algorithms for finideng them, ande they would be more

useful starting points.

Note, if you still want to find dominators, I think that if you start

with any node in your graph, and find the dominators from that point,

while at the same time finding cycles, you will collect the info you

need (in roughly one pass over the graph).

Speculation: because of the special properties of social networking

graphs, I might be tempted to break (remove) back-edges caused by

bi-directional flows (e.g. a<->b) as I traverse an edge in either

direction. I can't tell you exactly how that will effect the

algorithm, but intuitively, it feels right. In fact, if easy to do, I

would process bi-directional edges before uni-directional ones. I

would even be willing to make a special pass to identify

bi-directional edges (and mark them) before running the normal

algorithm.

Hope this helps,

-Chris

*****************************************************************************

Chris Clark Internet : compres@world.std.com

Compiler Resources, Inc. Web Site : http://world.std.com/~compres

23 Bailey Rd voice : (508) 435-5016

Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)

Post a followup to this message

Return to the
comp.compilers page.

Search the
comp.compilers archives again.