Database

Discovering Relationships in Context

By Joe Celko, June 30, 2006

Cogito is a graph-based relationship analytics tool for pattern matching and relationship identification, making it an ideal tool for computer forensics.

Finding the Shortest Path

A path through a graph is a traversal of consecutive nodes along a sequence of edges. Clearly, the node at the end of one edge in the sequence must also be the node at the beginning of the next edge in the sequence. The length of the path is the number of edges that are traversed along the path. A path does not have a cycle in it.

In this particular problem, the nodes are actors and the edges are the "in a movie with," "directed by," "costarred," and other such relationships.

Here, I'm looking for a path from "Kevin Bacon" to some other actor that has a length less than six. Actually, what I would really like is the shortest path (most direct relationship) within the set of all paths between those two actors.

The advantage of SQL as a database language is that it is a declarative, set-oriented language. I tell the SQL engine what I want to get back and it figures out how to do it. An optimizer looks at statistics, indexes, table sizes, and dozens of other things to get an execution plan. The optimizer is usually smarter than human beings.

But suddenly, this is not the approach we want with a graph. When I specify a rule for a path, I have to know the length (n) of the path I am looking for before I can write the query. Then when I run this query, I get all the paths in the set. Looking for this set can be a combinatorial explosion.

What I really wanted was the shortest path between my two actors. This means I have to start with paths of length=1, length=2, and so forth, until I either find a path or have searched all the edges in the graph. Example 2 is a template for a simple, two-edge path starting at Kevin Bacon.

SELECT M1.end_actor_id, 
        M1.relationship_name  
       M1.end_actor_id, 
        M2.relationship_name,  
       M2.end_actor_name 
  FROM MovieCasts AS M1, MovieCasts AS M2
 WHERE M1.begin_actor_name = 'Kevin Bacon'
   AND M1.end_actor_id = M2.begin_actor_id 
   AND M2.end_actor_id = 'Joe Celko';

Example 2: Two-edge path.

To extend this pattern, we alter the table by adding a column for the next relationship and one for the next actor. The important part is to also add a predicate that prevents cycles. That is, the next step in the path cannot already be in the unaltered table.

Notice in Example 3 that you have to redo the query for every path length for every pair of actors. If there is no relation between the two actors, you waste a lot of time and storage space, assuming that the problem doesn't overflow the limits of your database engine. Luckily, there's a tool that handles this kind of problem.

   AND M1.end_actor_id = M2.begin_actor_id;  
   AND M2.end_actor_id = M3.begin_actor_id
   ... 
   AND Mn.end_actor_id = 'Joe Celko' 
  AND Mn.begin_actor_id 
          NOT IN ('Kevin Bacon', M2.end_actor_id, M3.end_actor_id, ..);

Example 3: Altering the table.

Previous 1 2 3 4 Next

More Insights

INFO-LINK


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

Database

Discovering Relationships in Context

Finding the Shortest Path

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Database Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content

Database

Discovering Relationships in Context

Finding the Shortest Path

Related Reading

News

Commentary

Slideshow

Video

Most Popular

More Insights

White Papers

Reports

Webcasts

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Database Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content