Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Database

Discovering Relationships in Context


Finding the Shortest Path

A path through a graph is a traversal of consecutive nodes along a sequence of edges. Clearly, the node at the end of one edge in the sequence must also be the node at the beginning of the next edge in the sequence. The length of the path is the number of edges that are traversed along the path. A path does not have a cycle in it.

In this particular problem, the nodes are actors and the edges are the "in a movie with," "directed by," "costarred," and other such relationships.

Here, I'm looking for a path from "Kevin Bacon" to some other actor that has a length less than six. Actually, what I would really like is the shortest path (most direct relationship) within the set of all paths between those two actors.

The advantage of SQL as a database language is that it is a declarative, set-oriented language. I tell the SQL engine what I want to get back and it figures out how to do it. An optimizer looks at statistics, indexes, table sizes, and dozens of other things to get an execution plan. The optimizer is usually smarter than human beings.

But suddenly, this is not the approach we want with a graph. When I specify a rule for a path, I have to know the length (n) of the path I am looking for before I can write the query. Then when I run this query, I get all the paths in the set. Looking for this set can be a combinatorial explosion.

What I really wanted was the shortest path between my two actors. This means I have to start with paths of length=1, length=2, and so forth, until I either find a path or have searched all the edges in the graph. Example 2 is a template for a simple, two-edge path starting at Kevin Bacon.

SELECT M1.end_actor_id, 
        M1.relationship_name  
       M1.end_actor_id, 
        M2.relationship_name,  
       M2.end_actor_name 
  FROM MovieCasts AS M1, MovieCasts AS M2
 WHERE M1.begin_actor_name = 'Kevin Bacon'
   AND M1.end_actor_id = M2.begin_actor_id 
   AND M2.end_actor_id = 'Joe Celko';

Example 2: Two-edge path.

To extend this pattern, we alter the table by adding a column for the next relationship and one for the next actor. The important part is to also add a predicate that prevents cycles. That is, the next step in the path cannot already be in the unaltered table.

Notice in Example 3 that you have to redo the query for every path length for every pair of actors. If there is no relation between the two actors, you waste a lot of time and storage space, assuming that the problem doesn't overflow the limits of your database engine. Luckily, there's a tool that handles this kind of problem.

   AND M1.end_actor_id = M2.begin_actor_id;  
   AND M2.end_actor_id = M3.begin_actor_id
   ... 
   AND Mn.end_actor_id = 'Joe Celko' 
  AND Mn.begin_actor_id 
          NOT IN ('Kevin Bacon', M2.end_actor_id, M3.end_actor_id, ..);

Example 3: Altering the table.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.