Parallel

Mar02: Algorithm Alley

By Timothy Rolfe, March 01, 2002

Graph Traversals

Tim is an associate professor of computer science at Eastern Washington University. He can be contacted at [email protected].

Suppose you need to determine whether you can get from point A to point B. Or what if you have a network problem involving a message being sent using a copy-and-forward protocol. For that matter, assume you need to connect a network of computers using the least amount of cabling. What these problems have in common is that they all involve graphs.

The abstract definition of a graph is a set of nodes (also called "vertices") and a set of edges (arcs) connecting them. The cities, towns, and crossroads on a map, for instance, can be considered nodes, while the roads connecting them can be the set of edges. Traveling the edges of the graph to visit the accessible nodes involves traversal. If the traversal visits each node exactly once, the edges used constitute a spanning tree — the subgraph that connects all the nodes with the smallest number of edges. It spans the graph because it visits all of the nodes, and it is a tree because the starting point has no inbound edge while every other node has exactly one inbound edge.

There are several traversal algorithms that generate spanning trees with different characteristics (depth-first traversal, breadth-first traversal, and priority-first traversal). These algorithms generate different types of spanning trees (for instance, depending on the priority chosen for the priority-first traversal, you get a minimum spanning tree or a single-source/shortest-path spanning tree). Besides traversals generating spanning trees, other graph traversals involve checking all possible paths. With appropriate restrictions on the allowed paths, you can solve the famous Traveling Salesman Problem.

Data Representation

The first order of business is to represent the graph inside the computer. The nodes themselves are easily collected into an array, giving easy access to them for operations that involve all nodes.

In Listing One, I represent the nodes as an array of node objects, where each object contains information about the node (the city/town/crossroad name), and additional attributes used in the graph algorithms. Specifically, for most algorithms there is a flag keeping track of whether the node has been visited. In addition, I need the distance from the start for the single-source/shortest-paths traversal.

The edges are represented based on the source and destination nodes — even for undirected graphs (where the edges allow travel in both directions), the representation of the edges typically has an implicit direction. You can use a set of linked lists (one for each source node) giving the immediately accessible destination nodes. These are called "adjacency lists." On the other hand, you can use a two-dimensional array (called an "adjacency matrix"), where the row subscript is tied to the source node, and column subscript is tied to the destination node.

I use the adjacency matrix approach. The edges are in a 2D array of pointers to edge objects. The subscript for the node in the node array also represents that node in the adjacency matrix — the [j][k] matrix cell represents the edge from Node[j] to Node[k]. A NULL pointer indicates that there is no connecting edge, while a single-edge object for the j-to-k edge and the k-to-j edge is used. The edge object holds the length of the edge and could hold other information. For example, if you are using the graph to represent a map, the edge might hold both the road distance and the highway designation.

Traversal algorithms that generate spanning trees require a way of telling whether a node has been visited. The node object contains a flag attribute to denote unvisited or visited status, along with methods to set and examine that flag. Consequently, the traversal algorithms need an initial step of marking all nodes as unvisited. The traversal algorithms then mark nodes visited as they progress. Even more, if a zero value represents unvisited, you can use the order of visiting the nodes (1st, 2nd, and so on) as the flag. (Although I don't cover them here, some graph algorithms make use of the depth-first-traversal index — exactly these ordinal numbers.)

To explicitly control the work scheduling, use your own data structure to hold the edges pending processing. The type of data structure will determine the kind of traversal you implement.

These implementations are built to capture the sequence of edges used in the traversal. Consequently, they generate entries in an array of ordered-pair objects (since an edge is an ordered pair of node designations). The subscript for that array holds the subscript for the next pair to be added to the list — and, thanks to starting with subscript zero, it is also the number of such pairs currently entered.

Depth-First Traversal

The easiest traversal to implement lets the computer's call stack control the processing, as in Listing Two, giving a depth-first traversal. It generates a spanning tree with very little branching; see Figure 1. The function is called with the subscript for a node that is currently unvisited as the destination. You mark that node as visited, and add to the edge list the incoming edge. The initial call uses a pseudoedge with a flag value for the source to start the ball rolling. You then examine the adjacency matrix to find outbound edges. For each outbound edge, if the destination node is currently unvisited, make the recursive call to visit that node. This means that when you return from the recursion, you may well find that upcoming nodes that had been unvisited before the recursion now have been visited.

If you want to avoid recursion, you can use your own stack (last-in/first-out) in making your traversal. To simplify the processing (at the expense of requiring more space), I do not mark nodes as pending when their incoming edges are entered into the work list. Instead, when an edge is removed from the list, the first check is whether its destination node is still unvisited.

Breadth-First Traversal

If, instead, you use the queue discipline (first-in/first-out or FIFO) for your work list, you generate the spanning tree with the most branching so that every node is at the shortest possible distance (in number of edges) from the starting point; see Figure 2.

In Listing Three, the work queue is initialized with a pseudoedge with its destination given by the node specified in the parameter. The work queue is actually an output-restricted double-ended queue, or deque, using a linked list with a single removal point, but two insertion points: the front for stack behavior and back for queue behavior. There are even circumstances (though not in this article) where you may want both behaviors at the same time. Listing Four is the declaration of the deque as used in this article. Objects of the class CPair hold ordered pairs of numbers (integers).

The processing loop is driven by the removals from the queue; it continues as long as the removals succeed. If the edge removed ends in a node as yet unvisited, the node processing is similar to that in the recursive algorithm, except that the recursion is replaced by making entries into the work queue for edges from the current node to other nodes that are as yet unvisited. While you do get multiple queue entries for the same destination node, the duplicates are discarded by the flag check at the top of the while loop.

Minimum-Spanning Tree

By changing the work data structure to a priority queue, you can now generate a spanning tree where the total of the edge lengths is smallest. For this kind of traversal, the priority queue is built to give fast access to the shortest available edge as you build the spanning tree; see Figure 3.

The implementation of Prim's algorithm to determine a minimum-spanning tree is identical to the breadth-first traversal except for the data structure used for the work list. Rather than being entered into a FIFO queue, the edges are entered into a priority queue along with their lengths. The priority queue is implemented as a MinHeap. The heap is the classical data structure to use for a priority queue, so that entry of arbitrary values and removal of the smallest (for a MinHeap) value scale as the logarithm of the heap size. As used in this article, the heap cells hold both an ordered pair of integers (the edge) and an integer (the length) that is used as the key value determining the order of entries. The implementation code (available electronically; see "Resource Center," page 5) uses dynamic memory to rescale the heap size as required, hence the default Size=0. Listing Five (also available electronically) presents declarations for the MinHeap used here.

The length is the key used by the priority queue to determine order. Because the procedure Prim is so close to the procedure breadth1st shown in Listing Three, I have not included its listing.

Single-Source/Shortest-Paths Spanning Tree

You can make a small change to what determines priority in the priority queue by keeping track of the distance from the start of the traversal and taking the edge that gives the shortest path from the start in building the spanning tree; see Figure 4. In this case, you end up with the spanning tree that gives the shortest paths from the starting point to all other nodes.

Listing Six (also available electronically) shows how minor additions to Prim's algorithm for a minimum spanning tree transform it into Dijkstra's algorithm for the single-source/shortest-paths spanning tree. Here, you need to retain the current distance of every node from the starting node. The preliminary procedure to clear the traversal flags also sets the distance attribute in all nodes to the largest available number. Now the key for the priority queue becomes the distance from the starting node to the destination node of the edge, not just the edge length itself as in Prim's algorithm. As you make entries into the priority queue, you get this simply by adding together the distance to the node you are processing and the length of the edge from that node to the adjacent node you are thinking of entering into the priority queue.

And, just as there are numerous ways to skin a cat, there may be more than one spanning tree that meets the description of a minimum spanning tree or a single-source/shortest-paths spanning tree.

All-Paths Algorithm (Traveling Salesman Problem)

The algorithms up to this point examined the graph edges just once. Because the edges are in an adjacency matrix, the traversal time required depends on the square of the number of nodes. Other well-known graph traversal algorithms include those that consider all possible paths. Listing Seven (available electronically) shows such an algorithm with no restrictions on the paths other than forbidding traveling the same edge twice. If you want to consider all possible paths in the graph, the time required goes up exponentially. One of the most famous problems solved by an all-path graph traversal is the Traveling Salesman problem (Figure 5).

Traveling Salesman Problem with No Paths Revisited. In this traversal, severe restrictions are placed on the traversal circuit: If you allow absolutely no branching, require that all nodes be visited, and add one extra edge from the last node visited back to the start, you have what is called a "Hamiltonian circuit." If you find the shortest Hamiltonian circuit, you have solved the Traveling Salesman Problem — what is the shortest tour that a traveling salesman can take to visit all clients and return home? (Computing theoreticians sometimes phrase the Traveling Salesman Problem as finding a Hamiltonian circuit with total length less than some constant, but traveling salesmen themselves will almost certainly be interested in the shortest route.)
Traveling Salesman with Node Revisiting. There are, however, graphs for which there are no Hamiltonian circuits: To visit all nodes, you may have to pass through some node(s) more than once. Even here, you can relax the restriction on node visits and solve the Traveling Salesman Problem when that is stated as the shortest circuit to visit all nodes — it just becomes a messier problem. Listing Eight (also available electronically) shows the changes to the allPaths procedure (Listing Seven) to solve this more general version of the Traveling Salesman Problem — as shown in Figure 5, since Spokane is visited more than once.

To find all possible paths, you have to mark edges as in-use rather than marking nodes as visited. Assuming that all edge lengths are positive, you can do that simply by turning the edge length negative while that edge is part of a trial path. Again, the easiest implementation is a recursive one; see Listing Seven.

While the path end remains unchanged, the current node varies as you build trial paths through the graph. From each current node, you use the as-yet-unused outbound edges from it to generate trial paths, marking the edge as in-use before the recursion and marking it as available when you return from the recursion.

When you examine all paths, you may pass through a node more than once, provided you use different edges. You may be interested only in the simple paths, those that visit the nodes only once. It is straightforward to supplement the edge marking with the node marking that was used in the spanning tree algorithms. Finally, if you accept simple paths that start and end in the same node with as many edges as there are nodes, you have found the Hamiltonian circuits — the smallest circuit gives the solution to the Traveling Salesman Problem.

The graph, though, may not have any Hamiltonian circuits. To solve the Traveling Salesman Problem here, you just need to change the node marking and the way that you tell when all nodes have been visited — the number of edges in the path is no longer a reliable criterion; see Listing Eight.

An unvisited node still has a zero flag value, but the flag gets incremented when you pass through it and is decremented after you have examined that trial path; essentially, the flag becomes the number of paths through the node. You have a potential solution whether all of the nodes have flag values greater than zero.

Some Solutions

If you have something that can be represented as a large graph, and you simply want to determine whether you can get from point A to point B, you can simply perform a traversal — perhaps the recursive depth-first one since it is the easiest — starting from point A. When it is finished, you see if point B did indeed get visited.

As for the network problem, where a message is being sent using a copy-and-forward protocol, the breadth-first traversal generates the spanning tree that gives the smallest number of nodes you need to send the message through as you send the message from point A to point B.

And, if you are physically connecting a network of computers, you want to find the minimum spanning tree — the arrangement that connects all of the computers and uses the least amount of cabling.

References

Graphs are discussed in most introductory texts on the topic of data structures. I have based the spanning tree algorithms on their treatment in Robert Sedgewick's Algorithms in C++. As expressly noted in the listing of the allPaths procedure, that procedure is based on code found in James L. Antonakos and Kenneth C. Mansfield's Practical Data Structures Using C/C++, though the procedures developed from it are of my own devising.

Online Resources

There is a ZIP file available electronically from the Resource Center (see page 5). When expanded, it generates the source code for the class CGraph (with the spanning tree traversals) and the derived class CGraphAll (with the traversals based on the allPaths procedure), as well as the necessary utility classes and two driving mains to exercise the traversal algorithms. In addition, there are specimen data files for input to those programs, one of them involving towns and crossroads in my area, the inland Pacific Northwest.

DDJ

Listing One

class CGraph
  {
     ...
     protected:
     ...
        CEdge**  *m_Road;      // Dynamic 2-d array of pointers
        CVertex  *m_City;      // Dynamic array of vertices
        CPair    *m_Sequence;  // Sequence of edges
        int       m_Size;      // Number of cities
        int       m_NRoads;    // Number of roads
        int       m_NextEdge;  // Used in filling m_Sequence
  };

Back to Article

Listing Two

// Recursive Depth-First Traversal
void CGraph::depthRecur ( int Src, int Dst )
{  CPair Edge(Src, Dst);
   m_Sequence[m_NextEdge++] = Edge;
// Save position in the list
   m_City[Dst].setFlag(m_NextEdge);
   Src = Dst;        // Check outbound edges
   for ( Dst = 0; Dst < m_Size; Dst++ )
      if ( m_Road[Src][Dst] != NULL )
      {  int Flag = m_City[Dst].getFlag();
         if (Flag == Unvisited )
            depthRecur (Src, Dst);
      }
}

Back to Article

Listing Three

// Breadth-First Traversal
void CGraph::breadth1st ( int First )
{  CPair Edge;
   Deque Work;
   Edge.set(-1, First); // Pseudo-edge for entry point
   Work.enqueue(Edge);
// remove returns "true" if the operation succeeds
   while ( Work.remove(Edge) )
   {  int Src = Edge.get1(),
          Dst = Edge.get2();
      if ( m_City[Dst].getFlag() != Unvisited )
         continue;      // Already done.  Get next one
      m_Sequence[m_NextEdge++] = Edge;
//    Save position in the list
      m_City[Dst].setFlag(m_NextEdge);
      Src = Dst;        // Check outbound edges
      for ( Dst = 0; Dst < m_Size; Dst++ )
         if ( m_Road[Src][Dst] != NULL )
         {  int Flag = m_City[Dst].getFlag();
            Edge.set(Src, Dst);
            if ( Flag == Unvisited )
            {  Work.enqueue(Edge); }
         }
   }
}

Back to Article

Listing Four

class Deque   // Double-ended queue (stack AND queue behaviors)
{  public:
      Deque (void): m_Front(NULL), m_Back(NULL) { }
     ~Deque (void);     // Return dynamic memory allocations
      void push(const CPair&);     // Stack discipline
      void enqueue(const CPair&);  // Queue discipline
      bool remove (CPair&);        // Single removal point
      bool empty(void) { return m_Front == NULL; }
   private:
      struct DQnode
      {  CPair   m_Edge;
         DQnode *m_Next;
         DQnode(const CPair &V, DQnode *Nxt = NULL)
         :  m_Edge(V), m_Next(Nxt)  {  }
      };
      DQnode *m_Front, *m_Back;
};

Back to Article

1 2 3 4 5 6 Next

More Insights

INFO-LINK


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

Parallel

Mar02: Algorithm Alley

Data Representation

Depth-First Traversal

Breadth-First Traversal

Minimum-Spanning Tree

Single-Source/Shortest-Paths Spanning Tree

All-Paths Algorithm (Traveling Salesman Problem)

Some Solutions

References

Online Resources

Listing One

Listing Two

Listing Three

Listing Four

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Parallel Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content

Parallel

Mar02: Algorithm Alley

Data Representation

Depth-First Traversal

Breadth-First Traversal

Minimum-Spanning Tree

Single-Source/Shortest-Paths Spanning Tree

All-Paths Algorithm (Traveling Salesman Problem)

Some Solutions

References

Online Resources

Related Reading

More Insights

White Papers

Reports

Webcasts

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Parallel Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content