C/C++

C++ Hash Table Memoization: Simplifying Dynamic Programming

By Mark Nelson, October 22, 2007

C++ hash containers that improve storage of subproblem results when using dynamic programming

The Dynamic Programming Solution

A true bottoms-up approach is probably impractical for this problem.

First, it's a lot of work to traverse a tree one level at a time, starting at the lowest level. Second, there are problems with the tabular subproblem storage scheme usually favored in dynamic programming. It's not entirely clear where in a table we would store the result from GET-MAX-WEIGHT( q ).

An approximation of the bottoms up approach could be achieved by traversing the tree in postorder, visiting all the children of a node first, then visiting the node itself. In this scenario we would visit the nodes from Figure 2 in the following order:

h i d q r j k e b l m f n p g c a

At every non-leaf node, we would have all the information we needed to calculate GET-MAX-WEIGHT( node ) without actually actually recalculating any results -- the subproblem results will have already been calculated and stored.

Of course, traversing the tree in postorder is actually what we're already doing already in the recursive implementation. Seeing this, my solution to the algorithm was to use a C++ hash table to perform memoization of the subproblems. This has a few advantages:

My implementation looks just like the recursive definition of the algorithm. It just has a few additional lines needed to check for stored subproblems.
I don't have to worry about finding a tabular storage method -- I store the results based on a hash of the the node pointer, making lookup and storage nice and simple.
I don't have to worry about how to do bottoms-up traversal of the org chart. My recursive definition visits the nodes in the order I want without any changes.

I'll start by simply presenting my modified version of GetMax( node ), then present the infrastructure that goes with it. Finally, I'll discuss the last step in any dynamic programming problem: using the stored subproblem results to get a solution.

The Memoized GetMax()

The C++ code to solve this problem is completely defined in a class called Memo, which relies on a tree whose nodes are defined by class Node. The two class definitions are shown here:

class Node { 
public : 
  Node( std::istream &input ); 
  const std::string &Name() const; 
  int Weight() const; 
  const Node &Child( size_t i ) const; //return ref to child node 
  size_t Children() const; //return count of child nodes 
}; 
class Memo { 
public : 
 int GetMax( const Node &node ); 
 void MarkIncluded( const Node &node ); 
 void PrintTree( std::ostream &s, const Node &node, bool init=true ); 
private : 
 struct Solution { 
   Solution(); 
   bool mAttending; 
   bool mIncluded; 
   int mMaxWeight; 
 }; 
 std::map<const Node*,Solution> mSolutions; 
};

To run an instance of this problem, I first construct a tree by having the Node constructor read in a definition file. The resulting root node is then passed in to Memo::GetMax(), which works its way through all the subproblems and calculates the maximum total weight for this problem.

The code in GetMax() should appear nearly identical to that in the pseudocode version of GET-MAX-WEIGHT(), with one critical difference: in my implementation of GetMax(), I check to see if a hashed value of GetMax() has been saved upon entry to the routine, and if it has, I short-circuit execution and return that value immediately.

Likewise, at the end of the routine, I store the newly calculated maximum value in the hash table.

The key to this hash table is a pointer to a node -- it could just as easily have been the name of the node, but using the pointer has the advantage of making the program a little more invulnerable to input error -- duplicate node names won't break the algorithm.

The use of a pointer as a key into a hash_map is a technique that can be used to work around some of the issues I discussed earlier in the article. Because the hash containers offer built-in support for pointers of any type, use of a pointer as a key allows you to avoid having to define your own hash function and comparison function, which is nice.

int Memo::GetMax( const Node &node ) 
{ 
  std::map<<const Node*,Solution>::const_iterator ii; 
  ii = mSolutions.find( &node ); 
  if ( ii != mSolutions.end() ) 
    return ii->second.mMaxWeight; 
  int weight_not_attending = 0; 
  for ( size_t i = 0 ; i <node.Children() ; i++ ) 
    weight_not_attending += GetMax( node.Child( i ) ); 
  int weight_attending = node.Weight(); 
  for ( size_t i = 0 ; i <node.Children() ; i++ ) 
    for ( size_t j = 0 ; j <node.Child( i ).Children() ; j++ ) 
      weight_attending += GetMax( node.Child( i ).Child( j ) ); 
  std::cout <<"GetMax(" <<node.Name() <<")\n"; 
  if ( weight_attending>= weight_not_attending ) { 
    mSolutions[ &node ].mAttending = true; 
    return mSolutions[ &node ].mMaxWeight = weight_attending; 
  } else 
    return mSolutions[ &node ].mMaxWeight = weight_not_attending; 
}

This routine is conceptually pretty simple. The first four lines check to see if I've cached a previously seen solution to the problem in the mSolutions hash_map. If I have, the mMaxWeight member of that solution information is returned immediately, avoiding a duplicated subproblem.

If the solution is not already stored off, I have to do the real work. This means evaluating both possible solutions -- the weight if this node attends, and thus excludes all its immediate children, and the weight if this node does not attend, and thus includes the children. Those two values are calculated and stored in local variables weight_attending and weight_not_attending.

Given those two values, its a simple matter to decide which of the two is larger, and return that one. However, there are a couple of issues to deal with before returning.

First, I want to make sure that we store the solved subproblem in the solution table. Both return statements do that by writing the value to the mMaxWeight member of the correct entry in mSolutions.

That would be enough to get the maximum value, but I may also want to create a solution for the problem -- which means not only knowing the maximum weight achieved, but a list of attendees. In order to get that list of attendees, I need to also keep track of whether the maximum value achieved at this node was done so by attending or not attending. I do that by updating a member called mAttending in the Solution object. (Note that the constructor sets mAttending to false upon creation.)

Previous 1 2 3 4 5 6 7 Next

More Insights

INFO-LINK


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

C/C++

C++ Hash Table Memoization: Simplifying Dynamic Programming

The Dynamic Programming Solution

The Memoized GetMax()

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

C/C++ Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content

C/C++

C++ Hash Table Memoization: Simplifying Dynamic Programming

The Dynamic Programming Solution

The Memoized GetMax()

Related Reading

News

Commentary

Slideshow

Video

Most Popular

More Insights

White Papers

Reports

Webcasts

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

C/C++ Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content