C/C++

Dynamic Programming

By Mark A. Pumar, May 01, 2003

Dynamic programming has been widely used to solve problems in science, engineering, and finance—and it's become the algorithm of choice for many large optimization problems.

May03: Dynamic Programming

Mark is a principal software engineer at Raytheon ITSS in Pasadena, California. He can be contacted at [email protected].

Dynamic programming lends itself best to problems that can be decomposed into discrete steps or stages. By choosing the best course at each stage, it eliminates otherwise suboptimal or unprofitable avenues to arrive, after a finite number of steps, at the optimal solution. The ability to discard suboptimal solutions at each stage is known as the "principle of optimality." Richard Bellman stated this principle in Dynamic Programming (Princeton University Press, 1957) as follows:

An optimal policy has the property that whatever the initial state and the initial decisions are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.

For example, consider the problem of selecting the best route from, say, New York to San Diego; see Figure 1. Suppose the best routes from Chicago, Denver, and Austin to San Diego are known. Each of these routes has an associated cost that reflects the notion of what is meant by best—shortest, quickest, least expensive. Now consider a possible route that goes from New York to Washington D.C., then either to Chicago, Denver, or Austin. The principal of optimality states that no matter what route is taken to Washington D.C., the best route from there to San Diego is the one with the minimum cost.

This route is easily found from examining the cost from Washington D.C. to Chicago, Denver, and Austin, and adding it to the cost of traveling to San Diego from each of these cities. The route with the overall lowest cost is best. Suppose, for the sake of concreteness, this is the route through Denver. Then the routes through Chicago and Austin can be eliminated from further consideration with respect to a route through Washington D.C. Thus, when examining routes from, say, New York to Philadelphia and Raleigh that pass through Washington D.C., it is no longer necessary to consider the routes from Washington D.C. to Chicago and Austin—only the route to Denver is relevant since it has the lowest cost. This eliminates the need to keep recalculating the cost of the routes through Chicago and Austin, therefore reducing the amount of computation required to find the best route.

Algorithmically, dynamic programming works by decomposing the problem into a finite number of stages. Each stage contains the same set of discrete objects under consideration (a list of cities in the route-finding example). The cost of combining each item in the set, with the set of best combinations computed from the previous stage, is calculated. Using the principle of optimality, only the best combinations are saved, and the process repeats until the last stage is reached. The final set of best combinations, or strategies, is then examined and the one with the lowest cost is the optimal solution to the problem.

Mathematically, let L={o₁,...o_N} be a set of N objects and Sm(n) a set containing n of the objects from L where n£N. The set S(n)={S_m(n)}0£m£M denotes the strategies at the n^th stage of the process with M£N. If the o_i...Sm(n) denotes the cost of combining the i^th object from L with the m^th strategy from S(n), then Examples 1(a) and 1(b) are True.

Example 1(a) is the mathematical statement of the principle of optimality and shows how each stage is computed from the previous stage in the process. Example 1(b) gives the initialization of the first stage of the process. Thus, the first set of strategies each contains a single object from the set L. In essence, they are just to the objects in L and S(0)=L. The optimal solution, s, is found by selecting the strategy with the minimum cost from the final set in Example 1(c).

Implementing a dynamic program requires the following steps:

1. Initializing the initial set of strategies with the set of input objects; see Example 1(b).

2. Computing and saving the strategies at each stage according to Example 1(a).

3. Finding the strategy with the minimum cost; see Example 1(c).

4. Extracting the set of objects from the strategies with the minimum cost.

The last step is required because the strategies usually don't maintain a list of objects, but rather contain a single object and pointer to the best pairing strategy from the previous stage. It is, therefore, necessary to trace back through the strategies from each stage to extract the objects in the optimal solution set.

Implementation

Example 1 suggests that a general implementation of a dynamic programming (DP) algorithm is possible if the details of computing the cost between objects and strategies can be abstracted away from the generation and management of the strategies at each stage of the process. Fortunately, C++ is ideally suited for this kind of abstraction with its built-in support of polymorphism and inheritance. (I've compiled and tested all source code presented here under Windows 2000 using Microsoft Visual C++, and under Linux and HP UNIX using the gcc compiler. The complete source code is available electronically; see "Resource Center," page 5.) Both are used to achieve an implementation that works for a large number of practical problems for which DP is the solution method of choice.

Listing One is the header for the DpSolution class that implements Example 1. This is not an abstract class, but rather makes use of the helper class DpObject (Listing Two), which is abstract. The abstract class DpObject encapsulates the calculation of cost between objects, removing this detail from the DP solution class. The DpSolution class interface is small, consisting of a single constructor, destructor, and the solve function as its main components.

The class constructor (Listing Three) handles the initialization of the class and the DP solution in Example 1(b). A list of input objects is copied to a vector and an optional sort is performed. The sort is controlled by the last argument of the constructor.

In some cases, the DP solution is facilitated by ordering the initial objects before beginning the calculation. After the sort, a matrix of strategies is initialized with the sorted input objects. Typedefs are used to define the matrix as a vector of vectors. This shifts most of the memory management from the class to the Standard Template Library. Unfortunately, some memory management is still required. In particular, the DpObject class (and its children) are required to provide a clone function used by the DpSolution class for making copies of the input objects. These copies are stored in the m_stage class data member, which holds the solution strategies.

The class destructor deletes these objects when the class goes out of scope. In addition, the class constructor deletes the temporary copy of the input list used for the optional sort. After initialization, the solve function is called to compute and return the solution (as a list of DpObjects). Listing Four is the solve member function. This function begins by checking that the number of stages is less than the number of objects; if not, a list containing the input objects is returned.

Next, the cost associated with the first-stage strategies is computed using the evalInitialCost member function from the DpObject class. This function allows initialization of the starting strategies with nonzero values, if necessary. Next, the cost associated with each strategy is computed for every stage in the process. The code:

double c = m_stage[i][j]->

evalCost(m_stage[i-1][k]) +

m_stage[i-1][k]->cost();

if(c<cmin) {cmin=c; partner=k;}

in the innermost For loop is the implementation of Example 1(a). The index to the strategy of the previous stage that yields the minimum overall cost is kept in the temporary variable partner and later stored as part of the strategy.

Once all the strategies have been computed, a final cost is added to the last-stage strategies using the DpObject class member function evalFinalCost. This accounts for any end-point or boundary conditions associated with the objects. If no such conditions exist, this function returns zero for the final cost. This computation is leveraged to find the lowest cost strategy that contains the optimal solution to the problem.

Finally, the trace-back calculation is performed to extract the objects from the strategies that comprise the optimal solution. Again, strategy contains a pointer (index) to a partner strategy from the previous stage. By starting with the lowest cost strategy at the final stage, these indices are traced back to the first stage and their associated objects are extracted. These extracted objects constitute the optimal solution and are stored in the list ans, which is returned at the end of the solve function.

The DpSolution class contains a few additional member functions for returning the number of stages and strategies (stages), and the individual strategies at each stage (stage), and for printing. Although these functions are not necessary, they are useful for examining and extracting substrategies and their solutions, if desired.

Listings Three and Four contain the general implementation of the DP algorithm. This code is devoid of the particulars of computing cost or the nature of the objects that constitute the problem. These details are encapsulated in the abstract class DpObject. The DpObject class defines the interface between the problem and method of solution, and contains three data members: m_id, a string for identifying the object; m_cost, the cost associated with the object and its partners; and m_partner, an index to a partnering object. The member functions: string id(void), int partner(void), double cost(void), void setCost(double cost), and void setPartner(int index, int stage) provide read/write access to these data members and are used by DpSolution to store/retrieve this information during the calculation.

The member functions double evalCost(const DpObject *dp), double evalInitialCost(void), and double evalFinalCost(void) return the costs associated with the objects during the calculation. These are all pure virtual functions and must be defined in the inheriting class. The setPartner function is overloaded to provide access to the chosen partner object during the storage of its index in case additional information about the partner is required during this phase of the process.

The operator functions operator> and operator< are also defined for use in sorting the initial list of objects. These functions are also pure virtual functions that must be defined in the inheriting class. Finally, operator<< is defined for printing the class. Also contained in the DpObject header is the inline function DpObjectPtrLessThan, which is used in the sort function in the DpSolution class constructor.

DpSolution and DpObject are the basis for implementing the DP algorithm as a solution to a discrete optimization problem. The solution to an actual problem is computed by implementing a concrete class that inherits from DpObject, contains the details of the particular problem, and provides the required cost evaluation functions. This class is then used to construct an instance of the DpSolution class, and its solve member function is called to compute the desired solution. The work is therefore reduced to defining and implementing a class that contains the particulars of the problem of interest.

Application

To demonstrate DpSolution and DpObject, I present an example of tracking an astronomical object—that is, a planet. The inputs to the problem are viewing intervals when the planet is physically observable from one or more fixed locations on the Earth. The objective is to provide a fixed number of viewing intervals (called "tracking passes"), which are uniformly distributed over a desired observation period. The uniformity of the intervals is found by minimizing Example 2, where T^band T^eare the beginning/ending of the desired observation period, respectively, and t_bi and t_ei are the beginning/ending of the i^th tracking pass. Figure 2 shows viewing intervals for Mars when it is within 60 million kilometers of Earth. Observers are located at three different sites that span the globe.

Assume that during the approximately 41 days when Mars is within 60 million kilometers of the Earth, you want to make 14 uniformly spaced observations. Determining the desired viewing intervals is done by creating a TrackingPass class (available electronically) that inherits from DpObject. The TrackingPass class implements all the virtual functions of the DpObject parent class. In particular, the member functions evalInitialCost, evalCost, and evalFinalCost implement Example 2 for computing the cost associated with each interval.

The TrackingPass class contains four data members particular to this problem—the beginning/ending times of the view interval (m_begin and m_end), and the beginning/ending times of the observation period (m_refBeg and m_refEnd, respectively). These intervals define the t_bi, t_ei, T_b, and T_e in Example 2. The file ppp.cpp (also available electronically) is the calling program that combines all the classes, computes, and prints the solution. The data is contained in the header file TrackData.hpp for compactness; a more general implementation would probably read the viewing intervals from an external file.

The program begins by opening an output file for printing the results and declaring two lists of DpObject pointers, one to hold the input data and the other to hold the answers. Next, the input data is placed into TrackingPass objects and stored in the list declared earlier. The data in the input list is used to instantiate a DpSolution object along with the desired number of observations or passes. The data is then printed and deleted from memory. The DpSolution object's solve member function is called to compute the solution, and the contents are printed. This solve function returns a list of DpObject pointers in the previously declared list, prints it, and then deletes it. Figure 3 shows the results for the viewing intervals of Figure 2 and the cost function in Example 2.

The intervals selected as part of the solution are in green and red, and intervals excluded are in white. Since the spacing between intervals for Observer #2 was the smallest of the three observers, the optimal solution (not surprisingly) consists almost exclusively of intervals from this site. Also, the chosen intervals are uniformly spaced with every third interval selected as part of the solution. The DP algorithm achieved the desired goals implicitly through the cost function (Example 2). The algorithm itself has no explicit connection to the desired optimization goals. Changing the cost function is therefore equivalent to changing the desired goals. This is illustrated by the results in Figure 4, where the cost has been modified to reflect a higher cost for using Observer #2, relative to the other two sites.

In this case, a fixed cost is added to Example 2 every time Observer #2 is selected. This cost reflects the additional expense of using this site. Again, intervals not selected as part of the optimal solution are in white, while selected intervals are red, blue, and green. In this case, Observer #1 is chosen to replace Observer #2 over approximately half of the observation period. This is not surprising since the time between viewing intervals for Observer #1 is less than for Observer #3. The choice of Observer #3 at the end is due to its close proximity to the end of the observation period. Different results can be achieved by modifying Example 2 to reflect whatever goals are desired with respect to viewing Mars during its period of close approach to the Earth. Figures 3 and 4 illustrate some of the possibilities.

Conclusion

Dynamic programming has been widely applied to solve a variety of problems in science, engineering, and finance (see, for example, Dynamic Programming and Modern Control, by Richard Bellman and R. Kalaba, Academic Press, 1966). Like all optimization techniques, DP suffers from the so-called curse of dimensionality—the number of computations increases geometrically with the size of the problem. The traveling salesman problem is perhaps the most famous example of this problem. As the number of cities visited increases, the size of the possible solution space increases astronomically, and the problem quickly becomes practically unsolvable even on the fastest computers. Although DP significantly reduces the size of the solution space, it is not immune to this effect, and large problems can easily consume exorbitant amounts of memory, CPU cycles, and time.

However, unlike other optimization techniques, DP does guarantee that the resulting solution is optimal. This should be contrasted with techniques such as branch and bound, which merely guarantee good, but not necessarily optimal, solutions. As such, DP is the algorithm of choice for many large optimization problems.

DDJ

Listing One

// DpSolution.hpp. This class encapsulates the dynamic programming solution. 
// It uses the abstract class DpObject which contains the interface to the 
// problem and uses the functions in this interface to calculate the solution 
// to the problem. The class contains the following public functions:

#ifndef _DP_SOLUTION
#define _DP_SOLUTION

 #include <list>
#include <vector>
#include <iostream>

 #include "DpObject.hpp"

class DpSolution
{
   public:
      DpSolution(const list<DpObject *> &dp, int N, bool sortList = true);
     ~DpSolution(void);
      list<DpObject *> solve(void);
      int stages(void) const;
      int stages(int stageNum) const;
      DpObject * stage(int stageNum,int objectNum) const;
      friend ostream& operator<<( ostream& os, const DpSolution& dps );
   private:
      typedef vector<DpObject *> Strategy;
      typedef vector<Strategy> Stage;
      Stage m_stage;  // solution strategies
};
#endif

Back to Article

Listing Two

// DpObject.hpp. Abstract class definition for dynamic solution object. This 
// class defines the interface between the problem and the DpSolution class 
// which is used to determine the solution. Inheriting class are required to 
// define the following member functions:

#ifndef _DPOBJECT_
#define _DPOBJECT_

#include <string>

using namespace std;
class DpObject
{
   public:
     // constructors/destructor
     DpObject(void);
     DpObject(const DpObject &tp);
     virtual ~DpObject(void);
     // member functions
     string id(void) const;
     int partner(void) const;
     double cost(void) const;
     DpObject *clone(void) const;
     // cirtual functions
     virtual void setCost(double cost);
     virtual void setPartner(int index, int stage);
     // Pure virtual functions
     virtual bool operator>(const DpObject & dp) const = 0;
     virtual bool operator<(const DpObject & dp) const = 0;
     virtual void setPartner(int index, int stage, const DpObject *dp) = 0;
     virtual double evalCost(const DpObject *dp) const = 0;
     virtual double evalInitialCost(void) const = 0;
     virtual double evalFinalCost(void) const = 0;
     // friend functions
     friend ostream& operator<<( ostream& os, const DpObject& dp );
   protected:

      string m_id;          // object identifier
      int    m_partner;     // object partner
      double m_cost;        // object cost
   private:
      virtual DpObject *cloneDpObject(void) const = 0;
      virtual void print(ostream& os) const = 0;
};
// comparison function used for sorting
inline bool DpObjectPtrLessThan(DpObject *dp1, DpObject *dp2) 
{
   return(*dp1 < *dp2);
}
#endif

Back to Article

Listing Three

// Constructor
// Inputs:
// const list<DpObject *> &dp - object containing the problem definition
// int                    N   - number of objects desired in solution
// ABSTRACT: Initialize solution class for finding the problem solution. 

DpSolution::DpSolution(const list<DpObject *> &dp, int N, bool sortList) 
{
   // copy and sort input objects
   Strategy p;
   for(list<DpObject *>::const_iterator iter = dp.begin(); 
                                               iter != dp.end(); ++iter)
   {
      p.push_back((*iter)->clone());
   }
   if(sortList) sort(p.begin(),p.end(),DpObjectPtrLessThan);
   // create solution stages
   m_stage.resize(N);
   for(int i = 0; i < m_stage.size(); ++i)
   {
     m_stage[i].resize(dp.size());
     // store tracking passes
     int j = 0;
     for(Strategy::iterator iter = p.begin(); 
                           (iter != p.end()) && (j < dp.size()); ++iter)
     {
        m_stage[i][j] = (*iter)->clone();
        ++j;
     }
   }
   // delete sorted list
   for(Strategy::iterator piter = p.begin(); piter != p.end(); ++piter)
   {
      delete *piter;
   }
}

Back to Article

Listing Four

// Function: solve
// Inputs: None

// Returns: list<DpObject *> - list of objects in solution
// ABSTRACT: Calculate and return list of object that from optimal solution 

list<DpObject *> DpSolution::solve(void) 
{  // check that solution is feasable
   if(m_stage[0].size() <= m_stage.size())
   {  // no solution, return inputs
      list<DpObject *> ans;
      for(int j = 0; j < m_stage[0].size(); ++j)
      { ans.push_front(m_stage[0][j]->clone());}
      return(ans);
   }
   // compute stage 1 (initial) strategy/cost
   for(int j = 0; j < m_stage[0].size(); ++j)
   {
      m_stage[0][j]->setCost(m_stage[0][j]->evalInitialCost());
   }
   // compute stage 2 to N-1 strategies/costs
   for(int i = 1; i < m_stage.size(); ++i)
   {
     for(int j = 1; j < m_stage[i].size(); ++j)
     {
        double cmin = m_stage[i][j]->evalCost(m_stage[i-1][0]) +
                                              m_stage[i-1][0]->cost();
        int partner = 0;
        for(int k = 1; k < j; ++k)
        {
           double c = m_stage[i][j]->evalCost(m_stage[i-1][k])  + 
                                              m_stage[i-1][k]->cost();
           if(c < cmin) { cmin = c; partner = k; }
        }
        m_stage[i][j]->setCost(cmin);
        m_stage[i][j]->setPartner(partner, i-1, m_stage[i-1][partner]);
     }
   }
   // compute final stage N strategy/cost
   double cmin = m_stage[m_stage.size()-1][0]->cost() +
                         m_stage[m_stage.size()-1][0]->evalFinalCost();
   int    imin = 0;
   for(int k = 1; k < m_stage[m_stage.size()-1].size(); ++k)
   {
     m_stage[m_stage.size()-1][k]->
                       setCost(m_stage[m_stage.size()-1][k]->cost() +
                       m_stage[m_stage.size()-1][k]->evalFinalCost());
     if(m_stage[m_stage.size()-1][k]->cost() < cmin)
     {
        imin = k;
        cmin = m_stage[m_stage.size()-1][k]->cost();
     }
   }
   // extract solution
   list<DpObject *> ans;
   int is = m_stage.size()-1;
   while((imin >= 0) && (is >= 0))
   {

      DpObject *tp = m_stage[0][imin]->clone();
      tp->setCost(m_stage[is][imin]->cost());
      tp->setPartner(m_stage[is][imin]->partner(),imin);
      ans.push_front(tp);
      imin = m_stage[is][imin]->partner();
      --is;
   }
   return(ans);
}

Back to Article

1 2 3 4 5 6 7 Next

More Insights

INFO-LINK


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

C/C++

Dynamic Programming

Implementation

Application

Conclusion

Listing One

Listing Two

Listing Three

Listing Four

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

C/C++ Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content

C/C++

Dynamic Programming

Implementation

Application

Conclusion

Related Reading

More Insights

White Papers

Reports

Webcasts

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

C/C++ Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content