Channels ▼

Andrew Koenig

Dr. Dobb's Bloggers

Programming Without Variables

September 12, 2013

Last week, I discussed two different philosophies of programming-language design. One philosophy is that programming languages exist in order to grant convenient access to computer hardware; the other is that a programming language should make it easy to write programs, and that the computer's job is then to execute those programs.

I used mathematical notation as an example. In general, mathematicians use a particular name to denote a single value in a particular context. The common notion in programming that a name denotes a "variable," which might have one value at one time and another value at another time, is generally absent from mathematical notation.

As an example, I posed a problem: Rewrite the following C function so that a name denotes only a single value at any given time:

unsigned collatz(unsigned n)
{          
        assert(n != 0);
        unsigned count = 0;
        while (n != 1) {               
                if (n % 2 == 0)                     
                        n /= 2;               
                else                     
                        n = 3 * n + 1;               
                ++count;
        }   
        return count;    
}

Several readers proposed solutions; the most straightforward of which is the one by reader Nons:

 
unsigned rcollatz(unsigned n)

{
    
      assert(n != 0);
    
      if (n == 1)
        
          return 0;
    
      else if (n % 2 == 0)
        
          return 1 + rcollatz(n / 2);
    
      else
        
          return 1 + rcollatz(3 * n + 1);

}

This is an example of replacing an iterative strategy by a divide-and-conquer strategy. The general idea of the latter is that you solve a simple case of the problem, and then you solve the more complicated cases by finding a way to reduce that more complicated case to one or more simpler cases.

Although the idea of replacing iteration by divide-and-conquer is straightforward to explain, it has a pragmatic disadvantage: Programming languages typically have to store in memory the chain of function calls currently being executed, which means that each trip through this "loop" consumes memory to keep track of those calls. In other words, rcollatz needs an amount of memory proportional to its result in order to run.

There is an alternate, somewhat more complicated, technique to translate a function with a loop into a recursive function that does not change any variables. It takes advantage of the fact that it is usually possible for a compiler to translate a tail recursion into a jump instruction. A tail recursion is a recursive call that is used immediately as the function’s result. So, for example, the statement

 
return 1 + rcollatz(n / 2);

does not involve a tail recursion because after rcollatz returns, its result is used as an operand of +. If, on the other hand, we were to write

 
return rcollatz(n / 2);

that call would be a tail recursion.

Here is how to replace a loop by a function that uses only tail recursion:

  1. Identify every variable with a value that changes during the loop.
  2. Define an auxiliary function with one parameter for each of the variables identified in (1).
  3. Replace the loop with a tail-recursive call that uses the new values of the variables.
  4. Replace the original function with a call to the auxiliary function with the appropriate initial values for the variables.

In the case of collatz, step (1) should identify the variables n and count. Steps (2) and (3) yield the following auxiliary function:

 
     unsigned collatz_aux(unsigned n, unsigned count)
     {
           if (n == 1)
                return count;
           else if (n % 2 == 0)
                return collatz_aux(n / 2, count + 1);
           else
                return collatz_aux(3 * n + 1, count + 1);
     }

Step (4) then gives us the following tail-recursive version of collatz:

 
     unsigned trcollatz(unsigned n)
     {
           assert(n != 0);
           return collatz_aux(n, 0);
     }

What we have learned so far is that, at least for this function, it is possible to rewrite it so that its "variables" never actually vary, and with a little effort and a reasonable compiler, it is possible to do so while maintaining the program's speed.

However, there's a problem, which we can see if we change the original program specification: Instead of returning the number of steps needed to reach 1 from a given input, suppose we want to return the entire sequence of values reached during the Collatz process. We can write that program iteratively as follows:

vector<unsigned> collatz_vec(unsigned n)
{
          
     assert(n != 0);
     vector<unsigned> result;
          
     while (n != 1) {
          
               result.push_back(n);
               if (n % 2 == 0)
                     
                      n /= 2;
               
               else
                     
                    n = 3 * n + 1;
               
       }
       return result;
    
}
 

If we try to write this program recursively using the technique we mentioned earlier, we find that each recursive call passes the entire result array as an argument. As a result, the program's runtime will be quadratic in the number of iterations — hardly a happy state of affairs. In effect, the whole idea of an array is built on being able to change part of it, and if we intend to avoid changing memory once we have put values into it, we need another kind of data structure entirely. Next week, we'll look at what such a data structure might be.

Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 


Video