Channels ▼

Andrew Koenig

Dr. Dobb's Bloggers

Theory Versus Practice: The Great Divide in Programming Languages

September 06, 2013

Last week, I talked about four major, early programming languages: Cobol, Fortran, Algol, and Lisp, and said that Lisp differed from the other three in a fundamental way. That difference is really comes from a philosophical difference about the relationship between programs and computers.

Here's the key question: Do programs exist in order to tell computers what to do, or do computers exist in order to execute programs? In theory, this is a silly question, because the two "alternatives" amount to the same thing, and they are both obviously true. In practice, however, preferring one or the other has a profound effect on the design of programming languages and, to a lesser extent, of computers.

For example, if programs exist in order to command computers, it follows that programming languages should expose those aspects of computers that their users might wish to command. This viewpoint might well lead to the conclusion that assembly languages are the most important kinds of programming languages, as they offer the most direct way of telling computers what to do that is consistent with humans' being able to write programs in those languages.

In contrast, if computers exist in order to execute programs, then it is less important to design programming languages to fit particular computers well than it is to design computers to fit the best programming languages. Of course, this strategy leads to another hard question: What is the best programming language? Moreover, it offers no guidance as to how to figure out which language is the best.

Designing a programming language in order to command a computer leads to the conclusion that the language should look like the computer. If, on the other hand, the programming language comes first, how does one decide how the language should work? If it's not based on an existing computer, and there are no earlier programming languages to use as a model, where does one start?

One logical starting point is mathematics. Mathematical notations have been around for millennia, and mathematicians continuously refine the notation they use in order to make that notation more effective at expressing their abstractions. Accordingly, it makes sense to use some form of mathematical notation as the basis for a programming language.

Every mathematical notation that I have ever seen uses names to refer to concepts that, once defined, do not change within a single context. Such a concept might refer to other values to be defined later, or tentatively defined, but the concept itself remains the same. For example:

Let z be the length of the hypotenuse of a right triangle with legs of lengths x and y, respectively.

This sentence defines z in terms of x and y. It leaves x and y unspecified, but once they have been defined, z is also defined. As long as we are talking about this particular z, it will never be used to mean anything else. So, for example, although we might write something in Fortran that looks similar:

 
               Z = SQRT(X**2 + Y**2)
 

there is a profound difference between the Fortran statement and the corresponding mathematical statement. In Fortran, there is nothing to stop us from writing

 
               Z = Z * 2.0

after which the variable Z has a different value from the one it had before. This kind of statement is a natural way to expose the fact that a computer has memory, and the contents of that memory can change from time to time.

In contrast, a mathematician would never write

Let z refer to its previous value multiplied by 2.

Instead, a mathematician might write something like

Let z' = 2z

Here, the apostrophe (pronounced prime) indicates that z' is related to z in some way; but aside from this relationship, z and z' are two different names. Accordingly, there is never any confusion about which value is associated with z at which time; one always writes z or z' to make the choice explicit.

How do mathematicians deal with problems that involve iteration? For example, consider the following C++ function:

unsigned collatz(unsigned n)
{
           
     assert(n != 0);
           
     unsigned count = 0;
           
     while (n != 1) {
                
           if (n % 2 == 0)
                      
                 n /= 2;
                
           else
                      
                 n = 3 * n + 1;
                
          ++count;
           
     }
           
     return count;
     
}

This function takes a strictly positive integer and repeats the following:

  • If the integer is even, divide it by 2.
  • Otherwise, multiply it by 3 and add 1.

If this process eventually yields 1, the function returns the number of repetitions required to do so; otherwise, the function never terminates. As the function's name suggests, the Collatz conjecture is that this function always terminates for any value of n. The conjecture has been shown by experiment to be true for all integers up to 5*260, but whether it is always true is still unknown.

Obviously, this C++ function uses the names n and count to refer to quantities that vary during the program's execution. In effect, each of these names refers to different things at different times. This programming style is so natural to programmers who have started by learning programming languages that are based on how computer hardware works, such as C or C++ (or Fortran, Algol, or Cobol), that it may be hard even to imagine how one would express such a computation without using a single name to refer to multiple quantities.

Nevertheless, basing a programming language on mathematics yields a workable, although completely different, way of expressing such computations. I'll continue exploring that way next week. Between now and then, I invite you to try rewriting this function in a way that avoids ever changing the value of a "variable" once it has been initialized.

Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 


Video