# Teaching C++ Badly: How to Misuse Arrays

My last post listed some ways in which I've seen C++ taught badly, and promised to explain these teaching techniques in more detail. Here is the first of those explanations.

I have seen the claim, attributed to Trenchard More, that arrays are the most fundamental of all data structures — a fact that we know because there is probably not a programmer now living who has not at one time or another asked management for arrays. If that is not enough evidence, consider the idea that arrays are an abstraction of the idea of linear addressing, which forms the basis for all computing hardware in common use.

Linear addressing is the idea that a computer's memory consists of a collection of cells with consecutive numbers called *addresses*. By using ordinary integer arithmetic to compute the address of a cell, we can easily go from the notion of "the *n*th element of an array" to the corresponding address.

Because of the connection between the idea of an array and the underlying idea of an address, arrays work similarly in most programming languages that support them. An array has some number of *elements*, each element has an *index* or a *subscript*, and these indices are consecutive integers. Some languages start their indices from zero, others from one, and still others let them start anywhere; but that's about the extent of the difference between languages.

Most languages that support arrays also prevent the size of an array from changing once the array has been created. C and C++ go even further: They require that the size of an array be known during compilation; if it is not, then the program must allocate a *dynamic array* that, although it shares properties with a plain array, differs in some of its details. Even these dynamic arrays, however, cannot change size once they've been created.

Now let's think about how programmers use arrays. Most programs that use arrays do three distinctly different things with these arrays. Sometimes these three things are interleaved, but often the program does them one at a time:

- Put data into the array
- Access the data in the array
- (optional) Take all of the data out of the array

Step 2 includes rearranging the data, perhaps by sorting it; step 3 includes printing the data.

For example, consider a program that computes prime numbers. Each time it tests a number to see whether it is prime, it might use the values already in the array as divisors (step 2). Each time it finds a prime number, it might put that number into the array (step 1). Finally, it might print the numbers it has found (step 3). Here, steps 1 and 2 are interleaved; step 3 might be interleaved or might be separate.

Programs that use arrays according to these steps have to cope with two problems. First, if the language requires its programmers to freeze the size of the array at the time it is created, the programmer may not know how many elements to request. Second, even if the programmer does know how many elements, the fixed size requirement means that for most of the time the program is running, the array will have elements that have no meaningful value.

These restrictions give rise to two common programming practices that, in my opinion, make it much harder than necessary to learn how to program:

- Guessing how many elements the array is apt to need
- Using an auxiliary variable to keep track of how many array elements are in use

For example, let's return to our hypothetical prime-number program. I'm sure you've seen many such programs that start out by doing something like this:

int primes[1000]; // We won't need more than 1000 primes

primes[0] = 2; // The first prime is 2

n = 1; // We have 1 prime so far

Our program guesses how many primes we'll be wanting to compute (1000), and then uses an auxiliary variable (**n**) to keep track of how many primes we've computed. When we come up with a new prime, the program will execute

primes[n++] = prime;

If the programmer is unusually thoughtful, this statement will be preceded by checking whether **n >= 1000**; otherwise the program will just crash when the array overflows.

From a teaching viewpoint, this program is bad for at least three reasons:

- It imposes an arbitrary restriction on its own capabilities — it cannot compute more than a fixed number of primes
- It uses two variables (
**primes**and**n**) where one should do, thereby complicating the code beyond necessity - It wastes resources by allocating array elements whether or not they're needed

The standard **vector** template avoids all three of these problems. Instead of the two examples above, we would write

vector

primes.push_back(2); // The first prime is 2

In the forthcoming C++ standard, we can collapse this example into one statement:

vector<int> primes = {2}; // primes starts out with one element

When we have computed a new prime, we can append it to **primes** by executing

primes.push_back(prime);

This statement increases the size of **primes** by one, by appending a copy of the variable **prime** to **primes**. In effect, it **push**es **prime** onto the back of **primes**. In one stroke, we have eliminated two out of our three problems: Using a vector instead of an array removes the arbitrary restriction on the number of elements; and it uses one variable instead of two.

It doesn't really eliminate the waste of resources, because behind the scenes it allocates memory for more array elements than it needs. However, now it's the implementation, not the programmer, that is wasting memory. Moreover, this waste now provides the opportunity to show programmers how they can choose from among several space/time tradeoffs without having to rewrite their code drastically in order to do so. Instead of teaching beginners that they have to write unnecessarily complicated, wasteful code, we can teach them how to write code that does exactly what they want, and then to come back after it works to figure out how — or even whether — they should optimize it.

In short, programmers tend to use arrays in a way that makes the arrays grow during the program's execution. Built-in arrays in C and C++ don't grow very well, but vectors do. This fact makes vectors more useful for teaching purposes — especially if we are teaching beginners — than arrays are. Moreover, arrays do not force students into writing code that is intrinsically wasteful. Instead, students can get into the useful habit of making the program working correctly first, and only then thinking about how to optimize it.