
Andrew Koenig

Dr. Dobb's Bloggers

Down The Rabbit Hole

December 26, 2012

Last week, I described how I almost traced a bug to the wrong program. Having come this far, I would like to describe the actual source of the bug — such a strange source that I would never have suspected it when I began looking.


After several false starts, I had found that an early pass in a multipass compiler was sometimes producing output in which widely spaced, apparently random characters were replaced by '0' characters (i.e., not characters with all bits zero, but rather the representation of the digit 0). What I did not yet know was why this was happening.

As before, the first thing I did was to look at the circumstances. I found that this early compiler pass was being run as follows:

  • Run the preprocessor on the program's input and put the results in a temporary file.
  • Create a second temporary file and copy into it a file that holds the compiler's output for the standard-library declarations. This step avoids having to recompile those declarations every time the compiler is run.
  • Run the first compiler pass with its input coming from the temporary file that contains the preprocessor's output, and its output being appended to the second temporary file.

In UNIX shell terms, then, this compiler pass was being run as follows:

 
     preprocessor <input >temp1
     cp library temp2
     phase1 <temp1 >>temp2
 

Here we are following the usual convention that < precedes a file used as a program's input, > precedes a file used as a program's output, and >> precedes a file to which output is being appended. The cp program works with file names as its arguments, so it doesn't need < or >.
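For readers less familiar with these operators, the difference between > and >> is easy to see on a scratch file. This is a minimal sketch for a POSIX shell, assuming mktemp -d is available; the file name out is made up for the illustration.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d)
cd "$dir" || exit 1

printf 'first\n' >out    # > creates out (or truncates an existing out)
printf 'second\n' >out   # truncates again: out now holds only "second"
printf 'third\n' >>out   # >> appends: out holds "second", then "third"

cat out                  # prints two lines: second, then third
wc -l <out               # < makes out the standard input of wc (2 lines)
```

The distinction is the crux of what follows: > always starts from an empty file, while >> keeps the file's existing bytes and writes after them.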

As usual, an important early step in debugging any program is to recreate the problem with as little extra context as possible. Accordingly, I started by saving the preprocessor output in a file that I will call testprog. I then reproduced the rest of the process:

 
     cp library temp2
     phase1 <testprog >>temp2

Sure enough, the output in temp2 would occasionally include spurious '0' characters.

The next step was to note that the compiler did not actually look at the library file, so I did not need to use it. The resulting simplification was:

 
     phase1 <testprog >temp2

To my astonishment, this simplified version worked every time, even though phase1 did not even have read access to its output file, so what that file held beforehand should not have mattered. Could it be that the file append operation was broken? To find out, I tried appending to an empty file:

 
     cp nullfile temp2
     phase1 <testprog >>temp2
 

This version also worked perfectly every time. Perhaps there was something about a nonempty file that caused a failure.

To find out, I created a file that contained "Hello, world!\n" and used it as the initial contents of the output file:

 
     cp hello temp2
     phase1 <testprog >>temp2

Again, this version worked perfectly.

At this point, I was really confused. How could a program that is not capable of reading its output file possibly behave in a way that depended on the prior contents of its output file? To prove that the compiler was not reading its output file, I tried breaking things up this way:

     cp library temp2
     phase1 <testprog >temp3
     cat temp3 >>temp2

The name cat is short for catenate; the program copies each of its input files, in order, to its output. In this case it has only one input file, so it effectively appends the contents of temp3 to temp2. Running phase1 in this way guaranteed that it would not have access to the contents of temp2.
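Because cat copies its arguments to its output in order, two files can be joined either in one step or by appending the second to a copy of the first, and on a working system the two results must be identical. The sketch below, for a POSIX shell, builds the result both ways and uses cmp to compare them byte for byte; the file contents are invented stand-ins for the real library and phase1 output, and mktemp -d is assumed to be available.

```shell
#!/bin/sh
dir=$(mktemp -d)
cd "$dir" || exit 1

# Hypothetical stand-ins for the real files.
printf 'library declarations\n' >library
printf 'phase1 output\n' >temp3

cat library temp3 >joined   # one step: cat copies each argument in order

cp library temp2            # two steps, as in the experiment:
cat temp3 >>temp2           # append temp3's contents to temp2

# cmp -s prints nothing and exits 0 only if the files are identical.
if cmp -s joined temp2; then
    echo "identical"
else
    echo "files differ"
fi
```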

To my utter astonishment, this revised procedure failed. Not every time, but then it had never failed every time. In fact, I could execute

 
     phase1 <testprog >temp3

and then repeatedly execute

 
     cp library temp2
     cat temp3 >>temp2

and I would still see occasional failures.
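An intermittent failure like this one is easiest to measure by repeating the same two commands many times and counting how often the result is wrong. Here is a sketch for a POSIX shell; the stand-in files, the trial count of 100, and the bad.N file names are all assumptions for illustration, and the expected result is built with a single cat so that it does not depend on >>. On a system without the bug it should report 0 failures.

```shell
#!/bin/sh
dir=$(mktemp -d)
cd "$dir" || exit 1

# Hypothetical stand-ins; in the real experiment these files already existed.
printf 'library declarations\n' >library
printf 'phase1 output\n' >temp3
cat library temp3 >expected   # what every correct append should produce

trials=100
failures=0
i=0
while [ "$i" -lt "$trials" ]; do
    cp library temp2
    cat temp3 >>temp2
    if ! cmp -s temp2 expected; then
        failures=$((failures + 1))
        cp temp2 "bad.$i"     # keep each bad result for later inspection
    fi
    i=$((i + 1))
done

echo "$failures failures in $trials trials"
```

Saving the bad outputs matters as much as counting them: a handful of failing files is exactly the kind of evidence that can be handed to someone else.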

In other words, I had just proved that the compiler bug I was hunting was not in the compiler at all, but rather in the operating system — specifically, in the part of the operating system that implemented the >> operation. Moreover, whatever the bug was, it depended somehow on the contents of the file on which the >> operation was being run. At this point, I could hand the problem off to the operating-system development group.

The entire experience had a distinct Alice-in-Wonderland feel to it. At first I thought there was something wrong with my program, because the compiler was producing error messages. Then I realized that I could compile the same program twice and have it sometimes compile and sometimes fail to compile, so I suspected a bug in the compiler phase that was producing the error messages. Then I realized that when that compiler phase produced its error messages, it was doing so because it was getting incorrect input from the previous phase, which, in turn, was producing incorrect output because of a bug in the operating system. So every time I thought I had found the problem, I had only gone one step further into indeterminacy.

This whole episode taught me several important lessons about debugging:

  • Be sure you know what's broken. What looks like a bug in your program might be a bug in the compiler; what looks like a compiler bug might be an operating-system bug.
  • Part of knowing what's broken is proving it. The way to prove it is to come up with a failing test case that is so simple that the problem cannot be anywhere else.
  • Even if a failure looks like it's impossible, it might be happening anyway. In this particular case, even though the compiler was not reading its output file, the operating-system bug made the compiler's success depend somehow on the output file's initial contents.
  • It is possible to learn quite a bit about what is wrong with a program, even if you have neither a debugger nor the program's source code available.
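The third lesson rests on a fact about >> that is easy to check: the shell opens the file for writing only, so a program started with >>temp2 cannot read the file's existing contents through its standard output. A minimal sketch for a POSIX shell such as sh or bash; temp2 is a scratch file created by the script, and mktemp -d is assumed to be available.

```shell
#!/bin/sh
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'existing contents\n' >temp2

# Redirections take effect left to right: >>temp2 opens temp2 for
# appending on descriptor 1 (write-only), then <&1 makes descriptor 0
# a copy of it. The read builtin then tries to read a write-only
# descriptor; 2>/dev/null hides the shell's error message.
if { read -r line; } >>temp2 <&1 2>/dev/null; then
    result="read succeeded"
else
    result="read failed"
fi
echo "$result"
```

On a typical UNIX-like system the read should fail (usually with a bad-file-descriptor error), which is why phase1's behavior could not legitimately have depended on what temp2 held beforehand.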

These lessons are worth taking to heart; over the years, they have saved me an enormous amount of time that I would otherwise have spent chasing bugs that weren't.

Believe it or not, this story is still not over. Next week, I'm going to explain what the operating-system developers told me about the problem once they finally found it. Somehow I doubt you'll be surprised to learn that their first attempt to reproduce the problem failed, despite the simple test case I had given them.
