Channels ▼
RSS

Parallel

Just Say "No!" to Overflow


Bil has written three books on multithreaded programming along with the GNU Emacs Lisp manual and numerous articles. He is currently a computer scientist at the Broad Institute at MIT, working on BioInformatics. Bil can be contacted at Bil@LambdaCS.com.


If you had $2,147,483,647 in the bank, you'd be pretty upset if somebody gave you a buck and the bank told you that your balance was now -$2,147,483,648. So why are we programmers so sanguine about that behavior in our programs?

The issue, of course, is that because we generally use fixed-width words to represent numbers, there is a limit to the number of numbers we can have. For example, if we use 4-bit words and twos-complement arithmetic, then we can have 24=16 different numbers—seven positive, eight negative, and zero. If we ignore overflow and add 0111+0100(7+4), we get 1011(-5).

Historically, it makes sense to have built machines with simple add instructions that just set an overflow bit. Hardware was expensive, there are lots of times when you're bit-twiddling that you want that kind of instruction, and 32 bits gave you enough range for almost anything you wanted to do. It actually worked pretty well. So languages like C just used what the hardware gave them and left the rest up to the programmer.

But that's not what we want. We want data structures that act like integers and we don't want to worry about word size or any other low-level detail. We also want our programs to run like the wind and run identically on lots of different platforms. And that's a challenge.

I worked at Sun Microsystems when Java was being developed, and I asked James Gosling why he allowed silent overflow in Java. He explained that it was so existing C programs could be easily ported and all the tricky bit twiddling hacks they used would work in Java. I think he blew it. There's nothing wrong with bit twiddling. There's nothing wrong with having an operator that does silent overflow. What's wrong is that bit twiddling is what you do with bit arrays, not numbers, and none of our languages do a good job of separating the two. (I would like to have a bit array data type that could be cast to a number.)

We know how to build fast hardware that will give us unlimited integers, but none of the major CPU designs (x86, PowerPC, SPARC, MIPS) have gone that way. Lisp implementations generally reserved the bottom two bits of a word to indicate type. 00 indicated a 30-bit integer, 01 a 30-bit float, and 11 a pointer. This way if an operation overflowed, they could "box" the number and get slower, but correct behavior for any size number. A huge percentage of programmers are wed to using 32-bit ints and they are not likely to change (even though they probably should). I believe that 32-bit ints are with us for a long time.

Given this situation, I think we should require high-level language programmers to indicate the permissible range of a variable (declaring int, long, and so on, is one way) and allow the runtime system to throw an exception whenever it goes outside it. It's far better for a program to abort on overflow, than to silently give the wrong results. IMHO.

In concert with this, hardware designers should include an "integer add, trap on overflow" instruction in future machines, so that programs can continue to run as fast as ever. Until they get around to it, our languages (compilers and runtime systems) should have a "strict integer" flag that generates potentially slower code that checks all operations during testing, and generate faster, nonchecking code for production.

More than anything else, we need to change our attitudes about computing from "What can we do with what we have?" to "What do we want to do?

That's what I think. What do you think?


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video