Channels ▼

Al Williams

Dr. Dobb's Bloggers

The Check(Sum) Is in the Mail

September 30, 2011

There are many reasons you might need to verify that data didn't become corrupted. For example, data that arrives via an external communications channel might require validation. Critical data read from flash memory or some other mass storage device might also need verification. On a big system (that is, a desktop computer), you might use a CRC, but for a small computer this is often overkill.

A popular choice for validating data is a checksum. Sounds simple, right? The sender adds all the data it is sending together and sends the sum along with the data. The receiver computes the same sum and checks to make sure the computed sum matches the received one.

A simple checksum, however, has several important limitations. Usually, the sum is only a limited size, like 8 bits. Even a wider bit width checksum will still overflow when you add enough to it. Using a one's complement sum puts the carry bit back in so that the checksum is more representative of all the input bits.

So in one's complement math (using 8-bit hex numbers), 0xFE + 0x05 = 0x04, which is 0xFE+0x05+carry (note that in normal math, 0xFE+0x05=0x103; the 0x100 is the carry bit for 8-bit math).

Imagine you have an 8-bit checksum on a stream of bytes. To keep it simple, consider a 3-byte stream: 0x80, 0x80, 0x02. The simple sum (if you discard the carry bit) is 0x02. Adding the carry back (one's complement) gives 0x3. A common error might set bad bytes to zero. If the first two bytes are erroneously set to zero, then the simple sum won't notice. The carry wrapping solves that problem.

However, the one's complement method still fails in another common case: What if a major failure causes all received bytes to become zero? Using either method will cause the receiver to compute a zero checksum. Since the error caused all the bytes to read zero (including the checksum itself), the checksum passes. One way to solve that problem is to invert the checksum before you send it. Then a legitimate stream of zeroes will have a checksum of 0xFF. A zero checksum, in that case, is an error.

Turns out, this is exactly the checksum algorithm that IP packets use. The C code below is from RFC1071 and computes a 16-bit checksum using both of these techniques. You can see, the extra code required compared to a normal addition is very minimal, although note that the code depends on not overflowing a 32-bit variable. Instead of wrapping the carry around, this code adds the top part of the variable (which is really just the sum of the carries) to the checksum at the end. Checksums are no replacement for CRCs or other error-detecting and correcting codes. But used correctly, they are a great tool for your bag of tricks.

           /* Compute Internet Checksum for "count" bytes
            *         beginning at location "addr".
       register long sum = 0;

        while( count > 1 )  {
           /*  This is the inner loop */
               sum += * (unsigned short) addr++;
               count -= 2;

           /*  Add left-over byte, if any */
       if( count > 0 )
               sum += * (unsigned char *) addr;

           /*  Fold 32-bit sum to 16 bits */
       while (sum>>16)
           sum = (sum & 0xffff) + (sum >> 16);

       checksum = ~sum;

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.