Dr. Dobb's | An Algorithm for Error Correcting Cyclic Redundance Checks

An Algorithm for Error Correcting Cyclic Redundance Checks

A straightforward technique to leverage the error-correcting capability inherent in CRCs.

June 01, 2003
URL:http://www.drdobbs.com/an-algorithm-for-error-correcting-cyclic/184401662

An Algorithm for Error Correcting Cyclic Redundance Checks

Programmers have used the Cyclic Redundance Check (CRC) algorithm for years to uncover errors in a data transmission. It turns out that you can also use CRCs to correct a single-bit error in any transmission. I first heard about error correcting CRCs in a conversation I had several years ago [1]. At the time, I thought this feature of CRCs was general knowledge, but as I did more research, I saw no mention of CRC error correction in the popular literature. The traditional response to a CRC error is re-transmission. However, the advance of computer technology has led to some situations where it is actually preferable to correct single-bit errors rather than to resend. Some examples include:

Satellite transmission -- If a host is sending data via a satellite, the cost of sending a regular packet is high, so the cost of a resend just doubles the price for the packet.
High-speed transmission -- In the future, there may be a tendency to push the technology. (Let's crank this baby up and see what it will do.)The faster bits move through a medium, the higher the probability of error.
PowerLine Carriers -- Metricom Corporation, a supplier of integrated circuits for computer applications states, "There is a growing interest in the use of PowerLine Carrier (PLC) for data communication using the intrabuilding electric power distribution circuits. Power lines were not designed for data communications and exhibit highly variable levels of impedance, signal attenuation and noise... Harmful effects of impulse noise on data communications systems can be expected." [2].

You could also use CRC error correction for storage devices -- both hard disk and RAM -- and for compression programs. The way compression programs are written now, it is often difficult to recover the original data if one bit is lost. Bit errors typically occur in bursts. Tannenbaum describes a method for recovering from burst errors that lends itself to a 1-bit error correction technique such as the technique I describe in this article (see the sidebar titled "Correcting Burst Errors").

Error Correcting CRCs

The algorithm for error correcting CRCs involves determining the remainder after dividing in binary (modulo 2). The algorithm is table driven. The length of the message is determined, a priori, and agreed to by both the sender and the receiver. A generator polynomial value that can be used to determine the position of errors anywhere in the message is also selected at that time.

Before a message is sent, a checksum is calculated and appended onto the end of the message. The message and checksum are then sent. The receiver gets the message and calculates a second checksum (of both parts). If the new checksum value is 0 then the message is considered valid. If not, the checksum value is used as a subscript in an error correction table to determine the position of the bad bit. This method will find and correct 1-bit errors. The receiver builds the error correction table using the chosen Generator Polynomial (GP). The GP has to be carefully selected, because different GPs have different characteristics. I discuss the selection of an error correcting GP later in this article.

Building the Tables

To build the error correction table, I begin with a series of 0s representing correct data. One of the bits is then set to 1. Using modulo 2 division (exclusive-or), the receiver divides the message by the GP, calculating the remainder that is used as a subscript in the error correction table. See Algorithm 1.

Algorithm 1:

for (i = 1; i<=Message_Length; i++)
  {
    set all bits in the Message to 0
    change the i'th bit to 1
    calculate the checksum (cs)
    EC_table[cs] = i
  }

You only have to build this table once for each generator polynomial.

Building the EC Table for 1011

In the example that follows, I have chosen a 4-bit generator polynomial. With 4-bit GPs, only two GP values can be used for error correction--decimal values 11 and 13. I chose 11. When one divides the message by the 4-bit GP, the result is a 3-bit remainder, yielding values from 0 to 7. This result implies that I can use this GP for a message with a total length from 1 to 7 bits. (Four bits for the message and 3 bits for the checksum). The result of the calculation of the checksums is shown in Table 1. At the bottom of the table are the decimal values of the remainders after the division.

These values are then used to fill the Error Correction (EC) table. The EC table, in checksum order, is shown in Table 2. You could initialize an array in C with these values. Usually, EC[0] is not used.

int EC[] = {0,7,6,4,5,1,3,2};

This process makes up the test for error correcting generator polynomial validity. For a given number, if the entire table cannot be built (i.e., if two or more numbers map to 1 slot), the number chosen cannot be used as an error correcting generator polynomial. In practice, when data is sent, the sender computes the checksum (cs) and then appends the checksum onto the end of the data. When the receiver receives the message combined with the checksum, the receiver computes another checksum (cs2). If the second checksum is zero, the data is (supposedly) ok. If cs2 is NOT zero, EC[cs2] contains the location of the bit in error.

Sending a Message with GP = 1011

Start with the following generator polynomial: 1011. Assume the Original Data is 1100. First append 3 additional bits (with value 000) on the end. Then calculate the checksum (using exclusive-or):

The resulting checksum is: 010 and the string that is sent is the original data appended with the checksum: 1100010. Suppose the following string of bits is received 1101010. (Numbering from left to right, bit 4 has changed.) The receiver computes the checksum.

The checksum is binary 011, which is 3 in decimal. EC[3] is 4, which is the position of the bad bit! (Refer to Table 2) You can use this scheme for much longer messages, but the preceeding example demonstrates the essence of the algorithm. I wrote a program that determines whether a given bit pattern can be used as an error correcting GP (see Listing 1). None of the existing widely used GPs work for error correction (see the sidebar titled "Generator Polynomials").

There are ways of finding the bad bit without using tables. The following code is a slight modification of an algorithm presented by Fred Halsall [4] for computing an 8-bit CRC.

Algorithm 2:

    hpo2 = 8;
    v = 1;
    for (i = hpo2-1; i>=1; i--)
      {
        if (v == checksum)
          bad_bit_position = i;
        v = v + v;
        if (v >= hpo2) v = v ^ GP;
      }

If you have a lot of packets to check, table lookup is faster.

Using a Finite State Machine

Intrigued by the idea of CRC error correction, I looked at the method from different angles, and, as a result, I discovered a method for error correcting CRCs using finite state methods. Given the following generator polynomial: 11 (1011 in binary), I first determine the value of the left-most 1 bit (8) in the GP. I call it hpo2, and it is equal to the highest power of 2 in the GP. Then I build a Finite State Table (FST) for GP = 1011. This table represents states of the machine. The table has hpo2-1 rows and 2 columns (0 and 1). The procedure for building an FST is as follows: Let t equal the current row number. The row number represents the current state. Double it. (t = t * 2) Add the current column number (zero or one) to t. If the value of t is >= hpo2, then exclusive-or it with the GP. Place the value of t into the current row and current column of the table. The following C code computes the values in the FST.

Algorithm 3:

for (r = 0; r < hpo2; r++)
  for (c = 0; c < 2; c++)
    {
      t = r*2 + c;
      if (t >= hpo2) t ^= GP;
      FST[r][c] = t;
    }

For GP = 1011, this algorithm produces the finite state table shown in Table 3.

Both the sender and the receiver must build the FST. Only the receiver builds the error correction table and does so using the following code:

Algorithm 4:

  pos = 7;
  cc = 1;
  for (r = 0; r < hpo2; r++)
    {
      EC[cc] = pos;
      pos--;
      cc *= 2;
      if (cc >= hpo2) cc ^= GP;
    }

The result is the CRC error correction table for 1011 and is equivalent to Table 2.

Correcting a 1-bit Error Using an FST

An example sequence of sending a message, losing a bit, then recovering the bad bit follows:

Algorithm 5:

  Sender
    Given an original message, say 1100
    Append zeros for the checksum on the end: 1100000.
    Traverse the FST...

       cs = 0;
       for (i=1; i < hpo2; i++)
         cs = FST[cs][msg[i]];

    ... giving a 2 (010 in binary) for the checksum (cs).
    Send the message with a binary 2 appended: 1100 010


  Receiver
    Receive a message with 1 bit changed (say bit 4): 1101 010
    Traverse the FST to get a value (cs2) of 3
    if cs2 = 0 then the message is ok
    else
        EC[3] = 4, therefore bit 4 is bad
        Correct the bad bit: 1100 010
    Discard the checksum: 1100

A Sample Program

Listing 2 is a sample program that builds the tables, generates a random string, and changes a random bit. Part 2 of the program determines the position of the changed bit and corrects it. (The code is written with the message and checksum in an array of int. This approach demonstrates the logic while making the code much simpler to read.)

References

[1] Conversation with Maartin van Sway, now Professor Emeritus at Kansas State University.

[2] Direct quote from "http://www.metricom-corp.com/fec.html"

[3] Andrew S. Tanenbaum, Computer Networks, Prentice-Hall. 1996.

[4] Fred Halsall Data Communications, Computer Networks and Open Systems, Addison-Wesley, 1996, (p. 137).

About the Author

Bill McDaniel received his Ph.D. from Kansas State University and is currently the Chair of the Department of Computer Science at the University of Central Oklahoma. His interests include networking, encryption, CGI programming, and operating systems. He is the author of the article "Efficiently Sorting Linked Lists," which appeared in the June 1999 issue of Dr. Dobbs Journal.

Finding GPs Listing 1: Finding GPs

/* The following program checks numbers to see if they are valid crc's
 * by counting the number of cycles from 1 to 1.
 */
#include <stdio.h>
#include <string.h>

unsigned long int gp, i, j, s, t, v, hpo2;
FILE *infile, *outfile;

int main()
  {
    char lkgp[100]; /* last known generator polynomial */
    char fn[] = {"crc_li3.in"};

    /* open up the file of already known GPs */
    if ((infile  = fopen(fn,"r")) == NULL)
      {
        printf("%s not found",fn);
        return 1;
      }

    /* put the newly discovered gp's into file x.x */
    if ((outfile = fopen("x.x", "w")) == NULL)
      {
        printf("Error:");
        return 1;
      }

    /* find the last known gp */
    while (fgets(lkgp,100,infile)!=NULL);
    fclose(infile);
    printf("lkgp = <%s>\n",lkgp); 

    /* convert the string to a long int */
    s = 0;
    for (i=0; i < strlen(lkgp); i++)
      {
        if (lkgp[i] != ' ')
        if (lkgp[i] != 13)
        if (lkgp[i] != 10)
          {
            s = s*(long long int)10;
            s = s + lkgp[i] - '0';
          }
        printf("%ld\n",s);
      }

    if (s%2 == 0) s++;  /* all GPs are odd */

    /* check the next 10000 numbers for valid gp's */
    for (gp = s+2; gp <= s+10000; gp+=2)
      {
        printf("testing  %lld\n",gp);
        t  = gp;

        /* determine the highest power of 2 in the gp */
        t = gp;
        hpo2 = 1;
        while ((t >> 1) > 0)
        { hpo2 = hpo2 + hpo2; t = t / 2; }

        /* see how many times v cycles around from 1 to 1 */
        v = 1;
        i = 1;
        do
          {
            v = v + v;
            if (v >= hpo2) v = v ^ gp;
            i = i + 1;
          }
        while (v != 1);

        /* if the number of cycles is == hpo2 then it is an
                acceptable
         * generator polynomial and can be used for error
                correction.
*/
        if (i == hpo2)
          {
            printf("%ld is an error correcting gp\n",gp);
            fprintf(outfile," %ld\n",gp);
          }
      }
  fclose(outfile);
  return 0;
}

Listing 2: Sample error detection and correction Listing 2: Sample error detection and correction

/* crc_fsa.c */ 
#include  <stdio.h>
#include  <stdlib.h>
#include  <time.h>
 
#define  gp   285      /* the generator polynomial */ 
#define  hpo2 256      /* highest power of 2 in the gp */ 
 
int main() 
{ 
  int r,c, t, cs, cs2, pos; 
  int i, bb, li, 
      bim,        /* bits in the message */ 
      btbc,       /* bit to be changed */ 
      bigp,       /* bits in the gp */ 
      debug = 0; 
  int msg[hpo2];  /* the actual message */
  int fsa_table[hpo2][2]; 
  int checksum[hpo2]; 
  time_t tt; 
 
  /* initialize the random number generator */ 
  srand((unsigned) time(&tt)); 
 
  /* determine the number of bits (bigp) in the gp */ 
  li = gp >> 1; 
  bigp = 1; 
  while (li) { li = li >> 1; bigp++; } 

  /* determine the maximum number of bits in the message */ 
  bim = hpo2 - bigp; 
 
  printf("generator polynomial          %2d\n",gp); 
  printf("highest power of 2 in gp      %2d\n",hpo2); 
  printf("bits in generator polynomial  %2d\n",bigp); 
  printf("bits in message               %2d\n\n",bim); 
 
  /* Both the sender and the receiver need to build the FST */ 
  for (r = 0; r < hpo2; r++) 
    for (c = 0; c < 2; c++) 
      { 
        t = r*2+c; 
        if (t >= hpo2) t ^= gp; 
        fsa_table[r][c] = t; 
      } 
 
  /* display the FST */ 
  if (debug) 
    { 
      printf("FST table for gp = %d\n\n",gp); 
      printf("index     0     1\n"); 
      printf("        -----------\n"); 
      for (r = 0; r < hpo2; r++) 
        { printf("   %2d  | ",r); 
          for (c = 0; c < 2; c++) printf("%2d  | ",fsa_table[r][c]); 
          printf("\n"); 
        } 
    } 
 
  /* Only the receiver will need to build the Error Correction 
     table for the GP */ 
  for (pos=hpo2-1; pos > 0; pos--) checksum[pos]=0; 
  t = 1; 
  for (pos=hpo2-1; pos > 0; pos--) 
    { 
      if (checksum[t]) printf("%d is not a valid gp\n",gp), exit(0); 
      else checksum[t] = pos; 
      t *= 2; 
      if (t >= hpo2) t ^= gp; 
    } 
 
  /* display the error correction table for the gp */ 
  if (debug) 
    { 
      printf("\n\nError Correction table for gp = %d\n",gp); 
      printf("index   position of\n"); 
      printf("        the bad bit\n"); 
      printf("            --- \n"); 
      for (r=1; r < hpo2; r++) 
        printf("    %2d     | %2d |\n",r,checksum[r]); 
      printf("\n"); 
    } 
 
  /* generate a random message */ 
  for (i=0; i <= bim; i++) msg[i] = rand()%2; 
 
  /* display the message */ 
  printf("The original message         "); 
  for (i=1; i <= bim; i++) printf("%d",msg[i]); 
  printf("\n"); 
 
  /* zero out then compute the checksum */ 
  for (i=bim+1; i < hpo2; i++) msg[i] = 0; 
  cs = 0; 
  for (r = 1; r < hpo2; r++) 
    cs = fsa_table[cs][msg[r]]; 
 
  /* Put the checksum on the end of the message */ 
  for (i = bim+1; i < hpo2; i++) msg[i] = (cs >> hpo2-i-1) & 1; 
 
  /* display the message and the checksum */ 
  if (debug) 
    { 
      printf("with the checksum            "); 
      for (i=1; i < hpo2; i++) 
        printf("%d",msg[i]); 
      printf("\n"); 
    } 
 
  /* The message|checksum can now be sent.  Simulate a problem by 
     changing a random bit */ 
  btbc = rand()%(hpo2-1)+1; 
  msg[btbc] ^= 1; 
 
 /* 
  * Part 2 
  * The receiver has received a message with 1 bad bit. 
  */ 
  printf("After receiving the message, msg = "); 
  for (r = 1; r < hpo2; r++) printf("%d",msg[r]); 
  printf("\n"); 
 
  /* compute the checksum */ 
  cs2 = 0; 
  for (r = 1; r < hpo2; r++) 
    cs2 = fsa_table[cs2][msg[r]]; 
 
  if (debug) printf("the checksum is %d\n",cs2); 
 
  if (cs2 > 0) 
    { 
      /* determine the bad bit */ 
      pos = checksum[cs2]; 
 
      /* correct the bad bit */ 
      msg[pos] ^= 1; 
    } 
 
  /* display the (corrected) message */ 
  printf("The corrected message        "); 
  for (i=1; i <= bim; i++) printf("%d",msg[i]); printf("\n"); 
 
  printf("Normal Termination\n"); 
}

Correcting Burst Errors Correcting Burst Errors

In practice, bit errors usually occur in bursts. Tanenbaum [3] describes a method that can be used to recover from burst errors. The method is as follows: The data to be transmitted is stored horizontally (like we would read it), but the data is transmitted vertically (bit 1 in each row would be transmitted first, then bit 2 in each row, and so on).

The receiver then reconstructs the original lines upon reception of the data. If a burst error occurs, there will be only 1 bit wrong per line, and we have shown that 1 error bit per line can be corrected.

Generator Polynomials Generator Polynomials

The following is a list of some of the smaller error correcting generator polynomials. There are thousands more.

generator            bits      bits      total   hpo2
polynomial            in        in       bits
                   message   checksum
decimal     binary
    7        111       1        2          3       4
   11       1011       4        3          7       8
   13       1101       4        3          7       8
   19      10011      11        4         15      16
   25      11001      11        4         15      16
   37     100101      26        5         31      32
   41     101001      26        5         31      32
   47     101111      26        5         31      32
   55     110111      26        5         31      32
   59     111011      26        5         31      32
   61     111101      26        5         31      32
   67    1000011      57        6         63      64
   91    1011011      57        6         63      64
   97    1100001      57        6         63      64
  103    1100111      57        6         63      64
  109    1101101      57        6         63      64
  115    1110011      57        6         63      64

Table 1: Calculation for checksum for 1011 Table 1: Calculation for checksum for 1011

1000000   0100000   0010000   0001000   0000100   0000010   0000001 
1011      1011      1011      1011      1011      1011      1011 
0011000   0100000   0010000   0001000   0000100   0000010   0000001 
 1011      1011      1011      1011      1011      1011      1011 
0011000   0001100   0010000   0001000   0000100   0000010   0000001 
  1011      1011      1011      1011      1011      1011      1011 
0001110   0001100   0000110   0001000   0000100   0000010   0000001 
   1011      1011      1011      1011      1011      1011      1011 
0000101   0000111   0000110   0000011   0000100   0000010   0000001 
    
      5         7         6         3         4         2         1

Table 2: The Error correction table for 1011 in checksum order Table 2: The Error correction table for 1011 in checksum order

checksum  bad bit 
      1      7 
      2      6 
      3      4 
      4      5 
      5      1 
      6      3 
      7      2

Table 3: The finite state table for 1011 Table 3: The finite state table for 1011

  index    values 
            0  1
         ---------- 
    0   |   0  1 
    1   |   2  3 
    2   |   4  5 
    3   |   6  7 
    4   |   3  2 
    5   |   1  0 
    6   |   7  6 
    7   |   5  4