Transmission media, such as telephone lines, wide-area networks, and satellite links, or storage media, like optical/magnetic disk and tape, are usually imperfect. To make digital data transmission and storage reliable, error-correcting codes are indispensable. The efficient Reed-Solomon class of error-correcting codes is highly popular in many applications, including CD, DAT (digital audio tape) players, tape-backup systems, and similar devices. These all use a hardware implementation of Reed-Solomon. However, because microprocessor speeds have increased considerably, software implementations are also feasible in some applications.

In this article, I'll present a highly optimized software implementation of Reed-Solomon error correction, which I implemented while developing the "Video Backup System" for the Commodore Amiga. This product lets users connect a VCR to the Amiga via a hardware interface to use the VCR as a file-backup device. I developed the software that encodes computer data in a video signal and performs error correction on information read back from the VCR to compensate for drop-outs on the video tape. Even though the original Amiga implementation was written in C and 68000 assembly language, the version I'll discuss here is implemented in 32-bit Intel x86 assembly language using Borland C++ 4.2 and TASM 4.0.

### Reed-Solomon Codes

A codeword of a Reed-Solomon (RS) code consists of a sequence of `GF(q).`

`GF(q)`

is a finite field with `q`

elements, where `q=p`

for some prime number ^{m}`p`

. (For an introduction to Galois fields, see Error-Recovery Codes, by Bart de Canne, *DDJ*, December 1994.) Reed-Solomon codes have, by definition, `N=q-`

1. Out of these `N`

, let `K`

be the number of actual information symbols. The other `N-K`

are redundancy symbols. The amount of redundancy in each codeword defines the maximum number of symbol errors that can be corrected. The necessary error-correcting capability of the code can be determined from the quality of the medium (raw error rate) and the desired reliability of the application. Let `T`

be the number of symbol errors that can be corrected in each code word. Because all Reed-Solomon codes are maximum-distance separable, the necessary overhead to correct up to `T`

symbol errors is only 2`T`

symbols. Therefore, the actual information content of each codeword, `K`

, is `N`

-2`T`

symbols. The resulting data rate is: `K`

/`N`

or 1-(2T/N).

Each codeword can be identified with a polynomial over `GF(q)`

in the following way:

Reed-Solomon codes are cyclic. For each codeword `(c`

in a RS code _{0},...,c_{N-1})`C`

, `(c`

is also in _{N-1},c_{0},c_{1},..., c_{N-2})`C`

. Viewing codewords as polynomials, the aforementioned statement can be written as:

A `T`

-error-correcting RS code `C`

can be described by a single unique polynomial `g(x)`

, its generator polynomial. `g(x)`

is a polynomial of degree 2`T`

and the coefficient of `x`

^{2T} in `g(x)`

is 1 by definition. `c(x)`

is a codeword in `C`

only if it is divisible by `g(x)`

. Let be a primitive element of `GF(q)`

. This means that every nonzero element of `GF(q)`

can be written as a power of and . For a RS code,

### Choosing Code Parameters

To maximize efficiency, choose code parameters that match the architecture of the implementation platform as closely as possible. A convenient unit of information on the 32-bit Intel CPU (and most other computer architectures) is a `byte`

which can be stored, retrieved, and manipulated very efficiently. Therefore, in my RS implementation, each symbol is represented by one byte. (For a more complete example of encoding/decoding, see the section at the end of this article entitled "A Complete Reed-Solomon Encoding and Decoding Example.") A byte has 256 possible values, so the most convenient Galois field size is q=256=2^{8}, p=2, and m=8. When shortened RS codes are ignored, this automatically implies that `N`

=255. The only degree of freedom left at this point is the choice of `T`

, the maximum correctable number of symbol errors per codeword. Although the computer architecture does not dictate a value for `T`

, there is a slight advantage if `T`

is a power of 2. For my application, I implemented RS for `T`

=8. Using this as a starting point, it is easy to create your own implementation for other values of `T`

.

### Implementation of GF(2^{8})

GF(2^{8}) can be regarded as the set of polynomials of degree <8, with coefficients in GF(2); for example, 0 or 1. Addition, subtraction, multiplication, and division are all defined operators. In GF(2^{8}), adding two elements is accomplished by adding their corresponding coefficients modulo 2. Since subtraction modulo 2 is the same as addition (1-1 = 1+1 = 0 mod 2, 0-1 = 0+1 = 1 mod 2), subtraction and addition are the same in GF(2^{8}) as well.

In Example 1, the most-significant bit (MSb) in a byte represents the coefficient of x^{7}; the least-significant bit (LSb) represents the coefficient of x^{0}. For example, x^{7}+x^{4}+x^{2} is represented by byte 1001010. The assembly language equivalent of computing the sum of two elements of GF(2^{8}) is the use of the exclusive-or instruction (XOR) on the byte representations of the elements. Multiplication in GF(2^{8}) is done modulo a polynomial `f(x)`

of degree 8. `f(x)`

has to be irreducible over GF(2), which means that it cannot be written as a product of two or more other polynomials over GF(2) of lesser degree. If `f(x)`

is irreducible and every nonzero element of GF(2^{8}) can be written as a power of `x`

modulo `f(x)`

, `f(x)`

is called a "primitive polynomial" and `x`

is a primitive element. Bart de Canne's article contains code for searching for such primitive polynomials (there is always at least one for every Galois field). I use the following primitive polynomial to construct GF(2^{8}): x^{8 }+ x^{6} + x^{5} + x^{4} +1.

##### Example 1: Addition and subtraction in GF(2^{8}).

`(x`

^{4} + x^{2}) + (x^{2} + 1) =(x^{4} + 1)0 - (x^{4} + x) = (x^{4} + x)x + x = 0x - x = 0

In Example 2, my implementation uses a 256×256 byte array called `RSG_multiply`

to efficiently implement multiplication of elements of GF(2^{8}). The Intel architecture has a quirk that makes it efficient to access this array, given two values (`I`

and `J`

) to multiply; of the lower 16 bits of the data registers eax through edx, the CPU can separately access the upper and lower byte. Given that the upper 16 bits of ebx are 0 and that edi points to `RSG_multiply`

, Example 3 is all that is needed to load the product of `I`

and `J`

into al. Understanding this makes the assembly listing easier to read. If you are hard-pressed for registers, you could even free up the register edi by making sure that the `RSG_multiply`

array resides at a 64-KB boundary in memory to which the upper 16 bits of ebx> could point.

##### Example 2: Multiplication in GF(2^{8}) is done modulo f (x)= x^{8 }+ x^{6 }+^{ }x^{5} + x^{4} + 1.

*(x ^{4} + x^{2}) * (x^{2} + 1) = (x^{6} + x^{2})x^{8} = x^{6} + x^{5} + x^{4} + 1x^{255} = x^{0} = 1
*

##### Example 3: Loading the product of I and J.

mov bl,I mov bh,J mov al,[edi+ebx]

Division of an element `a`

by an element `b`

is accomplished by multiplying `a`

by b^{-1}, the multiplicative inverse of `b`

. The array `RSG_multinverse`

contains the multiplicative inverse of every nonzero element of GF(2^{8}). Finally, `RSG_logarithm`

is a logarithm table that tells you which power of `x`

each nonzero GF(2^{8}) element is. For instance, since x^{8} = x^{6} + x^{5} + x^{4}+1, log_{x} (x^{6} + x^{5} + x^{4} +1) = 8 and 01110001_{2} = 71 hex, RSG_logarithm[0x71] equals 8. Listing One, gf256.c, is the code to prepare the arrays needed for GF(2^{8}) arithmetic. These arrays are used by both the C and the assembly language parts of the implementation.

##### Listing One

/****************************************************** * File: gf256.c -- contains code to construct GF(2^8) and all required * arrays for GF(2^8) arithmetic. * Copyright (c) 1995 by Hugo Lyppens with permission to print in DDJ ******************************************************/ #include "hdr.h" #include "rs.h" #define POL 0x171 /* primitive polynomial used to construct GF(2^8) is: x8 + x6 + x5 + x4 + 1 represented as 101110001 binary */ UBYTE RSG_powers[FIELDSZ]; UBYTE RSG_logarithm[FIELDSZ]; UBYTE RSG_multinv[FIELDSZ]; UBYTE RSG_multiply[FIELDSZ][FIELDSZ]; static UBYTE multiply_func(UBYTE, UBYTE); static UBYTE multinv_func(UBYTE); void RSG_ConstructGaloisField() { int v; UBYTE i, j; v = 1; for(i = 0; ; i++) { RSG_powers[i] = (UBYTE)v; RSG_logarithm[v] = i; v <<=1; // multiply v by alpha and reduce modulo POL if(v & (1<<GF_M)) { v ^= POL; } if(i==MAXELT-1) break; } RSG_powers[MAXELT] = 1; // construct multiplication table for(i = 0; ; i++ ) { if(i) RSG_multinv[i] = multinv_func(i); for(j = i; ; j++) { RSG_multiply[i][j] = RSG_multiply[j][i] = multiply_func(i, j); if(j==MAXELT) break; } if(i==MAXELT) break; } } // compute multiplicative inverse of x. The value y for which x*y = 1 static UBYTE multinv_func(x) UBYTE x; { if(!x) return 0; return RSG_powers[FIELDSZ-1-RSG_logarithm[x]]; } // compute product of i and j static UBYTE multiply_func(i, j) UBYTE i, j; { int r, b, k; r = 0; k = j; for(b = 1; b<FIELDSZ; b<<=1, k<<=1) { if(k & (1<<GF_M)) k^=POL; if(i & b) r ^= k; } return((UBYTE)r); }

### Encoding

Again, each codeword is a multiple of the generator polynomial `g(x).`

To encode an information sequence `a(x)`

`= a`

, you could multiply it by _{0}+a_{1}x+...+ a_{K-1}x^{K-1}`g(x)`

to turn it into a codeword. However, it is more efficient to compute `b(x) = a(x)x`

mod ^{N-K}`g(x).`

A possible encoding of `a(x)`

is then `b(x) + a(x)x`

. Yet, I have the intuitive notion that the information symbols should come first and the redundancy symbols should be attached to the end of the data array. Since the RS codes are cyclic, this is no problem and a valid codeword that satisfies this notion can be created by cyclically rotating this encoding to the left by ^{N-K}`N-K`

symbol positions. The final encoding of `a(x)`

then becomes `a(x)+ b(x)x`

.^{K}

As the encoder processes each information symbol in `a(x),`

it maintains `b(x).`

To speed things up, the code uses a table that contains `g(x)`

for each . When the encoder finishes computing `b(x),`

it writes it to the Kth position in the data array to create the final encoding of `a(x).`

Listing Two (genpol.c) calculates the generator polynomial and prepares this table. The function `RSG_encode`

in the program codecsup.asm (available electronically, along with executables; at the top of the first page of this article) is the assembly language implementation of the encoder.

##### Listing Two

/********************************************************************* * File: genpol.c -- contains code to generate the generator polynomial * of degree 2*T for Reed-Solomon encoding/decoding * Copyright (c) 1995 by Hugo Lyppens with permission to print in DDJ *********************************************************************/ #include "hdr.h" #include "rs.h" UBYTE RSG_shiftregtab[FIELDSZ][2*T]; Polynomial RSG_genpol; void RSG_CalcGeneratorPoly() { int i, j; UBYTE a, *p; RSG_genpol.degree = 2*T; memset(RSG_genpol.c, 0, 2*T); RSG_genpol.c[0] = RSG_powers[0]; RSG_genpol.c[1] = 1; for(j = 1; j<2*T; j++) { a = RSG_powers[j]; for(i = 2*T; i>=1; i--) { RSG_genpol.c[i] = RSG_genpol.c[i-1]^multiply(RSG_genpol.c[i], a); } RSG_genpol.c[0] = multiply(a, RSG_genpol.c[0]); } printf("Generator polynomial: "); printpoly(&RSG_genpol); for(i = 0; ; i++) { p = &RSG_multiply[i][0]; for(j = 0; j<2*T; j++) { RSG_shiftregtab[i][j] = p[RSG_genpol.c[j]]; } if(i==MAXELT) break; } }