Channels ▼


Error Correction with Reed-Solomon

The modern high-speed network is not perfect. Errors can creep into message data during transmission or reception, altering or erasing one or more message bytes. Sometimes, errors are introduced deliberately to sow disinformation or to corrupt data. Several algorithms have been developed to guard against message errors. One such algorithm is Reed-Solomon.

This article takes a close, concise look at the Reed-Solomon algorithm. I discuss the benefits offered by Reed-Solomon, as well as the notable issues it presents. I examine the basic arithmetic behind Reed-Solomon, how it encodes and decodes message data, and how it handles byte errors and erasures. Readers should have a working knowledge of Python. The Python class ReedSolomon is available for download.

Overview of Reed-Solomon

The Reed-Solomon algorithm is widely used in telecommunications and in data storage. It is part of all CD and DVD readers, RAID 6 implementations, and even most barcodes, where it provides error correction and data recovery. It also protects telemetry data sent by deep-space probes such as Voyagers I and II. And it is employed by ADSL and DTV hardware to ensure data fidelity during transmission and reception.

The algorithm is the brainchild of Irving Reed and Gustave Solomon, both engineers at MIT's Lincoln Labs. Its public introduction was through the 1960 paper "Polynomial Codes over Certain Finite Fields." Interestingly enough, that paper did not provide an efficient way to decode the error codes presented. A better decoding scheme was developed in 1969 by Elwyn Berklekamp and James Massey.

Reed-Solomon belongs to a family of error-correction algorithms known as BCH. BCH algorithms use finite fields to process message data. They use polynomial structures, called "syndromes," to detect errors in the encoded data. They add check symbols to the data block, from which they can determine the presence of errors and compute the correct valuse. BCH algorithms offer precise, customizable control over the number of check symbols. They have simplified code structures, making them attractive for hardware implementations.

Reed-Solomon is also a linear algorithm because it processes message data as discrete blocks. And it is a polynomial algorithm because of its use of modular polynomials in the encoding and decoding processes.

But Reed-Solomon is not without its issues. It performs poorly with large message blocks. Doubling the block size quadruples the time it takes to encode or decode the message data. You can minimize this limitation by maintaining small and uniform blocks. Reed-Solomon also relies heavily on modular operations: And modular operations, especially multiplication, consume more clock cycles than non-modular ones.

The Mathematics of Reed-Solomon

As stated, Reed-Solomon uses a finite field in its encoding and decoding processes. A finite (or Galois) field is a square matrix, whose elements are the possible bytes values for both message and error data. Its size (2m) is always a power of two, with m being a prime number. It is also commutative. Combining two of its elements with a primitive modular operator (addition, subtraction, and so on) will return another element.

Figure 1 shows a simple finite field, size 4 (m = 2). Notice its largest element is 3 (0b11), which is less than the matrix size. If you take two field elements (0b01 and 0b11) and add them, you get 0b00, which is also a field element.

simple finite field
Figure 1.

Listing One shows how the Python class ReedSolomon prepares its finite fields. Here, it declares two list properties. The property __GFEXP (line 4) is the actual field, with 256 possible byte values. The property __GFLOG (line 7) is the complement field. Its 256 elements are the binary weights of each field element in __GFEXP. Together, these properties help simplify modular multiplication and division as you shall see later.

Listing One
class ReedSolomon:
	# Galois fields
	# -- exponents (anti-logarithms)
	__GFEXP = [0] * 512
	# -- logarithms
	__GFLOG = [0] * 256
	def __init__(self):
		# prepare the exponential and logarithmic fields
		self.__GFEXP[0] = 1
		byteVal = 1
		for bytePos in range(1,255):
			byteVal <<= 1
			if (byteVal & 0x100):
				byteVal ^= 0x11d
			# update the field elements
			self.__GFEXP[bytePos] = byteVal
			self.__GFLOG[byteVal] = bytePos
		# finalize the exponential field
		for bytePos in range(255,512):
			self.__GFEXP[bytePos] = self.__GFEXP[bytePos - 255]
	# ... continued

Next, the constructor method __init__() starts by setting element 0 in __GFEXP to 1 (line 13). Then it populates the first 255 elements in both __GFEXP and __GFLOG (lines 15-22). Finally, it populates the remaining 256 elements in __GFEXP (lines 25-26), by copying the first 256 elements from that same property.

Listing Two shows how the class handles modular multiplication and division. The private methods __gfMult() and __gfDivi() get the same two arguments: argX and argY. The __gfMult() method checks if both arguments are zero and returns a zero result (lines 9-10). Otherwise, it reads an element from __GFLOG, using argX as the list index (line 13). Then it reads an element from __GFLOG, using argY as the index, and adds that element to byteValu (line 14). Finally, it reads an element from __GFEXP, using byteValu as the index (line 15), and returns that element as the result.

Listing Two
class ReedSolomon:
	# ...previous listings
	# Galois multiplication
	# argX, argY: multiplicand, multiplier
	# byteValu: product
	def __gfMult(self, argX, argY):
		# parametre checks
		if ((argX == 0) or (argY == 0)):
			byteValu = 0
			# perform the operation
			byteValu = self.__GFLOG[argX]
			byteValu += self.__GFLOG[argY]
			byteValu = self.__GFEXP[byteValu]
		# return the product result
		return (byteValu)
	# Galois division
	# argX, argY: dividend, divisor
	# byteValu: quotient
	def __gfDivi(self, argX, argY):
		# validate the divisor
		if (argY == 0):
			raise ZeroDivisionError()
		# validate the dividend
		if (argX == 0):
			byteValu = 0
			# perform the division
			byteValu = self.__GFLOG[argX] - self.__GFLOG[argY]
			byteValu += 255
			byteValu = self.__GFEXP[byteValu]
		# return the division result
		return (byteValu)
	# ... continued

The __gfDivi() method also checks for zero arguments. But it returns a zero if argX is zero, and it raises a ZeroDivisionError() if argY is zero (lines 25-30). Otherwise, it uses both argX and argY to read elements from __GFLOG. It subtracts the two elements (byteValu) and adds 255 to the difference (lines 33-34). Then it uses byteValu to access __GFEXP and returns that element as the quotient (line 35).

Next, Reed-Solomon uses polynomials in its encoding and decoding processes. In Python, you can represent a polynomial as a list object (Figure 2). Each element in the list corresponds to a coefficient, each index to a term power. Notice the coefficients of each polynomial term is a hexadecimal number.

list object
Figure 2.

One important polynomial is the generator polynomial (Figure 3). This is a normalized polynomial. All its terms have a coefficient of 1. It is irreducible. It cannot be factored into two or more polynomials.

generator polynomial
Figure 3.

Listing Three shows how the class ReedSolomon prepares a generator polynomial. This private method _rsGenPoly() gets one argument: the number of error symbols (errSize). It assigns the local polyValu a single list element of 1 (line 8). Then it builds the generator polynomial by creating a two-term list object (polyTemp) using __GFEXP (line 11), and combining polyTemp with polyValu using _gfPolyValu() (line 12). The final value of polyValu then becomes the generator polynomial (line 15).

Lisitng Three

class ReedSolomon:
	# ...previous listings
	# Prepare the generator polynomial
	# errSize: number of error symbols
	# polyValu: generator polynomial
	def _rsGenPoly(self, errSize):
		polyValu = [1]
		for polyPos in range(0, errSize):
			polyTemp = [1, self.__GFEXP[polyPos]]
			polyValu = self._gfPolyMult(polyValu, polyTemp)
		# return the polynomial result
		return (polyValu)
	# ... continued

Listing Four shows the four private methods that ReedSolomon uses to process its polynomial list objects. The method _gfPolyAdd() (lines 7-20) combines its two arguments, polyA and polyB, through modular addition. The method _gfPolyMult() (lines 25-36) combines its two arguments through modular multiplication. Notice addition is done with exclusive-or, while multiplication is done with __gfMult().

Listing Four

class ReedSolomon:
	# ...previous listings
	# Polynomial addition
	# polyA, polyB: polynomial addends
	# polySum: polynomial sum
	def _gfPolyAdd(self, polyA, polyB):
		# initialise the polynomial sum
		polySum = [0] * max(len(polyA), len(polyB))
		# process the first addend
		for polyPos in range(0, len(polyA)):
			polySum[polyPos + len(polySum) - len(polyA)] = polyA[polyPos]
		# add the second addend
		for polyPos in range(0, len(polyB)):
			polySum[polyPos + len(polySum) - len(polyB)] ^= polyB[polyPos]
		# return the sum
		return (polySum)
	# Polynomial multiplication
	# polyA, polyB: polynomial factors
	# polyProd: polynomial product
	def _gfPolyMult(self, polyA, polyB):
		# initialise the product
		polyProd = len(polyA) + len(polyB) - 1
		polyProd = [0] * polyProd
		# start multiplying
		for posB in range(0, len(polyB)):
			for posA in range(0, len(polyA)):
				polyProd[posA + posB] ^= self.__gfMult(polyA[posA], polyB[posB])
		# return the product result
		return (polyProd)
	# Polynomial scaling
	# argPoly: polynomial argument
	# argX: scaling factor
	# polyVal: scaled polynomial
	def _gfPolyScale(self, argPoly, argX):
		# initialise the scaled polynomial
		polyVal = [0] * len(argPoly)
		# start scaling
		for polyPos in range(0, len(argPoly)):
			polyVal[polyPos] = self.__gfMult(argPoly[polyPos], argX)
		# return the scaled polynomial
		return (polyVal)
	# Polynomial evaluation
	# argPoly: polynomial argument
	# argX: independent variable
	# byteValu: dependent variable
	def _gfPolyEval(self, argPoly, argX):
		# initialise the polynomial result
		byteValu = argPoly[0]
		# evaluate the polynomial argument
		for polyPos in range(1, len(argPoly)):
			tempValu = self.__gfMult(byteValu, argX) 
			tempValu = tempValu ^ argPoly[polyPos]
			byteValu = tempValu
		# return the evaluated result
		return (byteValu)	
	# ... continued

The next method, _gfPolyScale(), takes two arguments: a polynomial (argPoly) and an integer (argX). It multiplies each polynomial term by argX using __gfMult() (lines 47-48). The method _gfPolyEval() also gets argPoly and argX as its arguments. However,it solves the polynomial argPoly using __gfMult() to combine argX with each term, and tallies the sum using exclusive-or (lines 62-65).

Encoding with Reed-Solomon

To encode a message block with Reed-Solomon, first you need to set the number of error symbols (errSize). These symbols are generated by Reed-Solomon and appended to the message block. They are later used to correct any erasures or errors found in the block.

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.



Yes, it does appear a follow-up article is in order.


Fair enough, perhaps this can serve as food for thought for a follow-up article. Incidentally, we did run an article in the Journal back in the day (I think it was an entry in our Algorithm Alley column) that stepped through the algorithm with a little more mathematical detail. It's available at if you're interested.


This article was disappointing. It would have been much more useful if it just offered a working implementation with examples. However, it can be greatly improved if the author is so inclined. To that end, I offer the following constructive criticism.

There is almost nothing to explain how the algorithm works. Instead, it launches into showing code and describing, in English, the logic that is present in Python. Missing from the code discussions are semantics and reasons. For example, the discussion of Listing Three doesn't explain anything about the polynomial being generated from the errSize argument. What do different values mean and what is the result in each case? How do the created polynomials affect the rest of the algorithm? Why is the resulting polynomial necessary and correct for the purpose? Another problem is that the code is presented bottom up. Details are presented first, before they are even needed. Far better is to describe the interface and then build up the needed details as their need is made obvious.

Even when using figures to try to illustrate the encoding and decoding operations, you offer no explanation of why the values move and change as they do in the buffer. I suggest using the figures you have to walk through the example, explaining how the algorithm reacts to the input and generates the encoded output, each step of the way, and then do the same for decoding. Having thus illustrated the algorithm in action, show how the errSize value affects things. Finally, show the code, with discussion that ties it back to the behaviors illustrated and described early in the article. That will make a great deal more sense.


In the discussion of the generator polynomial, you state that all of its terms have a coefficient of 1, yet the polynomial in Figure 3 doesn't. Only the term with the highest degree has a coefficient of 1.


Gus Solomon was a friend of mine. He often groused about not ever getting a dime from the CD industry. (This was before DVDs, etc.)