## An encoder and a Viterbi decoder in C++

*Hugo is a quantitative analyst at the firmwide risk department of Goldman, Sachs & Co. He can be reached at [email protected] or http://www.stack.nl/~hugo/.*

In my article entitled "Reed-Solomon Error Correction," (*DDJ*, January 1997), I discussed the Reed-Solomon class of error-correcting codes. In this article, I'll examine a different class of codes -- convolutional error-correcting codes. Reed-Solomon codes are block codes, which means they encode a fixed-length sequence of message symbols into a fixed-length codeword. Longer messages are spread over multiple codewords. Convolutional codes, on the other hand, encode an unlimited number of message symbols into one codeword and support "soft-decision" decoding. A technique, in which the receiver simply distinguishes between 0 and 1 without providing any bit reliability information is called "hard-decision" decoding; this is the technique I will be using in this article.

As an example of soft-decision decoding, assume that the transmitter sends 0s and 1s and that the receiver uses an 8-level digitizer (0-7) to read symbols. Values from 0-3 would most likely result from a transmitted 0 and 4-7 would most likely result from a 1. But 7 is a "strong" 1 and 4 is a "weak" 1, which might even have been transmitted as a 0. A convolutional decoder can use this kind of information on bit strengths to find the most likely transmitted message sequence. Convolutional codes are extremely practical. NASA and ESA developed a Deep Space Error Control Coding Standard based on convolutional codes with constraint length 7 and rates 1/2 and 1/3 for transmissions from spacecraft to Earth. This standard joins the rate-1/2 convolutional code to a (255, 223) Reed-Solomon code to form a concatenated coding system. This means that each message sent to Earth from the spacecraft is encoded in two stages. In the first stage, the message is encoded using a Reed-Solomon encoder. The resulting stream is then fed to the second stage, a convolutional encoder. At the ground station, transmission errors that were severe enough to cause a decoding error in the convolutional code will be corrected by the Reed-Solomon decoder, which can correct 16 symbol errors per codeword of 255 symbols.

Figure 1 is a 3-bit shift register that is initially filled with zeros. A potentially infinite input string is fed into the encoder, one bit at a time. Using the two exclusive OR gates (shown as ), the encoder produces two output bits for each input bit. In this case, the rate of the code is 1/2. In general, a convolutional encoder that takes *k* input bits and produces *n* output bits is said to have a rate of *k/n*.

The first output is connected to the shift register according to the bit pattern *g*^{(0)}=(1101). The connection pattern to the second one is *g*^{(1)}=(1011). When the input string *i *is (10011), the encoder will produce the output streams *o*^{(0)}=(11000) and *o*^{(1)}=(10101). The individual streams can be interleaved to form a single output stream, (11,10,01,00,01).

The Hamming distance between two bit sequences of equal length is the number of places where the two sequences differ. This distance metric is used in hard-decision decoding. Soft-decision decoding requires more elaborate metrics based on bit-reliability information. One example of a soft-decision metric is the sum of the squared distances between the received signal levels of corresponding bits in both words.

*k* is the number of input bits the encoder accepts simultaneously and is equal to the number of parallel shift registers within the encoder. *n* is defined as the number of output streams produced by the encoder and is equal to the number of exclusive OR gates in the encoder.

The constraint length of a convolutional code is the maximum number of bits in a single output stream that can be affected by any input bit. In practice, the constraint length is the length of the longest shift register +1. For notational convenience, we define *m* as 1 less than the constraint length. So in Figure 1, the constraint length is 4 and *m*=3.

In Example 1(a), *o _{j(n)}* denotes the

*j*th bit of output stream

*#n, i*is the

_{j}*j*th bit of the input stream, and

*g*is a bit that defines whether there is a connection at position

_{j(n)}*j*between output

*#n*and the shift register. Example 1(a) mathematically relates the output stream

*o*to the input stream

*i*and the encoder connection pattern

*g*. To see why the codes are called "convolutional codes," note that this is equivalent to the convolution in Example 1(b).

If *k*, the number of simultaneous input bits, is greater than 1, the encoder will consist of *k* parallel shift registers. The XOR gates for each of the *n* output bits can be connected to multiple positions in each of the shift registers. Figure 2 shows an example of an encoder where *k*=2 and *n*=3.

In this generalized case, the connection pattern *g ^{(x,t)},*0<=

*n*^ 0<=

*t*<

*k*describes the connections from shift register

*t*to the XOR gate for output stream

*#x*; see Example 2(a). Example 2(b) illustrates what the convolution becomes.

When the message is finite, you usually let the encoder continue for another *m* steps after the message has been processed to clear the shift registers back to the all-zero state. This gives the decoder the information it needs to correctly decode the last message digits. For example, when the encoder of Figure 2 receives the interleaved input stream (00,10,11,00,01), it will produce the interleaved output stream (000,110,001,111,010,011,001).

### State Diagrams

Since the output of the encoder only depends on the input and the current contents of the shift registers, the operation of the encoder can be captured in a state diagram. The contents of the shift registers define the state of the encoder. The state diagram shows all states and all possible transitions between them. Each transition has a label of the form *X/Y*, where *X* stands for the *k* input bits that cause the transition, and *Y* is the resulting string of *n* encoder output bits. For example, when the encoder in Figure 1 is in state (101) and receives a 0, it will emit (01) and move to state (010). (Figure 4 is the state diagram encoder.)

Since an encoder has *k *shift registers with *m *bits each, the total number of states is 2^{km}. The encoder in Figure 2 has *k*=2 and consists of two shift registers. You can represent the state of this encoder with a single sequence of digits by interleaving the bits from the two shift registers. For example, it is in state (1010) when the first shift register contains (11) and the second one contains (00). This representation of states is advantageous because a new state can easily be formed by shifting the old state to the right by *k* positions and adding the *k* input bits to the left. For instance, when the old state is (1101) and the input bits are (00), the new state is (0011).

Since the encoder is in the all-zero state both before and after the transmission of a codeword, each convolutional codeword corresponds to a path through the state diagram that starts and ends in the all-zero state.

The minimum free distance of a convolutional code is the smallest distance between distinct output sequences. Because of linearity, this corresponds to the minimum nonzero weight in the code, which we can determine by finding the minimum weight of any path in the diagram that leaves the zero state and returns to it.

### Catastrophic Codes

Figure 3 is a state diagram of a convolutional code with *k*=1, *n*=2 and constraint length 4. Suppose you transmit the all-zero codeword as (00,00,00,00,00,...). Due to transmission errors, you receive (11,10,00,00,00,...). This is actually a valid codeword, corresponding to an input of (110110110110...). Since this input differs in infinitely many places from the original word, you conclude that in this case, a finite error pattern caused decoding errors in an infinite number of places. The problem is caused by the fact that, besides the loop around the all-zero state, this code's state diagram has another loop (011 -> 101 -> 110 -> 011) in which each transition generates an all-zero output.

### The Viterbi Decoding Algorithm

The Viterbi algorithm can be seen as a variation of the "shortest-route" problem from the field of operations research. Essentially, the algorithm determines the most likely transmitted message by finding the path through the state diagram for which the corresponding output sequence has the lowest distance to the received sequence. I will refer to this distance as "the weight of the path."

For decoding a received sequence, the trellis diagram of the encoder will play a key role. A trellis diagram, like Figure 5, is an extension of the encoder's state diagram that shows the passage of time. Figure 5 is based on the message (10111000), which was encoded into (11,10,10,10,00,00,10,11). Furthermore, you assume that transmission errors occurred in the first two bits, so that the received sequence is (00,10,10,10,00,00,10,11).

The labels on the edges in this diagram show *Z(t,s)* for every state *s* and time *t*. *Z(t,s)* is the weight of the lowest-weight path through the trellis diagram from the all-zero state to state *s* at time *t*. In this particular diagram, the lowest-weight path to state (101) at time 5 has weight 3.

In the state diagram in Figure 4, you can see that each state has 2 incoming edges (in general, a state has 2^{k} incoming edges). However, in the trellis diagram, only one incoming edge is shown: the one that minimizes *Z*, the weight of the path. This edge is called the survivor. If multiple edges yield the same minimum *Z*, one of them will be selected arbitrarily as the survivor.

For example, consider state (101) at time 5. The state diagram shows incoming edges from states (010) and (011). At time 4, the lowest weights of the paths to states (010) and (011) are 2 and 4, respectively. The received sequence at time 5 is (00). The encoder output from state (010) to (101) is (10), which has distance 1 to the received sequence. For the edge from (011) to (101), the output is (00) with distance 0. Clearly, 2+1<4+0, so the survivor at this point is the edge from state (010) at time 4 to state (101) at time 5.

The path that reaches state (000) at time 8 corresponds to the codeword with the lowest distance to the received sequence. To determine the most likely transmitted sequence, all you need to do is to trace back through the trellis diagram from *t*=8 to *t*=0 (this trace is shown in bold in the diagram). The trace visits the states (000,100,010,101,110,111,011,001,000) and the corresponding decoded message is (10111000).

In a real application, the codewords are likely to be much longer than in this example. This makes it impractical to build a trellis diagram for the entire length of the received sequence. Instead, you use a modified version of the algorithm called "truncated Viterbi decoding." First you introduce a constant , the decoding depth. This is the number of symbols ahead of the current symbol used by the decoder to build a trellis diagram. When the truncated Viterbi decoder is decoding a symbol for time *t*, it uses a trellis diagram based on the received sequence from time *t* to time *t*+.

The fact that the decoder is only looking ahead symbols instead of considering the entire codeword may cause decoding errors. Research has indicated that the probability of such decoding errors becomes negligible with a decoding depth of at least 5.8*m*.

The Viterbi algorithm is equally suitable for soft-decision decoding. Instead of computing *Z* as a path weight based on Hamming distance, we would use an appropriate soft-decision distance metric.

Figure 6 summarizes the truncated Viterbi decoding algorithm and lists all its steps.

From this summary, it follows that for each decoded symbol, the decoder needs to iterate over all states in order to extend the trellis diagram. Since the number of states is 2^{km}, the complexity of the Viterbi diagram increases exponentially with *k* and *m*. This imposes practical limits on the length of the registers and in many applications, *k*=1 and *m*<=7.

In contrast to the Reed-Solomon decoder, the convolutional decoder does not produce any indication that the decoded words may be wrong when the frequency of errors surpasses the error-correcting capability of the code. The convolutional code should be used in conjunction with another code for error detection or correction.

### C++ Implementation

The C++ template class *ConvolutionalCodec* implements both the encoder and decoder. (Source code, executables, and related files for the C++ implementation are available electronically; see "Availability," page 3.) The template arguments are the code parameters *n*,*k* and the constraint length. In addition, there is a parameter that defines how many *k*-bit symbols the encoder should process simultaneously to maximize efficiency. The meaning of this parameter will be explained in the next section, which will cover the encoder in more detail.

The advantage of using a template class is that the class can be designed to handle different code parameters. But since all code parameters are known at compile time, the compiler can still generate good quality object code for a given set of code parameters. This way, many of the efficiency advantages of hard coding code parameters will be retained. ConvolutionalCodec<n,k,cl,nmsggrpsymbols> is the C++ class template where: *k* is the number of bits input per input symbol; *n* is number of bits per output symbol; *cl* is the constraint length of code; and *nmsggrpsymbols* is the number of symbols in a group that the encoder handles simultaneously. The encoder class has the following member functions:

*ConvolutionalCodec(const ULONG(*&*genMatrix_in)[N][K]);*is the the constructor of the encoder class. It takes a reference to a two-dimensional array of ULONG values. This array describes the connection patterns from shift registers to the XOR gates in exactly the same manner as the generalized connection pattern*g*,0 <=^{(x,t)}*x*<*n*^ 0 <=*t*<*k*. The constructor is responsible for building all lookup tables needed by the encoder and the decoder. Listing Three s an example of how to define this connection pattern array.*void encode(const UBYTE *message, UBYTE *encoded, ULONG len) const;*encodes the*len*bytes at the address pointed to by message. The memory pointed to by*encoded*should be long enough to hold the encoded message. The function*getEncodedLength*can be used to determine how many bytes will be needed.*inline ULONG getEncodedLength(ULONG len) const;*returns the number of bytes needed to hold the encoded byte sequence of a message that is*len*bytes long. The required space depends mainly on the rate of the code, defined by the code parameters*k*and*n*.*void decode(const UBYTE *encoded,UBYTE *decoded, ULONG len) const;*decodes the byte sequence at the address pointed to by*encoded*using the Viterbi algorithm. The parameter*len*is actually the length of the original message in bytes, not that of the encoded sequence. The memory pointed to by*decoded*should be at least*len*bytes long.*ULONG getMinFreeDistance() const;*returns the minimum free distance (equal to the weight of the lowest-weight nonzero path) of the code.

To extract message symbols and to write encoded symbols of any length, both the encoder and the decoder make of heavy use of bitwise operators to access memory. Since generalized memory-access functions would be significantly less efficient, the implementation I present here assumes an architecture which has Little-endian byte ordering (which means that the least-significant byte of a value is stored at the lowest address) and has the ability to write and read ULONG (unsigned long) values to and from any memory address.

### The Encoder

The constructor of the class initializes several lookup tables, which enable efficient encoding. The first one is *transitionTable*, a two-dimensional array. It defines the encoder output symbol for a given state and a given input symbol. In terms of the symbols defined in Figure 6, *transitionTable[a,s]=Y(s,NS(a,s)).* For the encoder of Figure 1, *transitionTable[1 _{2},101_{2}]=01_{2}*.

The next table, *encodingTable*, is an extension of the transition table. It defines the encoder output for a given state and a group of input symbols. The template argument *nmsggrpsymbols* defines the number of *k*-bit symbols per group. The encoder becomes very efficient when *nmsggrpsymbols* is chosen so that it handles an entire byte (8 bits) of input at a time. For *k*=1, this is the case when *nmsggrpsymbols*=8 and for *k*=2, *nmsggrpsymbols* should be 4.

Listing One implements the *encode *member function.

### The Decoder

Like the encoder, the decoder relies on the *transitionTable* array. In addition, it uses the array *UBYTEWeight,* which maps *UBYTE (unsigned char)* values to their weights. For instance, *UBYTEWeight[0xF3]=6*. This array is used in determining the distance between message symbols; *UBYTEWeight[a^b]* is the Hamming distance between symbols *a* and *b*.

Listing Two is the decoder. The comments in the listing show the mapping of C++ statements to the algorithm steps outlined in Figure 6.

### Instantiating and Using the Template Class

Listing Three is an example that shows how to use this template class to create a codec for the convolutional code of Figure 2.

### Conclusion

Convolutional codes are often superior to block codes when additional information about the reliability of the received signals is available. This is usually the case for analog signals, but not for optical disks. In practice, the minimum free distance and (hence the error-correcting capability) of the codes are limited by the fact that the computational complexity of the Viterbi decoding algorithm grows exponentially with the length of the shift registers. Stephen Wicker gives an excellent explanation of how convolutional codes can be used to combine coding and signal modulation in his book *Error Control Systems for Digital Communication and Storage* (Prentice Hall, 1995).

C++ template classes turned out to be very useful in implementing the convolutional encoder and decoder. They offer the flexibility to use different code parameters and, because these parameters are known at compile time, this flexibility is achieved without sacrificing efficiency. The extensive use of lookup tables further enhances the speed of the encoder and the decoder.

#### Listing One

template<int N_IN, int K_IN, int CL_IN, int NMSGGRPSYMBOLS_IN>void ConvolutionalCodec<N_IN, K_IN, CL_IN, NMSGGRPSYMBOLS_IN>::encode( const UBYTE * const message, UBYTE * encoded, ULONG len ) const { ULONG state = 0; int messageBitPos = 0; int encodedBitPos = 0; const int msgbits = len*BITSPERBYTE; const UBYTE *messagePtr = message; </p> memset( encoded, 0, getEncodedLength( len )); </p> // read NMSGGRPBITS at a time but avoid reading past end of message by // ensuring there are at least this many bits remaining while( messageBitPos < msgbits-(NMSGGRPBITS-1) ) { ULONG inbits = getMsgGroupBits( messagePtr, messageBitPos ); ULONG outbits = encodingTable[ inbits ][ state ].bits; if( NSTATEBITS > NMSGGRPBITS ) state = ((state>>NMSGGRPBITS)|(inbits<<(NSTATEBITS-NMSGGRPBITS))); else state = inbits >> (NMSGGRPBITS-NSTATEBITS); writeEncodedGroupBits( encoded, encodedBitPos, outbits ); } // read remaining bits if the preceding loop did not process all the bits // this happens if the number of message bits is not a multiple of the // number of bits per symbol. if( messageBitPos < msgbits ) { ULONG inbits=(*(messagePtr+(messageBitPos >> 3)) >> (messageBitPos&7)) & ((1<<NMSGGRPBITS)-1); ULONG outbits = encodingTable[ inbits ][ state ].bits; if( NSTATEBITS > NMSGGRPBITS ) state = ((state >> NMSGGRPBITS) | (inbits << (NSTATEBITS-NMSGGRPBITS))); else state = inbits >> (NMSGGRPBITS-NSTATEBITS); writeEncodedGroupBits( encoded, encodedBitPos, outbits ); } // return encoder back to all-zero state while( state ) { ULONG outbits = encodingTable[ 0 ][ state ].bits; state >>= NMSGGRPBITS; writeEncodedGroupBits( encoded, encodedBitPos, outbits ); } }

#### Listing Two

template<int N_IN, int K_IN, int CL_IN, int NMSGGRPSYMBOLS_IN>void ConvolutionalCodec<N_IN, K_IN, CL_IN, NMSGGRPSYMBOLS_IN>::decode( const UBYTE *encoded, UBYTE *decoded, ULONG len ) const { ULONG pathDistance[2][ NSTATES ]; ULONG *oldPathDist; ULONG *newPathDist; </p> UBYTE trellis[ TAU ][ NSTATES ]; </p> ULONG trans, state; ULONG minStatePathDist; </p> newPathDist = &pathDistance[0][0]; oldPathDist = &pathDistance[1][0]; </p> // Comments refer to algorithm steps in Figure 6 // Step 1 newPathDist[ 0 ] = 0; for( state = 1; state < NSTATES; state++ ) { newPathDist[state] = ~0; } const ULONG nmsgsymbols = (len*BITSPERBYTE+K-1)/K; const UBYTE *encodedPtr = encoded; int encodedBitPos = 0; UBYTE *decodedPtr = decoded; int decodedBitPos = 0; </p> memset( decoded, 0, len ); // Step 2 for( ULONG t = 0; t<nmsgsymbols+TAU; t++ ) { ULONG trelliscol = t % TAU; UBYTE *trelliscolp = &trellis[ trelliscol ][ 0 ]; </p> ULONG encbits = 0; if(t < nmsgsymbols + M) encbits = getEncodedSymbolBits( encodedPtr, encodedBitPos ); ULONG *tmp = newPathDist; newPathDist = oldPathDist; oldPathDist = tmp; // Step 3 minStatePathDist = ~0; ULONG bestState = ~0; // Step 5-13 for( state = 0; state < NSTATES; state++ ) { ULONG mind = ~0U; UBYTE minprev = ~0; // Steps 7-10 for( ULONG prev = 0; prev < (1<<K); prev++ ) { const ULONG oldState = ((state << K) | prev) & STATEMASK; const ULONG trans = state >> (NSTATEBITS-K); ULONG sd = oldPathDist[oldState]; if( ~sd ) // test whether sd != ~0 { // Step 8 sd += UBYTEWeight[getTransition(trans,oldState)^encbits ]; if( sd < mind ) { mind = sd; minprev = prev; } } } newPathDist[ state ] = mind; trelliscolp[ state ] = minprev; if( mind < minStatePathDist ) { minStatePathDist = mind; bestState = state; } ULONG os = (((state << K) | minprev) & STATEMASK); } // Steps 15-23 if( t >= TAU ) { UBYTE *tp; UBYTE * const tpe = &trellis[trelliscol][0]; // Step 17 state = bestState; // Steps 18-21. Divided over 2 loops because the trellis // is represented in the trellis array from // (t-(TAU-1)) modulo TAU through t modulo TAU. for( tp = tpe; tp >= &trellis[0][0]; tp-=NSTATES ) { state = ((state << K) | tp[ state ]) & STATEMASK; } for( tp = &trellis[TAU-1][0]; tp > tpe; tp-=NSTATES ) { state = ((state << K) | tp[ state ]) & STATEMASK; } UBYTE dec = state >> (NSTATEBITS-K); // Step 23 writeDecodedSymbolBits( decodedPtr, decodedBitPos, dec); } } }

#### Listing Three

static const ULONG figure2 [ 3 ][ 2 ] = { { 3, 0 }, // = 110, 000 where leftmost is least significant bit { 1, 3 }, // = 100, 110 { 0, 7 } // = 000, 111 }; ConvolutionalCodec<3,2,3,1> codec(figure2); char decoded[4]; char *encoded = new char[ codec.getEncodedLength(4) ]; cout << "Minimum free distance: " << codec.getMinFreeDistance() << endl; codec.encode("test", encoded, 4); codec.decode(encoded, decoded, 4); delete[] encoded; </p> </p>

**DDJ**

*Copyright © 1997, Dr. Dobb's Journal*