Firstly, instead of avoiding characters CR, LF, '?' and '\0', I will avoid the 8-bit values 0x00, 0x01, 0x02 and 0x03. By a simple post process I interchange the actual illegal values with the codes I avoided. I must now avoid any code of the form 0b000000xx. To explain the method I will first examine a similar, more restricted case. Assume I am to transfer nibbles (0x1, 0x2, . . .0xF), with 0x0 an illegal value. I also assume I am to transmit 31 bits in eight nibbles. This is how it is done....
A Test Case: Transmitting Nibbles
1. I first distribute the 31 bits of payload,[x], into the eight nibbles I will transmit. The least significant bit of the first nibble is set to zero.
I now have exactly one of the following three situations:
- All nibbles are different from 0x0, so all nibbles are legal. There is no recoding to perform, encoding is complete! The decoder know no recoding was performed because the least significant bit of the first nibble is zero.
- The first nibble is legal but at least one nibble is illegal, go to 3.
- The first nibble is illegal, go to 2.
2. I want to establish a linked list, which is ended with the last node pointing to itself. The nodes will replace all illegal values. I set the least significant bit of n to 1, and enter the index to the next node in the upper three bits. Because all illegal values are replaced with nodes, and all nodes necessarily are different from 0x0 (because the least significant bit is set to one), the illegal values are replaced with bit patterns that are legal; see Table 1.
3. I want to establish a linked list, as in point 2. The original value of n must be conserved, and this is done by entering the value of n as the tail node of the linked list. In decoding, this is detected because the linked list ends in a node where the least significant bit is zero; see Table 2.
This example uses nibbles only. Let me assume that you are to use a channel that transports bytes, and that byte values 0x00 to 0x0F are illegal. In that case you could say that the upper nibble of the byte must be different from 0, and you can use the same method as above. The lower nibble of every byte can stand unmodified in the original position. This is nearly the scenario outlined in the introduction, where I had four illegal values instead of 16.
Linked List for the RS232 Scenario
For my original case, I let the lower two bits in each byte stands unmodified. I have 6 bits usable for nodes in a linked list. 5 bits are usable for an index to the next node, so the maximum block length is 32 bytes. I am able to transport 255 bits in 32 bytes, giving a relative bandwidth utilisation of 255/256 = 0.996, contrasted to the theoretical maximum of log2(25232)/256 = 255.27/256 = 0.997.
Linked List for SD-SDI
By a simple substitution, I want to avoid the eight 10-bit codes 0b0000000xxx. I have 7 bits available for construction of nodes, so 6 bits are usable for an index to the next node. A maximum block length of 64 10-bit words results. I am able to transport 639 binary bits in 64 SD-SDI samples, giving a relative bandwidth utilization of 639/640 = 0.9984, where the maximum relative bandwidth is (64*log2(1024-8))/640 = 639.28/640 = 0.9989. An usual way to encode binary data in SD-SDI give a relative bandwidth of 9 10 = 0.9. The linked list method increases the bandwidth with ≈ 10 percent.
Limitations, Variants, Efficiency
The attractiveness of the method is that it is simple to implement, does not require much hardware resources, and give a result close to the theoretical maximum for my applications. The method does not work well when many codes in an alphabet is illegal, because then the block length gets smaller.
Some variants of the method are possible. In the case a block must be recoded, we could get a larger block by using a relative index. The downside will be higher latencies in encoding and decoding, and a data dependent block length. For me it was more important to have a fixed bound on the buffer sizes.
For an FPGA implementation, the latency of encoding and decoding is important. The latency of encoding is neccesarely high, the whole block being considered for recoding must be examined before the first element is transmitted. It is possible to reduce the decoding latency to near the half of a block length. Once one has reached halfway through the block, one need one bit less in the index to a node, and this can be used to flag if the original first element of the block was valid or invalid. For me, the much increased complexity of that variant was not worth the effort.
Finally, let me assess how much bandwidth is wasted, in absolute terms, for my RS232 application, which running at 115200 bps transmits 8/10 * 115200 = 92160 raw bits/second. The maximum theoretical bandwidth is 8/10 * 115200 * (log2 256-4)/8 ≈ 91898 bits/second. Using the linked list method I find a transmit rate 8/10 * 115200 * 255/256 = 91800 bits/second. The difference is not even 13 bytes per second.