Baard Nossum is R&D Manager at Network Electronics AS. He can be reached at [email protected]
Several times I have needed to transport binary data through a channel where certain symbols are illegal. In one instance I had to use an RS232 channel to download a configuration file to an FPGA (via a microcontroller). A proprietary protocol layer on the RS232 channel made it imperative to avoid the characters CR, LF, '?' and '\0'. In another application I had to transport binary data in digital serial video (SD-SDI). SD-SDI is a stream of 10-bit video samples with in-line control codes, so the binary codes 0b00000000xx and 0b11111111xx are illegal.
When devising a recoding scheme, I want a simple solution. For many years I used the IHEX format when downloading a configuration file. IHEX is straightforward, robust, and well understood, but inefficient. Each byte to transfer is coded as two hexadecimal digits, giving a usable relative bandwidth of 0.5 (actually a lot less, since IHEX has considerable overhead). The maximum usable relative bandwidth of my RS232 channel (where four symbols were to be avoided) was log2(256 - 4) ≈ 7.98 bits per 8 transported bits, hence 7.98/8 = 0.997. A simple downloading solution is a good thing, but when the consequence was a twofold increase in download time, I ran into problems. I decided to do better.
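The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, the 16-byte record length and two-character line ending are assumptions about a typical IHEX file, and the SD-SDI figure is derived from the eight illegal 10-bit codes mentioned earlier rather than quoted from any specification.

```python
import math


def max_relative_bandwidth(total_symbols, illegal_symbols, bits_per_symbol):
    """Upper bound on payload bits carried per transported bit."""
    return math.log2(total_symbols - illegal_symbols) / bits_per_symbol


def ihex_relative_bandwidth(data_bytes_per_record=16, line_ending_chars=2):
    """Payload bytes per transmitted character for one IHEX data record.

    A data record is ':' + 2 length digits + 4 address digits + 2 type
    digits + 2 digits per data byte + 2 checksum digits. The record
    length and line ending used here are assumptions.
    """
    chars = 1 + 2 + 4 + 2 + 2 * data_bytes_per_record + 2 + line_ending_chars
    return data_bytes_per_record / chars


# RS232 channel: 256 byte values, 4 of them illegal, 8 bits per symbol.
rs232 = max_relative_bandwidth(256, 4, 8)

# SD-SDI: 1024 sample values, 0b00000000xx and 0b11111111xx illegal (8 codes).
sdi = max_relative_bandwidth(1024, 8, 10)

ihex = ihex_relative_bandwidth()

print(f"RS232 upper bound:  {rs232:.4f}")
print(f"SD-SDI upper bound: {sdi:.4f}")
print(f"IHEX, 16-byte data records: {ihex:.4f}")
```

Under these assumptions a 16-byte IHEX record occupies 45 characters, so the real efficiency is closer to 16/45 than to the nominal 0.5.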
The ultimate recoding would be radix conversion. In theory I could then reach the maximum efficiency, but radix conversion is computationally intensive, requiring repeated divisions or multiplications. An interesting application of it is Ascii85, which is used by Adobe (Ascii85 essentially encodes a 32-bit integer in 40 bits). I wanted to avoid multiplications in my algorithm, so I examined other methods. It is easy to map 7-bit symbols into 8-bit symbols; the efficiency would then increase to 7/8 = 0.875. While that is considerably more efficient than IHEX, I found I could do even better. In fact, I came very close to the theoretical maximum relative bandwidth, and that is the method I want to share with you.
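To make the two alternatives weighed above concrete, here is a minimal Python sketch of each. Neither is the method this article goes on to describe. The first function shows the repeated divisions behind an Ascii85-style radix conversion of one 32-bit value. The second pair shows one possible 7-in-8 mapping; setting the top bit of every transmitted byte is an assumption made for this illustration (it happens to keep CR, LF, '?' and '\0' off the wire), and the names `ascii85_group`, `pack7` and `unpack7` are hypothetical.

```python
def ascii85_group(value):
    """Radix conversion: one 32-bit integer to five base-85 digits."""
    digits = bytearray(5)
    for i in range(4, -1, -1):
        value, remainder = divmod(value, 85)   # one division per digit
        digits[i] = remainder + 33             # digit 0 maps to '!'
    return bytes(digits)


def pack7(data):
    """Cut the bit stream into 7-bit groups and send each as 0x80 | group."""
    out = bytearray()
    acc = 0      # bit accumulator
    nbits = 0    # number of valid bits currently held in acc
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= 7:
            nbits -= 7
            out.append(0x80 | ((acc >> nbits) & 0x7F))
        acc &= (1 << nbits) - 1                # keep only the leftover bits
    if nbits:                                  # zero-pad the final group
        out.append(0x80 | ((acc << (7 - nbits)) & 0x7F))
    return bytes(out)


def unpack7(symbols):
    """Inverse of pack7; the padding is always shorter than one byte."""
    out = bytearray()
    acc = 0
    nbits = 0
    for symbol in symbols:
        acc = (acc << 7) | (symbol & 0x7F)
        nbits += 7
        if nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
            acc &= (1 << nbits) - 1
    return bytes(out)


payload = bytes(range(256))
coded = pack7(payload)
forbidden = {0x0D, 0x0A, ord("?"), 0x00}       # CR, LF, '?', '\0'
print(len(payload), "bytes ->", len(coded), "symbols")
print("forbidden symbols present:", any(b in forbidden for b in coded))
print("round trip ok:", unpack7(coded) == payload)
```

The packing needs only shifts and masks, which is what makes it attractive on a small microcontroller, while the radix conversion costs one division per output digit.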