Ham It Up
I was having a conversation this week about gray codes and thought I should write a little something about them. About half way through, I realized it sounded familiar and discovered I had written almost the exact same thing over a year ago (see Shades of Gray). It is a bad sign when you start repeating yourself!
Gray codes always make me think of hamming distance, though, since they are a case where each adjacent code has a hamming distance of 1 from each other. For the case of a mechanical or optical encoder, that's exactly what you want. If you are looking for fault tolerance, then you want more hamming distance, not less.
Consider some networking scheme where I have a one byte address for each node. In theory, then, my network could support 256 nodes (although I probably use a few addresses for special purposes). I probably don't have 256 nodes though. In fact, you probably have way less in a typical project.
Let's take the silly example of having three nodes. You might just as well number them 0, 1, and 2. As long as everything works as you expect, there's nothing wrong with that. What happens if things don't work like you expect though? If some sort of error causes a single bit to flip, then address 0 could become address 1 or 2. Conversely, addresses 1 or 2 could easily become zeros. That's because the hamming distance between 0 and the other two addresses is 1. Of course, you could have errors that generated other addresses (like 1 to 3) but those aren't as important because there is no address 3 and it would be easy to figure out that some error occurred. There's no way to tell a bogus address 2 from a real address 2, so that's the kind of error you worry about.
You can't have a single bit error that causes address 1 to become 2 or vice versa, though. The hamming distance there is 2, so it would take two errors to get an undetected mistake (and, again, most errors would cause a totally illegal address — I'm not worried about those as much because I can detect them).
A smarter move would be to use those extra bits in the address to make sure that errors could not easily corrupt an address into another good address. For example, suppose you numbered the nodes (in hex) 03, 0C, and 30. Now any single bit error would generate an illegal address. There's so much extra room that you could easily create a set of 3 numbers that had an even greater hamming distance (07, 38, and C0, for example). In real life, you probably have more than just 3 nodes in an 8-bit address space, but you can still work it out.
Oddly enough, last week I was writing about bit population count algorithms. I wasn't thinking about hamming distance, but the bit population count is key to computing hamming distance. If you want to know the distance between two bit strings A and B, simply XOR them together and then do a bit population count of the result. The XOR will have a 1 bit for every bit position that is different and the bit population count tells you how many 1 bits you got. Using the code from last week:
unsigned int bit_count(unsigned int v)
{
unsigned int count;
for (count=0; v; count++)
{
v&=v-1;
}
return count;
}
unsigned int hamming_dist(unsigned a, unsigned b)
{
return bit_count(a^b);
}
If you think about it, parity is just a simple case of using an extra bit to generate a hamming distance of two. For example, a 7-bit ASCII character with parity has at least a distance of 2 between each character (because adjacent characters will have different parity bits).
If you are like me and too lazy work out codes yourself, you might find this tool useful. Of course, as with any redundant scheme, you do lose the use of some bits. Using that tool, you can see that you could shoehorn 128 numbers into a byte and maintain a minimum hamming distance of 2. With a hamming distance of 3, the total items you can fit drops to 16! At a hamming distance of 6, you can only hold 2 distinct numbers.
Hamming distance is important in reliable systems and is also used in some cryptography algorithms. Of course, the idea of it is also important for gray codes. It is one of those simple bit tricks everyone should have in their toolkit.

