## When traditional algorithms just aren't fast enough

*
Niels is an independent cryptography consultant and Bruce is the founder and CTO of Counterpane Internet Security. They are the authors of Practical Cryptography (John Wiley & Sons, 2003) and can be contacted at [email protected] and [email protected], respectively.*

When securing data in transit—e-mail, VPN sessions, remote logins, and the like—you have to worry about two different types of attacks: someone eavesdropping on the communications, or someone modifying the contents of the communications on the fly. Protecting against the first type of attack requires encryption, and protecting against the second requires authentication.

Clearly, the two attacks are different, as are their defenses. The easiest way to encrypt a message is to apply a symmetric encryption algorithm to the data, rendering it unreadable to someone without the key. The easiest way to authenticate a message is to apply a Message Authentication Code (MAC) to the data, then append that code to the data. Only someone with the key is able to compute the correct MAC code, so recipients can ensure that the data they receive is correct by verifying the MAC code. (Encryption and authentication can also be done with public-key cryptography, but for performance reasons, public-key systems are used only to set up the symmetric keys used for both encryption and authentication.) Both are important. An authenticated message is still in plaintext, and attackers can still read its contents. That's obvious. What's less obvious is the reverse: A message that is encrypted is not automatically authenticated; attackers can imperceptibly change the contents of the message even though they cannot read its contents. The harm that such changes can do is often much greater than the harm caused by revealing the contents.

Imagine a situation where Alice and Bob are using a secure communications channel to exchange data. Consider how much damage eavesdropper Eve could do if she could read all the traffic. Then think about how much damage Eve could do if she could modify the data being exchanged. In most situations, modifying data is a devastating attack and does far more damage than merely reading it. Another example might involve a storage-area network over IP within a corporate LAN. Eavesdropping on traffic is passive, and doesn't necessarily expose private data (particularly on a switched network). But a lack of authentication allows sector-level data tampering that is not possible with direct-attached storage. Adding authentication avoids that problem entirely. Or consider your own personal computer. Because data isn't authenticated, you are more likely to be the victim of viruses, Trojans, and malware. Encryption is important, but authentication is more important. If your computer is controlled by someone on the other end of a Trojan, it doesn't really matter what kind of encryption you've implemented.

Since we often need to both encrypt and authenticate, it makes sense to look at the two operations together. The Advanced Encryption Standard (AES) process gave us a fast symmetric algorithm suitable for message encryption. MACs are commonly built from either block ciphers or hash functions and achieve similar speeds (see Table 1). But taken together, encryption/authentication is not nearly as fast as we'd like. Helix is our attempt to fix that. Helix is both an encryption algorithm and a MAC. The basic idea is that you perform one algorithm and get both functions—fast. It can process data in less than seven clock cycles per byte on a Pentium II CPU, which is more than twice as fast as AES.

### Helix Functionality

To encrypt a message with Helix, you need:

- A key, which can be any string of bytes up to 32 bytes long. The key is the secret value that should be known only to the sender and the receiver. Several messages can be encrypted and authenticated with one key.
- A
*nonce*, which is a 16-byte value that must be unique for each message. You should never use the same*nonce*value twice with a single key. The*nonce*can be a public value; it does not have to be secret. It ensures that every message is encrypted in a different way. The*nonce*is typically derived from the message sequence number or some other counter. - The plaintext, which is the actual message to be sent. It can be any string of bytes up to 2
^{64}-1 bytes long. This is long enough, we believe, for most anything.

Helix produces two values:

- Ciphertext: the encrypted message; it is exactly the same length as the plaintext.
- A 16-byte tag: the MAC value that provides the authentication.

To decrypt a message, you provide the key, *nonce*, ciphertext, and tag to the decryption algorithm. It returns either the plaintext or an error indicating that the authentication has failed. A failed authentication tells you that the ciphertext (or tag, or key, or *nonce*) has been modified after encryption.

Most of this is pretty basic cryptography—stuff that can be found in any textbook (although we're happy to recommend our book *Practical Cryptography,* John Wiley & Sons, 2003). The only nonintuitive part of the Helix functionality is the *nonce*. The *nonce* has a real security function. If you encrypt the same message twice, you use different *nonce* values, so the the ciphertexts are different. Eavesdroppers are not able to recognize that the two messages are identical. (The *nonce* has several other security functions as well, so don't be tempted to use the same value twice just because you don't think this particular type of attack is important.)

Helix could, of course, generate its own *nonce* and add it to the ciphertext. However, most systems already have a message number used to prevent replay attacks or something similar. Using the existing message number to generate the *nonce* is more efficient and allows more flexibility. For example, an application might send only a 32-bit message number instead of the full 128-bit *nonce*. And if you want to use the same key to send messages in both directions, you can agree that one party should use *nonce*s that start with 0x00 and the other party should use *nonce*s that start with 0x01.

### The Helix Block Function

Helix is built around the block function in Figure 1. All values are 32-bit words. The only operations used are 32-bit addition (shown as ), exclusive or (), and bitwise rotation (<<<). One block function is all that is needed to encrypt (or decrypt) 4 bytes of the message. As you can see, there are 12 additions, 11 XORs, and 20 rotations. At first glance, it would take 43 clock cycles to compute the block function, but there is a great deal of parallelism, so modern CPUs can use their superscalar capabilities to execute two or three operations in a single cycle.

Blocks are numbered within a single encryption; block number *i* encrypts plaintext word *P*_{i}*. *At the top of block *i* is the current state *Z** ^{(i)}*, which consists of five 32-bit words. The first word of the state is used as a key stream for the encryption. (The plaintext word is XORed with the key stream to produce the ciphertext word—or the other way around for decryption.) The rest of the block function mixes the state with the plaintext word and two words derived from the key and

*nonce*. The final output is

*Z*

^{(}

^{i}^{+1)}, which is the state used to process the next word of the message. The size of the state has also been chosen carefully. Intel CPUs have seven registers available. Five are used for the state, which leaves one as a pointer to the message buffer, and one as a working register to handle the key stream, plaintext, and key words. Almost all operations can be done in the registers, thus reducing the number of memory accesses required. The cryptographic strength of Helix derives from the mixing of addition, exclusive or, and rotation. Using any two of these operations would lead to a function that is easy to analyze and attack; but combined, they effectively resist mathematical analysis.

### The Key Words

The key words *X*_{i,}_{0} and *X*_{i,}_{1}, used by block *i*, are derived from the key and the *nonce*. The words *X*_{i,}_{0} cycle through eight different words of a 256-bit working key. The words *X*_{i,}_{1}_{ }are more complicated, and depend on the working key, the *nonce*, and a few other parameters that prevent specific types of attack.

In each block, Helix produces one word of a key stream. We have to assume that the key stream is revealed to attackers. (If attackers know both the plaintext and ciphertext, they can compute the key stream.) So they learn 32 bits of the 160-bit state. Now imagine they try to use that knowledge for an attack. They know 32 bits of the state at the start of a block, but during the block itself, we add two whole words of key material (which they don't know) to the state. Thus, the attacker has no knowledge about the state at the end of the block.

### Authentication

So where is the authentication? One of the elegant solutions in Helix is that the authentication is done at the same time as the encryption. In each block, plaintext is used as one of the inputs. So while the message is being encrypted, the block function mixes the key, *nonce*, and plaintext all into a single 160-bit state. The authentication is derived from this final state. Basically, we apply another eight block functions to thoroughly mix in the last plaintext word, and then extract the next four key-stream words as the tag.

The receiver performs the exact same calculation, and verifies that the tag it received matches the tag it computed.

### Example Code

Listing One is a complete Python implementation of Helix. This implementation was written for clarity rather than speed.

At the start, we define some helpful functions. The function *_rol32 *implements a 32-bit rotation, *_strToWords* converts a string that contains a sequence of bytes to a list of 32-bit words using the *LSByte* first convention, and *_wordsToString* implements the reverse conversion. The first function of the Helix class is *helixBlock*, which applies the Helix block function to the state *Z.* This is a straightforward implementation of Figure 1.

The *__init__ *function is called every time a new Helix object is created. It receives the key (encoded in a Python string) as the argument. The key is first padded with zeroes until it is 32 bytes long, then converted to eight 32-bit words. These words are then thoroughly mixed using the block function. The mixing ensures that all eight words of the key that we will be using are random, even if the user supplied a shorter key. The *doBlk* function is a wrapper around *helixBlock* that computes the two key words that go into the block. It makes the rest of the code easier to read.

The start function takes the *nonce* as an argument and initializes a Helix encryption or decryption. First, the *nonce* is used to create an array *X1* that will (by *doBlk*) compute the *X*_{i,}_{1} values. Finally, the state is initialized and eight blocks are run without processing any plaintext. This premixing ensures that the *nonce* and key are thoroughly mixed before we start processing the message and revealing the key stream to attackers. The *finish* function performs the tasks at the end of an encryption or decryption. It runs eight more blocks, then produces the 16-byte tag. With these helper functions, the encrypt function is relatively easy. The only problem is that the plaintext might not be a whole number of words, so extra effort is required to handle arbitrary plaintext length.

The *decrypt* function is similar to the *encrypt* function but with one major difference related to the very last word of the decryption. If the plaintext is 10 bytes long, then only 2 bytes of the last word are used. The other 2 bytes are defined to be zero for purposes of computing the tag. During the decryption, the computation that gives the plaintext word does not fill the 2 extra bytes with zeroes, so the decryption routine has to mask out between zero and 3 bytes of the last plaintext word to get the right value to use in the computation of the tag. Listing One is simple, and computes a suitable mask for every word.

### Conclusion—and a Warning

Helix was created to meet an important requirement—encryption plus authentication. Again and again, we see protocols designed by otherwise-intelligent committees that mandate encryption but not authentication; WEP and Bluetooth are two such examples that come to mind. Likewise, an early version of the IPsec Standard had a mode that encrypted but did not authenticate.

Last year, Bruce had a conversation with an engineer involved with security for the Bluetooth wireless protocol. Bruce told him that Bluetooth provides only privacy, while it should also provide authentication. The engineer's response was that pseudorandom frequency hopping makes it "nearly impossible" for attackers to get in, and since the range is only eight feet, the attacks are naturally limited. Bruce responded that he could "hardly wait for Bluetooth to become universal, because I really want a wireless keyboard and mouse with the 'base station' built into my computer." The engineer said: "Yes, but you really probably don't want to use Bluetooth for that, because then somebody could stuff keystrokes or mouse clicks into your system." Bruce didn't know whether to laugh or cry. The bottom line is that authentication is vitally important, and often more important than encryption. We wrote Helix to provide both.

Helix is a new design that uses a new structure. (For more details and code examples, see http://www.macfergus.com/helix/ and "Helix: Fast Encryption and Authentication in a Single Cryptographic Primitive," by Niels Ferguson, Doug Whiting, Bruce Schneier, John Kelsey, Stefan Lucks, and Tadayoshi Kohno, to appear in *In Fast Software Encryption 2003: Lecture Notes in Computer Science,* Springer-Verlag, 2003.) We made Helix as fast as we possibly could, which means that it is fairly close to the edge where attacks might be possible. In other words, this is a high-risk design. It is conceivable that we'll see an effective attack against it in the next few years. Consequently, we advise against using Helix for serious applications. If you can afford the slower performance, use one of the traditional algorithms. Only if you need something this fast should you consider Helix. Of course, we may offer different advice once people have taken a good whack at the algorithm.

### Acknowledgments

Helix was designed by Niels Ferguson, Doug Whiting, Bruce Schneier, John Kelsey, Stefan Lucks, and Tadayoshi Kohno.

**DDJ**

#### Listing One

#struct provides conversion between strings and a list of words import struct #Some helper values and functions _mask32 = 0xffffffffL def _rol32( V, n ): #Rotate a 32-bit value V left by n positions return _mask32 & ( (V<<n) | ((V & _mask32) >> (32-n) ) ) def _strToWords( s ): #Interpret string as list of 32-bit integers, LSByte first. return list(struct.unpack( '<' + 'L' * (len(s)/4), s )) def _wordsToStr( w ): #Convert list of 32-bit integers to a string, LSByte first. return apply( struct.pack, [ '<' + 'L' * len(w)] + w ) class Helix: def helixBlock( self, X0, P, X1 ): #Apply the Helix block function to self.Z using #X0, P, and X1 as the three input words. Z = self.Z Z[0] += Z[3]; Z[3] = _rol32( Z[3], 15 ) Z[1] += Z[4]; Z[4] = _rol32( Z[4], 25 ) Z[2] ^= Z[0]; Z[0] = _rol32( Z[0], 9 ) Z[3] ^= Z[1]; Z[1] = _rol32( Z[1], 10 ) Z[4] += Z[2]; Z[2] = _rol32( Z[2], 17 ) Z[0] ^= (Z[3] + X0); Z[3] = _rol32( Z[3], 30 ) Z[1] ^= Z[4]; Z[4] = _rol32( Z[4], 13 ) Z[2] += Z[0]; Z[0] = _rol32( Z[0], 20 ) Z[3] += Z[1]; Z[1] = _rol32( Z[1], 11 ) Z[4] ^= Z[2]; Z[2] = _rol32( Z[2], 5 ) Z[0] += (Z[3] ^ P); Z[3] = _rol32( Z[3], 15 ) Z[1] += Z[4]; Z[4] = _rol32( Z[4], 25 ) Z[2] ^= Z[0]; Z[0] = _rol32( Z[0], 9 ) Z[3] ^= Z[1]; Z[1] = _rol32( Z[1], 10 ) Z[4] += Z[2]; Z[2] = _rol32( Z[2], 17 ) Z[0] ^= (Z[3] + X1); Z[3] = _rol32( Z[3], 30 ) Z[1] ^= Z[4]; Z[4] = _rol32( Z[4], 13 ) Z[2] += Z[0]; Z[0] = _rol32( Z[0], 20 ) Z[3] += Z[1]; Z[1] = _rol32( Z[1], 11 ) Z[4] ^= Z[2]; Z[2] = _rol32( Z[2], 5 ) def __init__( self, key ): #Initialise new Helix object with a key. #The key is a string with length <= 32. self.lK = len( key ) assert 0 <= self.lK <= 32 #Convert to list of 8 words self.K = _strToWords( key + '\000'*(32-self.lK) ) #Perform the key mixing for i in range( 8 ): self.Z = self.K[:4] + [self.lK + 64] self.helixBlock( 0, 0, 0 ) self.K = [self.K[4]^self.Z[0], self.K[5]^self.Z[1], self.K[6]^self.Z[2], self.K[7]^self.Z[3]] + self.K[:4] def doBlk( self, P ): #Perform single Helix block operation with plaintext word P #computing X_i,0 and X_i,1 appropriately. #Get i mod 8 i = self.i8 % 8 #Compute X1 from self.X1 and (i+8) X1 = self.X1[ i ] if i%4 == 3: X1 += (self.i8)>>31 X1 = (X1 + self.i8) & _mask32 #The actual block function self.helixBlock( self.K[i], P, X1 ) #Update the block number self.i8 += 1 def start( self, N ): #Initialise a Helix encryption or decryption with nonce N. #The nonce is a string of length 16. assert len(N) == 16 Nl = _strToWords( N ) #Extend the nonce to 8 words for i in range( 4 ): Nl.append( (i - Nl[i]) & _mask32 ) K = self.K #Make array of X1 values to generate X_{i,1} efficiently. self.X1 = [] for i in range( 8 ): x = ((i%4)==1)*4*self.lK self.X1.append( (K[(i+4)%8] + Nl[i] + x) & _mask32 ) # Initialise state and run first 8 blocks for nonce mixing. self.Z = [K[3]^Nl[0], K[4]^Nl[1], K[5]^Nl[2], K[6]^Nl[3], K[7] ] self.i8 = 0 for i in range( 8 ): self.doBlk( 0 ) def finish( self, lnm4 ): #Finish up Helix processing and return the Tag. #Tweak the internal state as specified self.Z[0] ^= 0x912d94f1L #Apply 8 block functions with len(P) mod 4 as plaintext for i in range( 8 ): self.doBlk( lnm4 ) #And generate the tag tag = [] for i in range( 4 ): tag.append( self.Z[0] ) self.doBlk( lnm4 ) #Cleanup. self.Z = None self.X1 = None self.i8 = None #return the tag as a string of bytes return _wordsToStr( tag ) def encrypt( self, N, P ): """ Encrypt the plaintext P with the nonce N. Returns a tuple (C,T) containing the ciphertext C and the authentication tag T. """ #Initialise a Helix encryption self.start( N ) ln = len(P) #Pad plaintext with zeroes to make a whole number of words P = P + '\000' * ( (4-(ln%4)) % 4 ) #... and convert P to a list of 32-bit words. Pl = _strToWords( P ) #Cl will contain the ciphertext, encoded as a list of words. Cl = [] #Main encryption loop for p in Pl: Cl.append( p ^ self.Z[0] ) self.doBlk( p ) #Convert ciphertext to string, and trim to proper length C = _wordsToStr( Cl )[:ln] #Compute the authentication tag (requires len(P) mod 4). T = self.finish( ln%4 ) return (C,T) def decrypt( self, N, C, T ): """ Decrypt the plaintext C with the nonce N, and check that the authentication tag T is correct. Returns None if the authentication failed; returns the plaintext P if the authentication was correct. """ #Initialise Helix decryption with the nonce. self.start( N ) #Compute the ciphertext words and the mask words. The last plaintext word #needs to be masked to ensure that unused plaintext bytes are set to zero #for the tag computation. We mask every word, which is slightly easier #to implement though it is less efficient. ln = len( C ) #Pad C and compute mask string (in bytes) pad = '\000' * ( (4-(ln%4)) % 4 ) C = C + pad M = '\xff' * ln + pad #Convert to words Cl = _strToWords( C ) Ml = _strToWords( M ) #Pl will contain the plaintext as a list of words. Pl = [] #Main decryption loop for i in range( len( Cl ) ): #Decrypt a word, and apply the mask p = (self.Z[0] ^ Cl[i]) & Ml[i] Pl.append( p ) self.doBlk( p ) #Convert plaintext to bytes, and trim to correct length. P = _wordsToStr( Pl )[:ln] #Compute the value we expect the authentication tag to have. V = self.finish( ln%4 ) #Check for the correct tag value if V == T: return P else: #The tag was incorrect. return None