### Some New Designs

Two of the candidates submitted to the NIST hash competition, Skein and Vortex, include contributions by Intel personnel. Both are among the 40 entries remaining in the competition.

**Skein**

Skein was designed by Mihir Bellare (University of California, San Diego), Jon Callas (PGP Software), Niels Ferguson (Microsoft), Tadayoshi Kohno (University of Washington), Stefan Lucks (Bauhaus-Universität Weimar), Bruce Schneier (British Telecom), Doug Whiting (Hi-Fn), and Jesse Walker (Intel Corporation). Skein produces message digests of any length from 1 to 296 bytes. Skein has three major components: a new block cipher named Threefish, a replacement for the cascade construction named Unique Block Iteration (UBI), and an argument system that extends Skein's domain of use beyond hashing.

**The First Skein Component: the Threefish Block Cipher**

Threefish is a "tweakable" block cipher [11], which means that a randomizer called a tweak is passed to the cipher with the key and data to encrypt. Skein uses the block offset from the start of the message as the Threefish tweak. The tweak addresses many deficiencies in the cascade construction and represents the major innovation in Skein.

Threefish has three flavors: a 256-bit, a 512-bit, and a 1024-bit block size. The Threefish encryption key is the same size as the block size. The tweak is always 128 bits.

Threefish is a product cipher, meaning it is composed of rounds. Each round is a simple but weak encryption function. Threefish obtains security by piling round upon round: 72 rounds for Threefish-256 and for Threefish-512 and 80 rounds for Threefish-1024. The number of rounds represents a tradeoff between performance and security.

A Threefish round consists of a number of parallel MIX functions followed by a permutation, so that different blocks are mixed in different rounds. Each MIX function uses just three instructions (64-bit addition, left rotate, and XOR) to combine two 64-bit words A and B, as depicted in Figure 1.
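The MIX step described above can be sketched in a few lines. This is a minimal illustration, assuming the usual add-then-rotate-then-XOR ordering; the rotation amount `14` used in the example is only illustrative, and the helper names (`rotl64`, `mix`, `unmix`) are hypothetical rather than taken from any reference implementation.

```python
MASK64 = (1 << 64) - 1  # keeps every intermediate result within 64 bits


def rotl64(x, r):
    """Rotate a 64-bit word left by r bits (0 < r < 64)."""
    return ((x << r) | (x >> (64 - r))) & MASK64


def rotr64(x, r):
    """Rotate a 64-bit word right by r bits (0 < r < 64)."""
    return ((x >> r) | (x << (64 - r))) & MASK64


def mix(a, b, r):
    """One MIX: 64-bit addition, left rotate, XOR."""
    a_out = (a + b) & MASK64        # addition modulo 2^64
    b_out = rotl64(b, r) ^ a_out    # rotate B, then XOR with the sum
    return a_out, b_out


def unmix(a_out, b_out, r):
    """Inverse of mix, showing that a MIX loses no information."""
    b = rotr64(b_out ^ a_out, r)
    a = (a_out - b) & MASK64
    return a, b


if __name__ == "__main__":
    print(mix(1, 1, 14))  # illustrative rotation constant
```

Because each of the three operations is invertible, the whole MIX is invertible, which is what a block cipher round requires.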

Threefish-256 splits its input into four 64-bit words, so each round consists of two parallel MIXes; Threefish-512 uses eight words with four parallel MIXes, and Threefish-1024 uses sixteen words with eight parallel MIXes. Figure 2 depicts the Threefish round structure for Threefish-512. The parallel MIXes efficiently exploit the super-scalar properties of modern processors. The rotation constants r were selected by a hill-climbing algorithm that maximized diffusion over randomly selected sets of rotation constants.

Threefish adds a round key every four rounds. Figure 2 depicts one of these additions. The Threefish round keys come from a key schedule inspired by Skipjack's key schedule [12]. Each Threefish round key is the same size as the plaintext data block, and each key depends on all the bits of both the encryption key and the tweak.
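The round structure described so far (parallel MIXes, a word permutation, and a round key added every four rounds) can be sketched for the four-word case as follows. This is a structural sketch only, under several assumptions: the rotation schedule `ROT` is made up for illustration, the word permutation `(0, 3, 2, 1)` and the extra round key added after the last round are taken from my reading of the Threefish specification rather than from the text above, and the round keys are supplied by the caller instead of being derived by the real key schedule.

```python
MASK64 = (1 << 64) - 1
ROUNDS = 72                     # Threefish-256 round count from the text
PERM = (0, 3, 2, 1)             # assumed word permutation for four words
ROT = ((14, 16), (52, 57), (23, 40), (5, 37))  # hypothetical rotation schedule


def rotl64(x, r):
    return ((x << r) | (x >> (64 - r))) & MASK64


def mix(a, b, r):
    a_out = (a + b) & MASK64
    return a_out, rotl64(b, r) ^ a_out


def add_words(state, round_key):
    """Word-wise addition modulo 2^64 of a round key into the state."""
    return [(s + k) & MASK64 for s, k in zip(state, round_key)]


def encrypt_sketch(block, round_keys):
    """Apply 72 rounds to four 64-bit words, adding a round key every four rounds."""
    assert len(block) == 4 and len(round_keys) == ROUNDS // 4 + 1
    state = list(block)
    for d in range(ROUNDS):
        if d % 4 == 0:                            # round key every four rounds
            state = add_words(state, round_keys[d // 4])
        r0, r1 = ROT[d % len(ROT)]
        state[0], state[1] = mix(state[0], state[1], r0)  # two parallel MIXes
        state[2], state[3] = mix(state[2], state[3], r1)
        state = [state[PERM[i]] for i in range(4)]        # word permutation
    return add_words(state, round_keys[ROUNDS // 4])      # assumed final addition


# Arbitrary demonstration round keys; the real key schedule is not modeled here.
DEMO_KEYS = [[((4 * i + j + 1) * 0x9E3779B97F4A7C15) & MASK64 for j in range(4)]
             for i in range(ROUNDS // 4 + 1)]

if __name__ == "__main__":
    print([hex(w) for w in encrypt_sketch([0, 0, 0, 0], DEMO_KEYS)])
```

With 72 rounds and a key addition every four rounds, the sketch consumes 19 round keys, each the same size as the data block.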

**The Second Skein Component: Unique Block Iteration**

Unique Block Iteration (UBI) mode replaces the cascade construction in Skein. UBI consists of four parts. First, UBI uses the Matyas-Meyer-Oseas construction, `(iv, s) → E_iv(s) ⊕ s`, to build a compression function `c` out of any block cipher. Second, UBI padding appends enough 0 bits to bring the length of the message being hashed to a multiple of the block size. Third, UBI constructs and passes the tweak to the block cipher. The UBI tweak is composed of two flags and the offset, in bytes, of the message block from the beginning of the message. One flag is set on the first block, and the other is set on the final block. Finally, UBI computes its output just like the cascade construction, the only difference being the construction of the tweak.

UBI uses the Matyas-Meyer-Oseas construction instead of Davies-Meyer. This converts attacks against a hash function from related key attacks to chosen plaintext attacks against the block cipher: the community understands more about defending against the latter than it does about defending against the former.
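The four parts of UBI can be put together in a short sketch. Everything cryptographic here is a stand-in: `toy_tweakable_cipher` is a small Feistel network built from SHA-256 that only plays the role of Threefish and is not secure, and the 16-byte tweak layout in `make_tweak` is a hypothetical encoding, not Skein's. The position field counts the message bytes processed so far, an assumption chosen so that zero padding cannot make two different messages look alike.

```python
import hashlib

BLOCK = 32  # bytes; a 256-bit block, as in the smallest Threefish variant


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def _round_fn(key, tweak, rnd, half):
    digest = hashlib.sha256(key + tweak + bytes([rnd]) + half).digest()
    return digest[:BLOCK // 2]


def toy_tweakable_cipher(key, tweak, block):
    """Stand-in for Threefish: a 4-round Feistel permutation. NOT secure."""
    left, right = block[:BLOCK // 2], block[BLOCK // 2:]
    for rnd in range(4):
        left, right = right, _xor(left, _round_fn(key, tweak, rnd, right))
    return left + right


def make_tweak(position, first, final):
    """Hypothetical 128-bit tweak: 15 bytes of position plus 1 byte of flags."""
    flags = (1 if first else 0) | (2 if final else 0)
    return position.to_bytes(15, "little") + bytes([flags])


def ubi(iv, message):
    """Chain Matyas-Meyer-Oseas compression over zero-padded blocks."""
    padded = message + b"\x00" * (-len(message) % BLOCK)
    if not padded:
        padded = b"\x00" * BLOCK               # an empty message still gets one block
    blocks = [padded[i:i + BLOCK] for i in range(0, len(padded), BLOCK)]
    h = iv
    for i, m in enumerate(blocks):
        position = min((i + 1) * BLOCK, len(message))
        tweak = make_tweak(position, i == 0, i == len(blocks) - 1)
        # Matyas-Meyer-Oseas: the chaining value is the key, the block is the plaintext.
        h = _xor(toy_tweakable_cipher(h, tweak, m), m)
    return h


if __name__ == "__main__":
    print(ubi(b"\x00" * BLOCK, b"hello").hex())
```

Note how the chaining value enters the cipher as the key while the message block is the plaintext, which is the property the preceding paragraph relies on.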

**The Third Skein Component: Skein Argument System**

The Skein argument system extends the algorithm beyond normal hashing to application-specific or personalized hashing, message authentication, key derivation, pseudo-random number generation, stream ciphers, and tree (that is, parallelized) hashing.

**Putting it Together in Skein**

Skein instantiates UBI mode with the Threefish block cipher. The design's initialization vector is computed as the UBI-Threefish output of the configuration string "SHA-3." Skein first hashes a string `s` under UBI mode and `iv` to obtain an intermediate value. Skein then uses the intermediate value as an `iv` to hash the integers 0, 1, 2, … under UBI again to obtain the final output. Classical theory justifies the claim that Skein-n (n = 256, 512, or 1024) achieves `n/2` bits of security against collisions and `n - 1` bits of security against 1st and 2nd pre-image attacks, which is theoretically the best that can be achieved. The double hashing under UBI mode also allows Skein to make additional, unusually strong claims, as follows:

- If Threefish is a pseudo-random permutation, then Skein can be used as a pseudo-random function, a secure key derivation function, a secure message authentication code, a secure stream cipher, and a secure pseudo-random number generator.
- If Threefish acts like an ideal cipher, then Skein cannot be differentiated from a random oracle.

The first claim says that Skein can be used naively in a broad range of applications that usually require great sophistication when constructed from classical hash functions. The second claim is a non-trivial result: it states that Skein is structurally sound when Threefish is viewed as a black box; that is, when the attacker is not allowed to use any knowledge about the internals of Threefish. This structural property means that the security of Skein depends only on the security of the underlying block cipher. The best known attack against Threefish at this time breaks a 34-round variant (out of 72 rounds for full Threefish). This is a larger security margin than that of AES, for which an 8-round variant (out of 10 rounds) falls to attack.
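The two-stage structure described in this section (hash the input under an `iv`, then use the result as a new `iv` to hash the integers 0, 1, 2, …) can be sketched as follows. This is not Skein: `ubi_stand_in` uses keyed BLAKE2b from Python's standard library purely as a placeholder for UBI-Threefish, and the 8-byte little-endian counter encoding is an assumption made for the illustration.

```python
import hashlib


def ubi_stand_in(iv, data):
    """Placeholder for UBI-Threefish: keyed BLAKE2b with a 32-byte output."""
    return hashlib.blake2b(data, key=iv, digest_size=32).digest()


# The initialization vector is itself the output of hashing a configuration string.
IV = ubi_stand_in(b"\x00" * 32, b"SHA-3")


def two_stage_hash(s, out_len):
    """Stage 1 compresses the input; stage 2 hashes a counter to produce output."""
    intermediate = ubi_stand_in(IV, s)          # stage 1: hash s under iv
    out = b""
    counter = 0
    while len(out) < out_len:                   # stage 2: hash 0, 1, 2, ...
        out += ubi_stand_in(intermediate, counter.to_bytes(8, "little"))
        counter += 1
    return out[:out_len]


if __name__ == "__main__":
    print(two_stage_hash(b"abc", 48).hex())
```

Hashing a counter in the second stage is what lets a construction of this shape produce outputs longer than a single block.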

In software, Skein is one of the fastest unbroken algorithms ever devised: it runs at 6.1 clocks/byte on an Intel Core Duo processor and requires no special hardware acceleration, such as an AES round instruction. This is twice as fast as the best software implementations of the current hashing standard. Skein also maintains a very small footprint for its in-memory state, allowing implementation even in constrained environments such as smart cards.

**Vortex**

Vortex is a family of hash functions developed by Michael Kounavis and Shay Gueron of Intel. A main strength of the Vortex design is that it can achieve an ideal performance of 2.2-2.5 cycles per byte by using the AES round [14] and carry-less multiply instructions [15]. Such instructions have been announced for future Intel processors. Vortex is one of the fastest collision-resistant hashes known when running on future IA processors, outperforming SHA-1 (approx. 7 cycles/byte) by 3.18X and SHA-256 (approx. 19 cycles/byte) by 8.63X.

The Vortex family produces message digests of 224, 256, 384, and 512 bits. The main idea behind Vortex is to use well-known algorithms with very fast diffusion in a small number of steps. These algorithms also balance the cryptographic strength that comes from iterating block cipher rounds with S-box substitution and diffusion against the need for a lightweight implementation with as few rounds as possible. Vortex is built upon the following algorithms:

- The Rijndael round function, which performs very fast mixing across 32 bits, as a standalone operation, and 128 bits or 256 bits, if combined with at least one more round.
- A variant of Galois Field multiplication that mixes bits of different sets in a manner that is cryptographically stronger than many other simpler schemes.

Vortex uses a variable number of Rijndael rounds with a stronger key schedule. The number of rounds is a tunable parameter. Rijndael rounds are followed by a variant of Galois Field multiplication to cross-mix between 128-bit or 256-bit sets. This transformation is not simple carry-less multiplication; rather, it combines bit reordering operations, XORs, and additions with carries. In this way, this variant of Galois Field multiplication achieves better diffusion than the straightforward carry-less multiplication between the 128-bit or 256-bit inputs; it is also a non-commutative operation, protecting against chaining variable swapping attacks.

Vortex uses the Enveloped Merkle-Damgård (EMD) construction to lift collision resistance, pre-image and 2nd pre-image resistance, pseudo-random oracle preservation, and pseudo-random function preservation from the underlying compression function to the hash function. To achieve its properties, the EMD construction first hashes the input string s under one initialization vector to get an intermediate value, and then it hashes the intermediate value under a second initialization vector to obtain a final result.

For Vortex-256, Gueron and Kounavis demonstrate that the number of queries required to find a collision with a probability greater than or equal to 0.5 is at least `1.18 × 2^122.55` [13].

In summary, the Vortex compression function uses the Rijndael round function. Vortex-224 and Vortex-256 use Rijndael-128 rounds. Vortex-384 and Vortex-512 use Rijndael-256 rounds. AES uses the Rijndael-128 round function. For the remainder of this section, `Ã_K(X)` denotes a block cipher based on the Rijndael round function that encrypts `X` by using key `K`. `V_M^(A)(A, B)` is a multiplication-based merging function.

The Vortex-block algorithm is the Vortex compression function. This algorithm incorporates two repetitions of an algorithm called Vortex-sub-block. The first repetition of Vortex-sub-block accepts as input the chaining variable `A_i || B_i` and the two least-significant input block words `W_{4i}, W_{4i+1}` of the message being hashed. It returns an intermediate value for the chaining variable `A || B`. The second repetition of Vortex-sub-block accepts as input the intermediate value of the chaining variable `A || B` and the two most-significant input block words `W_{4i+2}, W_{4i+3}`. It returns an update on the chaining variable `A_{i+1} || B_{i+1}`.

With the exception of the last sub-block (discussed later), the algorithm for processing a Vortex-sub-block is as follows:

Vortex sub-block (A, B, W_{0}, W_{1})

The structure of the Vortex sub-block is shown in Figure 3. There are four instances of the transformation `Ã_K(x)` in the Vortex sub-block. Each instance is wrapped by using a feed-forward provided by the Matyas-Meyer-Oseas construction to make the transformation non-reversible. The first two instances process input word `W_0`. The other two instances process the input word `W_1`. `W_0` is the least-significant word of the current sub-block to be processed. Instances of `Ã_K(x)` that accept the same input word process a different variable from among A, B. Each instance treats its input variable A or B as a key and treats its input word, which is one of `W_0` or `W_1`, as plaintext, as is the norm in a Matyas-Meyer-Oseas construction.
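The data flow described above can be sketched as follows. This is a structural illustration only, under loudly stated assumptions: `toy_cipher` is an insecure keyed permutation standing in for `Ã_K(x)`, `toy_merge` is a made-up placeholder and not the Vortex merging function, and applying the merge after each pair of Matyas-Meyer-Oseas instances is my assumption about the ordering, since Figure 3 is not reproduced here. All values are modeled as 128-bit integers.

```python
MASK128 = (1 << 128) - 1
ODD = 0x9E3779B97F4A7C15F39CC0605CEDC835  # arbitrary odd multiplier (hypothetical)


def toy_cipher(key, x):
    """Insecure keyed permutation on 128-bit values; stand-in for Ã_K(x)."""
    y = ((x ^ key) * ODD) & MASK128   # XOR and odd multiplication are both invertible
    return (y + key) & MASK128


def mmo(key, word):
    """Matyas-Meyer-Oseas feed-forward: encrypt the word, then XOR it back in."""
    return toy_cipher(key, word) ^ word


def toy_merge(a, b):
    """Placeholder cross-mixing step; NOT the Vortex merging function."""
    swapped = ((b << 64) | (b >> 64)) & MASK128
    return (a + swapped) & MASK128, b ^ a


def vortex_sub_block_sketch(a, b, w0, w1, merge=toy_merge):
    """Four MMO instances: A and B are keys, W0 then W1 are plaintexts."""
    a, b = mmo(a, w0), mmo(b, w0)     # first two instances process W0
    a, b = merge(a, b)                # assumed position of the merging step
    a, b = mmo(a, w1), mmo(b, w1)     # other two instances process W1
    return merge(a, b)


if __name__ == "__main__":
    print([hex(v) for v in vortex_sub_block_sketch(1, 2, 3, 4)])
```

The feed-forward XOR in `mmo` is what makes each step non-reversible even though the underlying keyed transformation is a permutation.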

The Vortex merging function `V_M^(A)(A, B)` operates as follows:

In this definition, addition is ordinary 64-bit addition, and multiplication is carry-less multiplication.

The Vortex merging function (as shown in Figure 4) ensures that the bits of `A` impact the bits of `B` and vice versa. In fact, each bit of one variable affects a significant number of the bits of the other variable in a non-linear manner. This makes the design better than a straightforward XOR or other simple mathematical operation.

Carry-less multiplication is the default configuration of Vortex. Vortex uses carry-less multiplication by default because it makes analytical assertions about the collision resistance and pre-image resistance of the hash easier. In another configuration, Vortex uses integer multiplication. An integer multiplier increases the performance of the hash (not all processor architectures have a carry-less multiplier) and also increases the non-linearity of merging; however, it also makes the security of the scheme more difficult to prove.
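To make the distinction between the two multipliers concrete, here is a minimal bit-by-bit sketch of carry-less multiplication of two 64-bit words. It illustrates the operation itself, not the Vortex merging function, and the function name `clmul64` is hypothetical. A hardware carry-less multiply instruction computes the same result in a single step.

```python
def clmul64(a, b):
    """Carry-less product of two 64-bit words (a result of up to 127 bits).

    Works like schoolbook multiplication, except that partial products are
    combined with XOR instead of addition, so no carries propagate.
    """
    result = 0
    for i in range(64):
        if (b >> i) & 1:
            result ^= a << i   # XOR in the shifted copy of a
    return result


if __name__ == "__main__":
    # 3 * 3 is 9 with carries, but 0b11 (x) 0b11 = 0b101 without them.
    print(clmul64(3, 3), 3 * 3)
```

The two products differ exactly when integer multiplication would generate a carry, which is the source of the extra non-linearity mentioned above for the integer configuration.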

The last Vortex sub-block is different. It repeats the sequence of Matyas-Meyer-Oseas transforms and merging several times. The total number of times every bit is diffused over all bits of the hash is determined by the number of sequences of Rijndael rounds and merging found in the last Vortex sub-block; this is another tunable parameter of the hash.