### Designing Hash Functions

The standard approach to building a hash function is first to construct a compression function that operates on the input strings of a fixed length, and then to use the cascade construction to extend the compression function to strings of arbitrary length [5, 6].

Compression functions are usually built out of block ciphers. Recall that a block cipher is a pair of `D`

and `E`

functions, for decrypting and encrypting, respectively, that operate on strings of a particular length, called the "block size". If the block size is n-bits, then the encryption of an n-bit string s is `E (s)`

, and its decryption is `D (s)`

. Every string can be encrypted or decrypted, and `E(D (s)) = D(E(s)) = s`

, meaning `E`

(and `D`

), is a permutation of the set of all n-bit strings. Cryptographers say a block cipher is secure if both `s → E (s)`

and `s → D (s)`

are indistinguishable from a randomly selected permutation. To meet the indistinguishability property, block ciphers are keyed, so a block cipher represents a family of permutations. A particular block cipher instance is selected by choosing a key `K`

. The resulting encryption and decryption instances are denoted `E`

and _{K}`D`

. That is, for each choice of a key _{K}`K, s → E`

(and _{K} (s)`s → D`

) behave like different randomly selected permutations.
_{K} (s)

### Compression Functions

The compression functions for all the hash functions commonly used today are built in the following way:

- Select a block cipher scheme
`(E, D)`

. - Define a compression function
`c (iv, s) = E s (iv) ⊕ iv`

.

Here `s`

denotes a message of exactly n-bits, and `iv`

denotes an initialization vector. This recipe for `c`

says to use s as the encryption key and `iv`

as the data to be encrypted, and then to use XOR s with the encrypted result `Es ( iv ) `

. The mapping `(iv, s ) → Es ( iv ) ⊕ iv`

is called the Davies-Meyer construction [7] for `E`

. It is easy to show that a block cipher used in Davies-Meyer mode is collision-resistant, pre-image resistant, and 2nd pre-image resistant. `c`

is called a compression function because it compresses `<iv,s> `

into a new string `iv'`

of exactly `s'`

s length. Other compression function constructions also exist: both Vortex and Skein use the Matyas-Meyer-Oseas [8] construction, `c ( iv, s ) → Eiv ( s ) ⊕ s`

, which is identical to Davies-Meyer, except it reverses the role of `iv`

and `s`

.

### The Cascade Construction

The cascade construction builds a hash function `h`

from a compression function `c`

with block size `n`

as follows:

Every hash function based on a block cipher must define a padding scheme, because compression functions only operate on strings s of length `n`

bits exactly. Most padding schemes pad `s`

with a single 1 bit followed by as many 0 bits as are necessary to bring the length to a multiple of `n`

. The length of the unpadded string `s`

is then encoded as an `n`

-bit integer and appended to defend against extension attacks.

Once padded, partition `s`

into `b = |s|/n`

blocks, each consisting of `n`

bits (|`s`

| denotes `s'`

s length in bits): `s`

.
_{1} s_{2} … s_{b} ← s

Finally, beginning with a hash-function-specific initialization vector `iv`

, serially compute `c (iv`

for each block _{i'}s_{i})`s`

.
_{i}

The cascade construction extends the collision resistance, pre-image resistance, and pre-image resistance properties from a compression function to a function operating on strings of arbitrary length [9]. The cascade construction is sometimes called the Merkle-Damgard construction after its inventors. This construction is intrinsically serial, as it needs to be able to detect problems, such as two blocks being exchanged.

### Hashing Today

We just summarized the state of the art during the early part of this decade, prior to two significant publications. The first was by Antoine Joux, who introduced the multi-collision attack. The second was by Xiaoyun Wang, where she described an attack, based on differential cryptanalysis, against all of the hash algorithms broadly used today.

Suppose a hash function is built out of a compression function by using the cascade construction. Also suppose that someone has broken the collision resistance of the hash function; that is, they have discovered two distinct strings `s ≠ s'`

so that `h (s) = h (s')`

. Joux observed that it is easy to find many more collisions for little additional cost [1]. The source of the problem is that the cascade construction maintains too little state as it progresses from one invocation of the compression function to the next. Joux's result says that by itself the cascade construction is too weak to serve as an adequate building block for constructing hash functions.

Wang's attack [2], based on differential cryptanalysis, has a different flavor. Differential cryptanalysis is a technique to analyze block ciphers. Essentially, differential cryptanalysis follows a bit slice through the block cipher being analyzed, to characterize how it gets diffused. The goal of differential cryptanalysis is to identify bits leading to unusually high or low levels of diffusion. When such bits are identified, they can be used to recover bits of the encryption key. This can dramatically shrink the size of the key space, making brute force search realistic. As an example, differential cryptanalysis reduced the cost of key recovery attacks against the DES cipher from 256 encryptions to about 241.

Wang showed that a differential attack could produce collision in message digests, thereby breaking the collision-resistance of the hash function producing them. Wang first demonstrated her attack against MD4, MD5, RIPE-MD, and SHA-0. This was viewed as a stunning result, but then she demonstrated that a collision can be produced in SHA-1 at a cost of about 261 operations. This caused upheaval in the cryptographic community, raising the question as to whether we even understand what a hash function is.

The cryptographic community has vigorously debated hash design principles in the intervening years. The only clear consensus emerging from this debate is that we need a worldwide, focused project whose goal is to create a new generation of hash functions that defend against the new attacks. A lesson previously learned by the community is that contests have great efficacy in galvanizing technical consensus building. In 2007, NIST initiated an international competition to create a new hash standard [10]. Candidate submissions were due on October 31, 2008. 55 algorithms were entered. In February of this year, NIST whittled down the list of candidates to 40, and from this it plans to select 10 to 15 first-round candidates by August 2009. NIST plans to select a set of finalist algorithms in 2010, then to announce the winner(s) in 2011.

NIST is widely influential in the creation of cryptographic standards worldwide, so it is a good sponsor for the competition. One of NIST's most important contributions to cryptography standards has been the creation of requirements for algorithms submitted to the competition. The competition requires that candidate algorithms provide the collision-resistance, pre-image-resistance, and 2nd-pre-image-resistance properties -- and be free of any known intellectual property. Algorithms must support output block sizes of 128, 160, 224, 256, 384, and 512 bits. The rules encourage support for features outside the core properties, especially for parallelization. Submissions must be accompanied by a security rationale, to help establish confidence in the algorithms.