Hash functions are one of cryptography's most fundamental building blocks, even more so than encryption functions. For example, hash functions are used for digital fingerprinting and commitment schemes, such as message authentication and random number generation, as well as for digital signature schemes, stream ciphers, and random oracles.

Recently, Antoine Joux, one of the leading cryptographers of our time, discovered multi-collision attacks against the general framework in which hash functions are constructed [1], and Xiaoyun Wang created an attack that breaks the collision-resistance property of the most widely deployed hash functions [2], including MD5 and SHA-1. In 2005, Arjen Lenstra and Wang demonstrated how to forge two digital certificates, based on the MD5 hash function, by using different keys but the same signature -- something that was hitherto thought to be impossible [3]. These attacks that were discovered, and other vulnerabilities that were exposed by researchers have stimulated a resurgence of research into hash functions. This research has also spawned an international competition, sponsored by the National Institute of Standards and Technology (NIST), an agency responsible for standards used by the U.S. Government, to create a next-generation hash function design.

In this article we highlight some recent work in hash function development. We begin by providing some background on hash functions and look at the problems that hash functions address. We then sketch how to build a hash function. Moving on, we outline recent seminal work in the field of hash functions. We describe two radically different designs entered in the NIST competition, the Skein and Vortex designs, created in part with Intel participation. This is followed by a discussion of the design rationale for each where we also compare and contrast the basic design decisions. We end with a summary of our findings.

### Hash Functions

A hash functions is usually defined as a function `H`

satisfying three properties [4]:

**Collision resistance**. It is computationally infeasible to find two distinct bit strings`s`

≠`s'`

such that`H`

(`s`

) =`H`

(`s'`

).**Pre-image resistance**. Given a hash value`t`

in the range of`H`

, it is computationally infeasible to find a string`s`

for which`H`

(`s`

) =`t`

.**2nd Pre-image resistance**. Given a string`s`

and hash value`t`

such that`H`

(`s`

) =`t`

, it is computationally infeasible to find a second string`s'`

such that`H`

(`s'`

) =`t`

as well.

A hash function maps the set of all bit strings into a "message digest" of defined length, called the hash function's "block size." Since there are many more strings than message digests, at least one digest output by the hash function must be the image of more than one input string. It is therefore remarkable that it is possible to build a function `h`

that has the three properties previously noted. These properties imply that `h`

essentially acts like a randomly selected compression function.

It is instructive to describe some typical use cases:

**Digital fingerprinting**. Hash functions construct effective digital fingerprints. If`d`

represents a digital data structure, such as a document, then the message digest`h (d)`

is its fingerprint; the 2nd-pre-image-resistance property says it is infeasible to find a second data structure`d'`

with the same fingerprint`h (d') = h (d)`

.**Digital signatures**. Digital signatures extend digital fingerprinting by encrypting the hash value`h (d)`

of a data structure d under a private key.**Message authentication**. A hash function h used with a secret key`K`

can be used to authenticate a message m. The idea is to create a tag`t = h (K || m)`

, where "||" denotes string concatenation, which is sent with the message and verified by the receiver. (This does not quite work in practice, because the cascade construction introduces vulnerabilities hashing the last message block. Instead the tag is essentially computed as`h (K || h ( K || m))`

). Because of pre-image resistance, it is infeasible for an attacker to create the same tag`t`

unless he or she knows the key`K`

, and because of collision resistance, the tag could have been created only by concatenating the message m to the key`K`

.`Pseudo-random number generation`

. One standard way to build a random number generator is to take a key`K`

, usually called a "seed", and to compute`h (K || 0 ), h (K || 1 ), h (K || 2 )`

, with each digest representing a different random number. If`h`

and`K`

are carefully selected, it is infeasible to distinguish this, by any statistical test, from a stream of genuine random numbers.**Stream ciphers**. A stream cipher can be built by taking a hash function`h`

, and encrypting message`m`

by_{i}`m`

._{i}⊕ m_{i}⊕ h (K, i ), where " ⊕" denotes XOR, and decryption is mi ⊕ h (K, i ) ⊕ (mi ⊕ h (K, i )) ⊕ h (K, i ) = m_{i}**Random oracles**. Random oracles can be thought of as specialized random number generators. They are used widely in cryptography, such as for randomizing public and private key encryption, to make them secure from arcane attacks.