Hash functions are one of cryptography's most fundamental building blocks, even more so than encryption functions. For example, hash functions are used for digital fingerprinting and commitment schemes, such as message authentication and random number generation, as well as for digital signature schemes, stream ciphers, and random oracles.
Recently, Antoine Joux, one of the leading cryptographers of our time, discovered multi-collision attacks against the general framework in which hash functions are constructed , and Xiaoyun Wang created an attack that breaks the collision-resistance property of the most widely deployed hash functions , including MD5 and SHA-1. In 2005, Arjen Lenstra and Wang demonstrated how to forge two digital certificates, based on the MD5 hash function, by using different keys but the same signature -- something that was hitherto thought to be impossible . These attacks that were discovered, and other vulnerabilities that were exposed by researchers have stimulated a resurgence of research into hash functions. This research has also spawned an international competition, sponsored by the National Institute of Standards and Technology (NIST), an agency responsible for standards used by the U.S. Government, to create a next-generation hash function design.
In this article we highlight some recent work in hash function development. We begin by providing some background on hash functions and look at the problems that hash functions address. We then sketch how to build a hash function. Moving on, we outline recent seminal work in the field of hash functions. We describe two radically different designs entered in the NIST competition, the Skein and Vortex designs, created in part with Intel participation. This is followed by a discussion of the design rationale for each where we also compare and contrast the basic design decisions. We end with a summary of our findings.
A hash functions is usually defined as a function
H satisfying three properties :
- Collision resistance. It is computationally infeasible to find two distinct bit strings
- Pre-image resistance. Given a hash value
tin the range of
H, it is computationally infeasible to find a string
- 2nd Pre-image resistance. Given a string
sand hash value
t, it is computationally infeasible to find a second string
A hash function maps the set of all bit strings into a "message digest" of defined length, called the hash function's "block size." Since there are many more strings than message digests, at least one digest output by the hash function must be the image of more than one input string. It is therefore remarkable that it is possible to build a function
h that has the three properties previously noted. These properties imply that
h essentially acts like a randomly selected compression function.
It is instructive to describe some typical use cases:
- Digital fingerprinting. Hash functions construct effective digital fingerprints. If
drepresents a digital data structure, such as a document, then the message digest
h (d)is its fingerprint; the 2nd-pre-image-resistance property says it is infeasible to find a second data structure
d'with the same fingerprint
h (d') = h (d).
- Digital signatures. Digital signatures extend digital fingerprinting by encrypting the hash value
h (d)of a data structure d under a private key.
- Message authentication. A hash function h used with a secret key
Kcan be used to authenticate a message m. The idea is to create a tag
t = h (K || m), where "||" denotes string concatenation, which is sent with the message and verified by the receiver. (This does not quite work in practice, because the cascade construction introduces vulnerabilities hashing the last message block. Instead the tag is essentially computed as
h (K || h ( K || m))). Because of pre-image resistance, it is infeasible for an attacker to create the same tag
tunless he or she knows the key
K, and because of collision resistance, the tag could have been created only by concatenating the message m to the key
Pseudo-random number generation. One standard way to build a random number generator is to take a key
K, usually called a "seed", and to compute
h (K || 0 ), h (K || 1 ), h (K || 2 ), with each digest representing a different random number. If
Kare carefully selected, it is infeasible to distinguish this, by any statistical test, from a stream of genuine random numbers.
- Stream ciphers. A stream cipher can be built by taking a hash function
h, and encrypting message
mi ⊕ mi ⊕ h (K, i ), where " ⊕" denotes XOR, and decryption is mi ⊕ h (K, i ) ⊕ (mi ⊕ h (K, i )) ⊕ h (K, i ) = mi.
- Random oracles. Random oracles can be thought of as specialized random number generators. They are used widely in cryptography, such as for randomizing public and private key encryption, to make them secure from arcane attacks.