Dr. Dobb's Journal January 1997
Examining the Birthday Paradox
Consider the following question: If we have a group of 23 people, what is the probability p that at least two of them have the same birthday? The answer can be computed by looking at the problem in a different way: We compute the probability that they all have a different birthday (this probability is equal to 1-p). The first person can have his or her birthday on any day; for the second person, there are 364 days left out of 365 possible birthdays (we ignore leap years); for the third person, there are 363 days left out of 365, and so on. For the 23rd person, there are 343 days left out of 365.
The probability that all these birthdays are different is equal to the product of these probabilities, or
Thus we find p = 0.493. This probability is much higher than our intuition suggests. The explanation of this paradox is that there are only 23 people, but 2322/2 = 253 pairs of people.
How can this now be applied to finding collisions for hash functions? For a hash function with an n-bit result (typically n = 128 or 160), we have a space of size 2n (this corresponds to the 365 days in the year). If we evaluate the hash function for r inputs, we have t = r(r-1)/2 pairs of strings. The probability that the two strings in a pair are identical is 1/(2n). So if t equals about 2n, we expect to find one pair with identical strings. This corresponds to r 2n/2. So for a hash function with a 128-bit result, the effort to find a collision is only about 264 evaluations of the hash function, which is the square root of the effort to find a preimage. It turns out that clever algorithms allow us to find these solutions without having to store this huge number of strings.
-- A.B., H.D., B.P.
Copyright © 1997, Dr. Dobb's Journal