The Million Random Digit Challenge
I've mentioned my Million Random Digit Challenge here before. In a nutshell, I've posted a file of a million random decimal digits, packed into binary form, and challenged all comers to compress it. The proof is required to be a Kolmogorov-style work: a program that when run creates a perfect copy of the original file. The only requirement is that the program (plus any associated data file) be smaller than the target million digit file.
People have been attempting to meet this challenge since 2002 with no luck. The file was specifically designed by geniuses at RAND not to have any recognizable statistical patterns, and apparently this goal was accomplished quite well. And what do most compression programs do? Look for statistical patterns. Fail.
Classic statistical techniques are just not going to do it for this problem. I think the only chance to win this prize is to use something I've often disparaged, which I call Magic Function Theory.
The idea behind Magic Function Theory is that we come up with some short but sweet generator function that can create a long sequence. Just as an example, I can create a magic function for any sequence imaginable using just three things:
- A program that generates the digits of pi. This program will be quite short.
- An offset into that string of digits.
- A length of the string starting at that offset.
I believe (IANAM) that this system can generate every finite sequence of digits, although that would follow only if pi is a normal number, which is widely believed but not proven. Of course, how far would we have to go to find the million random digits? Here's where it gets interesting. We might have to go quite a distance, but what if the offset to the million random digits turns out to be an easily compressible number? What if the million random digits appear at position 374,567,111^14,127,269 - 623,557,570,925? If that were the case, we could represent the million random digits in a few hundred bytes - quite an accomplishment.
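The three-part scheme above (a pi generator, an offset, a length) can be sketched in a few lines of Python. This is only an illustration under my own assumptions: pi_digits uses Gibbons' unbounded spigot algorithm (any pi generator would do), magic_function and find_offset are hypothetical names, and offsets are 0-based with the leading 3 counted as digit 0.

```python
from itertools import islice


def pi_digits():
    """Yield the decimal digits of pi (3, 1, 4, 1, 5, ...) one at a time.

    Uses Gibbons' unbounded spigot algorithm; an illustrative choice only.
    """
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n  # the next digit is now certain
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            # Consume another term of the series to narrow the estimate.
            q, r, t, k, n, l = (
                q * k,
                (2 * q + r) * l,
                t * l,
                k + 1,
                (q * (7 * k + 2) + r * l) // (t * l),
                l + 2,
            )


def magic_function(offset, length):
    """The 'decompressor': `length` digits of pi starting at 0-based `offset`."""
    return "".join(str(d) for d in islice(pi_digits(), offset, offset + length))


def find_offset(target, limit):
    """The 'compressor': first offset of `target` in the first `limit` digits, or -1."""
    digits = "".join(str(d) for d in islice(pi_digits(), limit))
    return digits.find(target)


print(magic_function(0, 10))       # 3141592653
print(find_offset("2653", 1000))   # 6
```

Running find_offset on a handful of short targets is a quick way to compare the number of digits in the offset with the number of digits in the target. The spigot slows down noticeably as the digit count grows, so this sketch is only practical for small experiments.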
Another approach might be to look for expressions that generate the million digit number. What if there were some short expression of the form k^n + j that generated the number, where k, n, and j were representable using some nice compact format?
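To get a feel for this on small numbers, here is a minimal sketch that reads the form as k raised to the power n, plus j. For each base k it takes the largest n with k^n no bigger than the target, lets j absorb the remainder, and counts the decimal digits needed for k, n, and j together. The function name power_forms, the max_base cutoff, and the digit-count cost are all my own assumptions for illustration.

```python
def power_forms(target, max_base=100):
    """Return (cost, k, n, j) tuples with k**n + j == target, cheapest first.

    cost is the total number of decimal digits in k, n and j, a crude
    stand-in for 'how much space would this description take'.
    """
    results = []
    for k in range(2, max_base + 1):
        n, power = 0, 1
        while power * k <= target:   # largest power of k not exceeding target
            power *= k
            n += 1
        j = target - power           # non-negative remainder
        cost = len(str(k)) + len(str(n)) + len(str(j))
        results.append((cost, k, n, j))
    return sorted(results)


# A hand-picked 7-digit target that happens to sit right next to a power of 2.
print(power_forms(2**20 + 5)[0])     # (4, 2, 20, 5)

# An arbitrary 10-digit target: compare the best cost with 10.
print(power_forms(3141592653)[0])
```

The first target compresses nicely only because it was constructed to. For an arbitrary target, j typically ends up nearly as long as the target itself.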
The final suggestion I will toss out for consideration is to use prime numbers. Find the nth prime number, p(n), that is closest to the million digit number, then add in an offset. The simple formula p(n) + k has a shot at generating our target. (Note the downside to this is that n will be heartbreakingly large. It won't contain a million digits, but it will only be a bit shorter. There are a lot of primes.)
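The p(n) + k idea can be tried on small targets with a simple sieve. The sketch below is a simplification of what is described above: it uses the largest prime at or below the target rather than the closest prime, so that k is never negative. The names primes_up_to, encode, and decode are hypothetical.

```python
from bisect import bisect_right
from math import isqrt


def primes_up_to(limit):
    """Sieve of Eratosthenes: sorted list of all primes <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, isqrt(limit) + 1):
        if sieve[i]:
            # Cross out every multiple of i starting at i*i.
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]


def encode(target, primes):
    """Return (n, k) such that p(n) + k == target, with n counted from 1."""
    n = bisect_right(primes, target)   # number of primes <= target
    if n == 0:
        raise ValueError("target is below the first prime")
    return n, target - primes[n - 1]


def decode(n, k, primes):
    """Rebuild the target from the pair (n, k)."""
    return primes[n - 1] + k


primes = primes_up_to(1_000_000)
n, k = encode(1_000_000, primes)
print(n, k, decode(n, k, primes))      # 78498 17 1000000
```

For the 7-digit target 1,000,000 the pair is n = 78,498 and k = 17: five digits plus two digits, so nothing has been saved.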
These are the kind of ideas that motivate a lot of dreamers out there, and it is at least a good intellectual exercise to think about how one would go about solving the problem this way. The prime number test, for example, appears to be unsolvable today - p(n) is only known for prime numbers up to something like 10^18.
As the prime number analysis shows, it is just very difficult to deal with the hunt when the target is a million digits long. To win at this I suspect raw computer power would be less important than theoretical and algorithmic foundations.
A little analysis shows one thing: you wouldn't just have to be lucky to win with any of these approaches. You would have to be staggeringly, incredibly lucky - as if you were the one elementary particle in the entire universe selected for a lottery prize. If you doubt it, try using some of these tests for, say, 10-digit numbers, and see what you would need to do to compress the key values to fewer than 10 digits.
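The luck required can be bounded with a simple counting argument that applies to any fixed decoder, whether it is built on pi, powers, or primes: each key decodes to at most one string, so there cannot be more compressible strings than there are short keys. Here is a minimal sketch of that bound, assuming keys are themselves written as decimal digit strings; the function name max_fraction_shortened is hypothetical.

```python
from fractions import Fraction


def max_fraction_shortened(digits, saved):
    """Upper bound on the fraction of `digits`-digit strings that any single
    decoder can reproduce from a key at least `saved` digits shorter."""
    # Count every possible key of length 0 .. digits - saved.
    keys = sum(10**length for length in range(digits - saved + 1))
    return min(Fraction(1), Fraction(keys, 10**digits))


for saved in (1, 2, 5):
    bound = max_fraction_shortened(10, saved)
    print(saved, float(bound))
```

For 10-digit numbers, at most about 11% can be shortened by even one digit, and at most about one in 90,000 can be shortened by five digits. Applying the same arithmetic to a million digits squeezed into a key of at most a thousand digits gives a bound of roughly 1.1 x 10^-999,000.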
But despite the odds, many will still continue the hunt. Perhaps this post will give them some ideas on new approaches.