Mark Nelson

Dr. Dobb's Bloggers

The Million Random Digit Challenge

December 28, 2009

I've mentioned my Million Random Digit Challenge here before. In a nutshell, I've posted a file of a million random decimal digits, packed into binary form, and challenged all comers to compress it. The proof is required to be a Kolmogorov-style work: a program that when run creates a perfect copy of the original file. The only requirement is that the program (plus any associated data file) be smaller than the target million digit file.

People have been attempting to meet this challenge since 2002 with no luck. The file was specifically designed by geniuses at RAND not to have any recognizable statistical patterns, and apparently this goal was accomplished quite well. And what do most compression programs do? Look for statistical patterns. Fail.

Abandon Patterns

 Classic statistical techniques are just not going to do it for this problem. I think the only chance to win this prize is to use something I've often disparaged, which I call Magic Function Theory.

 The idea behind Magic Function Theory is that we come up with some short but sweet generator function that can create a long sequence. Just as an example, I can create a magic function for any sequence imaginable using just three things:

  • A program that generates the digits of pi. This program will be quite short.
  • An offset into that string of digits.
  • A length of the string starting at that offset.

I believe (IANAM) that this system can generate any sequence of digits, although strictly speaking that rests on the unproven assumption that every finite string of digits shows up somewhere in pi. Of course, how far would we have to go to find the million random digits? Here's where it gets interesting. We might have to go quite a distance, but what if the offset to the million random digits turns out to be an easily compressible number? What if the million random digits appear at position 374,567,111^14,127,269 - 623,557,570,925? If that were the case, we could represent the million random digits in a few hundred bytes - quite an accomplishment.
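To get a feel for the pi-offset idea at toy scale, here is a minimal Python sketch. It computes digits of pi with Machin's arctangent formula in plain integer arithmetic and reports where a short target string first appears. The function names (`arctan_inv`, `pi_fraction_digits`, `first_offset`) and the 2,000-digit search window are my own choices for illustration, not part of the challenge.

```python
def arctan_inv(x, unity):
    """Return arctan(1/x) scaled by `unity`, using only integer arithmetic."""
    total = term = unity // x
    x2 = x * x
    n = 3
    sign = -1
    while term:
        term //= x2                   # next power of 1/x in the Taylor series
        total += sign * (term // n)
        sign = -sign
        n += 2
    return total


def pi_fraction_digits(count, guard=10):
    """First `count` decimal digits of pi after the leading 3, as a string.

    Machin's formula: pi = 4 * (4*arctan(1/5) - arctan(1/239)).
    `guard` extra digits absorb the truncation error of the integer divisions.
    """
    unity = 10 ** (count + guard)
    pi_scaled = 4 * (4 * arctan_inv(5, unity) - arctan_inv(239, unity))
    return str(pi_scaled // 10 ** guard)[1:]   # drop the leading "3"


def first_offset(target, digits):
    """0-based offset of the first occurrence of `target`, or -1 if absent."""
    return digits.find(target)


digits = pi_fraction_digits(2000)
for target in ("14", "26", "999999"):
    print(target, first_offset(target, digits))
```

The run of six 9s turns up at offset 761, so in that one case the offset really is shorter than the string it locates. That is exactly the kind of luck this scheme is hoping for. Try a handful of arbitrary six-digit strings instead and most will not appear in the first 2,000 digits at all.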

Another approach might be to look for polynomials that generate the million digit number. What if there were some short polynomial of the form k^n + j that generated the number, where k, n, and j were representable using some nice compact format?
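A small Python sketch makes it easy to try the k^n + j idea on numbers of a manageable size. It searches the exponents 2 through 64, takes the integer n-th root of the target, and keeps whichever (k, n, j) needs the fewest decimal digits to write down. The helper names and the digit-count cost measure are assumptions of mine, chosen only for illustration.

```python
def iroot(x, n):
    """Largest integer r with r**n <= x (x >= 0), found by binary search."""
    lo, hi = 0, 1
    while hi ** n <= x:
        hi *= 2
    while lo < hi - 1:            # invariant: lo**n <= x < hi**n
        mid = (lo + hi) // 2
        if mid ** n <= x:
            lo = mid
        else:
            hi = mid
    return lo


def digit_count(v):
    """Number of decimal digits in |v|."""
    return len(str(abs(v)))


def best_power_form(target, max_exp=64):
    """Return (cost, k, n, j) with target == k**n + j and the smallest cost.

    cost is the total number of decimal digits in k, n and j.
    """
    best = None
    for n in range(2, max_exp + 1):
        root = iroot(target, n)
        for k in (root, root + 1):        # nearest bases below and above
            if k < 2:
                continue
            j = target - k ** n
            cost = digit_count(k) + digit_count(n) + digit_count(j)
            if best is None or cost < best[0]:
                best = (cost, k, n, j)
    return best


# A hand-picked "lucky" 7-digit number, then an arbitrary 10-digit one.
print(best_power_form(2 ** 20 + 3))
print(best_power_form(3141592653))
```

The first target was built to be lucky, so it collapses to four digits of description. Feed in arbitrary 10-digit numbers and compare the reported cost with 10 to see how rarely that happens.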

 The final suggestion I will toss out for consideration is to use prime numbers. Find the nth prime number, p(n), that is closest to the million digit number, then add in an offset. The simple formula p(n) + k has a shot at generating our target. (Note the downside to this is that n will be heartbreakingly large. It won't contain a million digits, but it will only be a bit shorter. There are a lot of primes.)
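Here is a minimal Python sketch of the p(n) + k form for small targets, using a plain sieve of Eratosthenes. The function names are hypothetical, and the sieve bound of twice the target is just a convenient way to guarantee a prime above the target.

```python
from bisect import bisect_right


def primes_up_to(limit):
    """Sieve of Eratosthenes: sorted list of all primes <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            # Cross out multiples of p, starting at p*p.
            sieve[p * p::p] = bytearray(len(range(p * p, limit + 1, p)))
    return [i for i, flag in enumerate(sieve) if flag]


def prime_index_form(target):
    """Return (n, k) with target == p(n) + k, p(n) the prime nearest target.

    n is 1-based, so p(1) == 2. Assumes target >= 2.
    """
    primes = primes_up_to(2 * target + 2)
    i = bisect_right(primes, target)      # number of primes <= target
    best = None
    for idx in (i - 1, i):                # nearest prime below and above
        if 0 <= idx < len(primes):
            k = target - primes[idx]
            if best is None or abs(k) < abs(best[1]):
                best = (idx + 1, k)
    return best


print(prime_index_form(1000))       # (168, 3): 997 is the 168th prime
print(prime_index_form(999_983))    # (78498, 0): the 78498th prime
```

The six-digit target 999,983 needs a five-digit index, which is the "only a bit shorter" effect in miniature. By the prime number theorem, n has roughly log10(ln x) fewer digits than the target x. For a million-digit target, ln x is about 2.3 million, so n would be only about six digits shorter.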

If Only...

These are the kinds of ideas that motivate a lot of dreamers out there, and it is at least a good intellectual exercise to think about how one would go about solving the problem this way. The prime number test, for example, appears to be unsolvable today - p(n) is only known for prime numbers up to something like 10^18.

As the prime number analysis shows, it is just very difficult to deal with the hunt when the target is a million digits long. To win at this I suspect raw computer power would be less important than theoretical and algorithmic foundations.

A little analysis shows one thing: you wouldn't just have to be lucky to win with any of these approaches. You would have to be staggeringly, incredibly lucky - as if you were the one elementary particle in the entire universe selected for a lottery prize. If you doubt it, try using some of these tests for, say, 10-digit numbers, and see what you would need to do to compress the key values to fewer than 10 digits.
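The 10-digit experiment can also be sized up by simple counting, whatever the scheme. Each key (an offset, a k/n/j triple, a prime index) decodes to exactly one target, so there cannot be more compressible targets than there are shorter keys. A minimal Python sketch of that pigeonhole bound follows, with function names of my own invention.

```python
from fractions import Fraction


def shorter_descriptions(length, saved):
    """Count the decimal strings of at most `length - saved` digits.

    The empty string is included, which only makes the bound more generous.
    """
    return sum(10 ** i for i in range(length - saved + 1))


def max_fraction_compressible(length, saved=1):
    """Upper bound on the fraction of `length`-digit strings that any
    fixed decoder can reproduce from a key at least `saved` digits shorter."""
    return Fraction(shorter_descriptions(length, saved), 10 ** length)


for saved in (1, 3, 5):
    bound = max_fraction_compressible(10, saved)
    print(saved, float(bound))
```

For 10-digit targets, at most about 11% can be squeezed by even one digit, and each further digit saved cuts the bound by another factor of ten. The same arithmetic applies unchanged to a million-digit target.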

 But despite the odds, many will still continue the hunt. Perhaps this post will give them some ideas on new approaches.
