Perpetual Compression and the Death of a Newsgroup
A recent article in EE Times by Cees Jan Koomen talks about an improbable compression scheme proposed by Jan Sloot in the late 1990s. Sloot claimed to be able to compress a movie down to 8 kbytes, and was ready to take in some real money just before his untimely death.
For some reason the field of data compression seems to attract more than its share of scammers, and in a few cases they have managed to collect millions of dollars in investment money before being exposed, being indicted, or disappearing.
Random Compression
While a lot of compression scams are a result of greed and larceny, I have a feeling that many start out with a foundation of ignorance. Since writing The Data Compression Book in 1992, I've been on the receiving end of countless proposals for algorithms that can "compress any file", "compress random data", and the ever-popular "repeatedly compress its own output."
Ultimately all three of these claims effectively have to do the same thing, which is to take any arbitrary file as input and guarantee to produce a smaller file as output.
This is, of course, not possible when compressing losslessly, as anyone who understands the Pigeonhole Principle has to agree: there are fewer distinct files shorter than a given length than there are files of that length, so at least two inputs would have to share the same output. But that doesn't prevent the enthusiasts from figuratively grabbing you by the lapels and eagerly describing their latest idea that will compress a file of any size down to 256 bytes, eventually.
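For readers who want to see the counting argument rather than take it on faith, here is a minimal Python sketch. It is only an illustration: the function names and the toy "compressor" drop_last_bit are hypothetical and are not taken from any real proposal. It counts how many files exist at a given length, how many exist at all shorter lengths, and then searches for two inputs that an always-shrinking compressor maps to the same output.

```python
from itertools import product


def count_inputs(n_bits):
    """Number of distinct files that are exactly n_bits long."""
    return 2 ** n_bits


def count_shorter_outputs(n_bits):
    """Number of distinct files strictly shorter than n_bits.

    Sums the lengths 0 .. n_bits-1, which always comes to 2**n_bits - 1,
    one fewer than the number of possible inputs.
    """
    return sum(2 ** k for k in range(n_bits))


def find_collision(compress, n_bits):
    """Return two different n-bit inputs with the same output, or None.

    `compress` is any function from a bit string to a bit string.
    """
    seen = {}  # output -> first input that produced it
    for bits in product("01", repeat=n_bits):
        original = "".join(bits)
        packed = compress(original)
        if packed in seen:
            return seen[packed], original
        seen[packed] = original
    return None


def drop_last_bit(bit_string):
    """Toy 'compressor' (hypothetical) that always saves exactly one bit."""
    return bit_string[:-1]


if __name__ == "__main__":
    n = 8
    print(count_inputs(n), "inputs,", count_shorter_outputs(n), "shorter outputs")
    print("collision:", find_collision(drop_last_bit, 3))
```

Any function passed to find_collision that always returns a shorter string must produce a collision, and two inputs that collide cannot both be recovered by a decompressor. The same arithmetic applies to the 256-byte claim: 256 bytes is 2048 bits, so there are at most 2^2048 distinct outputs of that size, no matter how many larger files are fed in.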
The Death of comp.compression
In the grand scheme of things, this minor delusion wouldn't be too much of a problem. But it matters to me for one good reason: it has pretty much destroyed comp.compression.
The first time I saw the power this had over comp.compression was back in 1996, when a person named Jules Gilbert made his first claim to be able to compress random data:
The original random file (of LZ input) is increased in length,
typically by about 30-50%. After re-application of the LZ method,
the resultant file is reduced by perhaps a factor of 3-4 times.
The resulting stir of name-calling, insults, scorn, and pedantry seemed to occupy most of the bandwidth of the newsgroup for weeks on end. Thirteen years later, the trend continues unabated.
While there are plenty of good posts to comp.compression, it sometimes seems that 50% of the traffic is devoted to arguing about these impossibilities. A look back through recent traffic shows that most of the posts are on sensible topics, but a single post by Jules Gilbert (correct, he has not given up yet!) yields a thread with 29 responses, which means it totally dominates downstream traffic for days on end.
The Challenge
I took a shot at reducing the amount of time spent arguing the point by creating the Million Random Digit Challenge, which I posted on comp.compression in 2002. Nobody has succeeded in writing a program which can effectively compress this data, and it is unlikely that anyone ever will. Any poster to comp.compression who claims to be able to do it is usually faced with the challenge, and none have overcome it.
But has that reduced the numbing stream of nonsense? Nope.
I don't know if other niche newsgroups have the same issues, but I suspect they each have their own share of trolls and enthusiastic knuckleheads that do their best to cap the usable bandwidth.
Which is a shame, because unmoderated USENET groups are in principle a great, democratic institution, and could have been just as useful as Wikipedia.
There's a lesson to be learned in this, somewhere.

