
Mark Nelson

Dr. Dobb's Bloggers

Perpetual Compression and the Death of a Newsgroup

January 03, 2008

A recent article in EE Times by Cees Jan Koomen describes an improbable compression scheme proposed by Jan Sloot in the late 1990s. Sloot claimed he could compress a movie down to 8 kbytes, and was ready to take in some real money just before his untimely death.

For some reason the field of data compression seems to attract more than its share of scammers, and in a few cases they have managed to collect millions of dollars in investment money before being exposed, being indicted, or disappearing.

Random Compression  

While a lot of compression scams are a result of greed and larceny, I have a feeling that many start out with a foundation of ignorance. Since writing The Data Compression Book in 1992, I've been on the receiving end of countless proposals for algorithms that can "compress any file", "compress random data", and the ever-popular "repeatedly compress its own output."

Ultimately, all three of these claims amount to the same thing: take any arbitrary file as input, and guarantee a smaller file as output.

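This is, of course, impossible for lossless compression, as anyone who understands the Pigeonhole Principle has to agree: there are 2^n distinct files of n bits but only 2^n - 1 files shorter than that, so at least two different inputs must produce the same output, and the decompressor can't tell them apart. But that doesn't stop enthusiasts from figuratively grabbing you by the lapels and eagerly describing their latest idea, which will compress a file of any size down to 256 bytes, eventually.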
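The counting argument is easy to check by brute force. What follows is a minimal sketch: the function names are hypothetical, and `drop_last_bit` stands in for any scheme that promises to shorten every input. It counts the shorter outputs available for n-bit inputs and then searches for two inputs that collide.

```python
from itertools import product
from typing import Callable, Dict, Iterator, Optional, Tuple


def all_bitstrings(n: int) -> Iterator[str]:
    """Yield every bit string of exactly n bits, in lexicographic order."""
    for bits in product("01", repeat=n):
        yield "".join(bits)


def count_shorter_strings(n: int) -> int:
    """Count the bit strings of length 0 through n-1; the sum equals 2**n - 1."""
    return sum(2 ** k for k in range(n))


def find_collision(compress: Callable[[str], str], n: int) -> Optional[Tuple[str, str]]:
    """Return the first two distinct n-bit inputs that map to the same output, or None."""
    seen: Dict[str, str] = {}
    for s in all_bitstrings(n):
        out = compress(s)
        if out in seen:
            return seen[out], s
        seen[out] = s
    return None


# A hypothetical "compressor" that always saves one bit by dropping the last one.
def drop_last_bit(s: str) -> str:
    return s[:-1]


for n in range(1, 9):
    # There are more n-bit inputs than shorter outputs, so some two inputs must collide.
    assert 2 ** n > count_shorter_strings(n)

print(find_collision(drop_last_bit, 4))  # ('0000', '0001')
```

Swap in any function that shortens every n-bit input. `find_collision` will always return a pair for it, and that pair is exactly the two files the decompressor can't distinguish.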

The Death of comp.compression

In the grand scheme of things, this minor delusion wouldn't be too much of a problem. But it matters to me for one good reason: it has pretty much destroyed comp.compression.

The first time I saw the power this had over comp.compression was back in 1996, when a person named Jules Gilbert made his first claim to be able to compress random data:

The original random file (of LZ input) is increased in length, 
typically by about 30-50%.  After re-application of the LZ method, 
the resultant file is reduced by perhaps a factor of 3-4 times. 

The resulting stir of name-calling, insults, scorn, and pedantry seemed to occupy most of the newsgroup's bandwidth for weeks on end. More than a decade later, the trend continues unabated.

While there are plenty of good posts to comp.compression, it sometimes seems that half the traffic is devoted to arguing about these impossibilities. A look back through recent traffic shows that most posts are on sensible topics, but a single post by Jules Gilbert (yes, he still hasn't given up!) yields a thread with 29 responses, which means it dominates the traffic that follows for days on end.
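Gilbert's 1996 description never says how his expansion step works. The sketch below is hypothetical throughout. It substitutes hex encoding as the reversible expansion (+100%, more than his 30-50%), and it uses zlib's DEFLATE (LZ77 plus Huffman coding) as the LZ method. A seeded pseudo-random generator stands in for a truly random file. The pipeline is fully reversible, yet the output lands no smaller than the input.

```python
import random
import zlib


def random_bytes(n: int, seed: int = 1996) -> bytes:
    """Deterministic stand-in for a random file (Random.randbytes needs Python 3.9+)."""
    return random.Random(seed).randbytes(n)


def expand_then_compress(data: bytes) -> bytes:
    """Hypothetical two-step scheme: reversibly expand the data, then LZ-compress it."""
    expanded = data.hex().encode("ascii")  # hex encoding: 2 bytes out per byte in (+100%)
    return zlib.compress(expanded, 9)      # DEFLATE: LZ77 matching plus Huffman coding


def undo_expand_then_compress(blob: bytes) -> bytes:
    """Invert expand_then_compress, showing the scheme is lossless."""
    return bytes.fromhex(zlib.decompress(blob).decode("ascii"))


data = random_bytes(100_000)
direct = zlib.compress(data, 9)
two_step = expand_then_compress(data)

assert undo_expand_then_compress(two_step) == data
# Compare sizes: for random input, expect neither result to be smaller than the original.
print(len(data), len(direct), len(two_step))
```

The expansion adds no information, so the compressor simply squeezes out the padding it introduced. The pigeonhole argument still sets the floor.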

The Challenge

I took a shot at reducing the time spent arguing the point by creating the Million Random Digit Challenge, which I posted on comp.compression in 2002. Nobody has succeeded in writing a program that can effectively compress this data, and it is unlikely that anyone ever will. Any poster to comp.compression who claims to be able to do it is usually confronted with the challenge, and none have overcome it.
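The post doesn't describe the challenge file's format. Assume only what the name suggests: the data amounts to a million uniformly random decimal digits. Information theory then puts a floor of n·log2(10) bits under any lossless encoding. The sketch below uses hypothetical function names. It computes that floor and shows that packing the digits as one binary integer essentially reaches it (the decoder must also know the digit count), which leaves a would-be compressor nothing to remove.

```python
import math
import random


def min_bytes_for_digits(n_digits: int) -> int:
    """Information-theoretic floor, in whole bytes, for n uniformly random decimal digits."""
    return math.ceil(n_digits * math.log2(10) / 8)


def pack_digits(digits: str) -> bytes:
    """Hypothetical packer: store the digit string as one big-endian binary integer."""
    value = int(digits)  # CPython 3.11+ caps int(str) at 4300 digits by default
    return value.to_bytes(max(1, (value.bit_length() + 7) // 8), "big")


def unpack_digits(packed: bytes, n_digits: int) -> str:
    """Invert pack_digits; n_digits restores any leading zeros."""
    return str(int.from_bytes(packed, "big")).zfill(n_digits)


rng = random.Random(2002)
digits = "".join(rng.choice("0123456789") for _ in range(1000))
packed = pack_digits(digits)

assert unpack_digits(packed, len(digits)) == digits
print(len(digits), len(packed), min_bytes_for_digits(len(digits)))
print(min_bytes_for_digits(1_000_000))  # floor for a million digits
```

Once the digits sit at that floor, any program claiming to shrink them further has to beat the counting argument, which is why the challenge has gone unclaimed.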

But has that reduced the numbing stream of nonsense? Nope.

I don't know whether other niche newsgroups have the same problem, but I suspect each has its own share of trolls and enthusiastic knuckleheads who do their best to cap the usable bandwidth.

Which is a shame, because in principle unmoderated USENET groups are a great democratic institution, and they could have been just as useful as Wikipedia.

 There's a lesson to be learned in this, somewhere. 
