Pigz on the Wing
Mark Adler just might have the distinction of having his code running on more computers than anyone else on earth. This is because Mark (in conjunction with Jean-loup Gailly and many more volunteers) wrote zlib - the free library that reads and writes streams compressed with the deflate algorithm.
Of course, this means that Mark's code is used just about everywhere to read and write files in Zip and gzip format - so it is deployed on virtually every desktop, running any O/S. It is also deployed on zillions of embedded systems running Linux and other operating systems.
It's nice to daydream about how rich Mark would be if he had a nickel for every deployment of zlib, but of course, if zlib cost a nickel, it probably wouldn't be so ubiquitous. I think he will have to be content with eventual enshrinement in the FOSS Hall of Fame.
Next Up: Pigz
Apparently keeping his finger on the pulse of multicore hysteria, Mark recently took some time out to produce a nice little program called pigz, a parallel version of gzip. Pigz has been out for a couple of years now, and this month rolled up to version 2.1.6.
Pigz uses the pthreads library to parallelize compression: it splits the input into big chunks and compresses each one in its own thread. Decompression is not so easy to break apart. Pigz will still use multiple threads when decompressing, but it is basically just separating I/O from computation.
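To make the chunk-per-thread idea concrete, here is a minimal Python sketch. It is not how pigz itself works: pigz is written in C on pthreads and has its own way of stitching the compressed pieces together. This sketch takes a simpler route and compresses each chunk as a separate gzip member, then concatenates the members, which the gzip format permits. The function name parallel_gzip and the chunk size are my own choices for illustration.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 128 * 1024  # illustrative chunk size, not taken from pigz


def parallel_gzip(data: bytes, chunk_size: int = CHUNK_SIZE, workers: int = 4) -> bytes:
    """Compress data as a series of independent gzip members, one per chunk."""
    # Split the input into fixed-size chunks; keep one empty chunk for empty input
    # so the output is still a valid gzip file.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    # zlib releases the GIL while compressing, so threads can run on separate cores.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        members = list(pool.map(gzip.compress, chunks))  # map preserves chunk order
    # Concatenated gzip members form a valid gzip file.
    return b"".join(members)


if __name__ == "__main__":
    payload = b"pigz on the wing " * 100_000
    packed = parallel_gzip(payload)
    assert gzip.decompress(packed) == payload
    print(len(payload), "->", len(packed))
```

The price of compressing chunks independently is a slightly worse ratio, since each chunk starts with no history of the data that came before it.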
What the World Needs Now
There are two things we could do to make parallel decompression a reality.
First, Mark points out that specially constructed deflate streams could be used for parallel processing during decompression. Presumably we would need some modification to the deflate specification to add labeling or marking of some sort for these special streams.
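I don't know exactly what construction Mark has in mind, so treat the following as one possible reading rather than his design. The sketch assumes the compressor deflates each chunk independently and records where each compressed chunk starts in a side index. That index is the "labeling" a decompressor needs in order to hand chunks to separate threads. The container layout and the names compress_indexed and decompress_parallel are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_indexed(data: bytes, chunk_size: int = 64 * 1024):
    """Compress each chunk on its own and record (offset, length) for every chunk."""
    blob = bytearray()
    index = []  # hypothetical side index: where each compressed chunk lives in blob
    for i in range(0, len(data), chunk_size):
        comp = zlib.compress(data[i:i + chunk_size])
        index.append((len(blob), len(comp)))
        blob += comp
    return bytes(blob), index


def decompress_parallel(blob: bytes, index, workers: int = 4) -> bytes:
    """Inflate every indexed chunk in its own task and join the results in order."""
    def inflate(entry):
        offset, length = entry
        return zlib.decompress(blob[offset:offset + length])

    # Each chunk is self-contained, so no task depends on another's output.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(inflate, index))


if __name__ == "__main__":
    original = bytes(range(256)) * 4000
    blob, index = compress_indexed(original)
    assert decompress_parallel(blob, index) == original
```

Without the index, a decompressor has no way to find the chunk boundaries short of decoding the stream from the start, which is exactly the serial work we are trying to avoid.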
Second, there is no reason that most archivers can't start using parallel threads right now when extracting multiple files from archives. Programs like WinZip could be doing this today, but I don't believe it is implemented in many products.
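Per-file parallel extraction is easy to try with Python's standard zipfile module. The sketch below is only an illustration of the idea, not a description of how any shipping archiver works. It builds a small archive in memory, then reads every member in a thread pool. The helper names build_archive and extract_all_parallel are mine, and each task opens its own ZipFile so that no file position is shared between threads.

```python
import io
import zipfile
from concurrent.futures import ThreadPoolExecutor


def build_archive(files: dict) -> bytes:
    """Create an in-memory Zip archive from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, content in files.items():
            zf.writestr(name, content)
    return buf.getvalue()


def extract_all_parallel(archive: bytes, workers: int = 4) -> dict:
    """Decompress every member of the archive, one task per member."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        names = zf.namelist()

    def read_member(name):
        # A private ZipFile per task avoids sharing a seek position across threads.
        with zipfile.ZipFile(io.BytesIO(archive)) as zf:
            return name, zf.read(name)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(read_member, names))


if __name__ == "__main__":
    files = {f"file{i}.txt": (f"contents of file {i}\n" * 5000).encode() for i in range(8)}
    assert extract_all_parallel(build_archive(files)) == files
```

This works because each file in a Zip archive is compressed on its own, so the members can be inflated in any order.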
As for the claim I opened with, that Mark's code runs on more computers than anyone else's: it will be tough to settle, but can anyone make a case for some other library code that is more widely deployed than zlib? I can think of a few candidates, but nothing that clearly beats it. One possible winner would be some O/S code for a super-cheap 4- or 8-bit processor that has been produced in quantities exceeding a billion. Unfortunately I don't have the kind of research reports handy to figure that out.
Author's Note: I apologize for the title of the article, which deliberately mangles the preferred pronunciation of the product as pig-zee.