Algorithm Alley

Sound is surprisingly difficult to compress. It is much more subtle than video compression and less obvious than text. Kyle York examines lossy-compression techniques that have been optimized for sound.

June 01, 1995
URL:http://www.drdobbs.com/tools/algorithm-alley/184409579

Figure 1

JUN95: ALGORITHM ALLEY

ALGORITHM ALLEY

Sound Compression Using Quantized Deltas

Kyle A. York

Kyle is a programmer for McGraw-Hill School Systems and can be contacted at [email protected].

Introduction

by Bruce Schneier

Data-compression techniques can be divided into two major categories: lossy and lossless. Lossy data compression allows for a certain loss of accuracy in exchange for an increased compression rate. These techniques are primarily used to compress graphics images and digitized voice--media that can afford to lose data. Most lossy techniques allow you to tune your parameters, so you can trade more-effective compression for greater accuracy. Until recently, lossy compression was implemented primarily on dedicated hardware, but increases in the power of desktop PCs, coupled with advances in algorithms have changed this.

Lossless-compression techniques guarantee an exact copy of the original after compression and decompression. These techniques are used to compress data files, programs, or any application for which even the loss of a single bit is unacceptable.

This month, Kyle York examines lossy compression techniques that have been optimized for sound. His false starts and deadends depict the difficulty of finding a particular algorithm to solve a given problem, even if both the algorithms and the problems are well defined.

Computer-generated audio samples are created by taking an analog signal and passing it through a digital-to-analog converter (DAC). The quality of the sample is determined by the sampling rate and the resolution (number of bits per sample).

As far back as 1933, AT&T mathematician Harry Nyquist determined that to accurately reproduce digital signals at a given frequency f, the sample rate must be at least 2f. Since the human ear can hear past 20 kHz, you would expect a need for at least 40-kHz sampling. Luckily, most applications require much less than fullspectrum sampling.

Typical computer applications sample audio at between 8000 Hz (U.S. telephone quality) and 44,000 Hz (audio CD quality), using either 8- or 16-bit samples that are some representation of the voltage present in the signal. Normally, it is a linear relationship; for example, if the voltage difference between 0 and 1 is 0.1mv, then the voltage difference between 100 and 101 is also 0.1mv. The technique I developed works well for 8-bit samples. The 8-bit values returned by the DAC are usually signed, where --127_128 represents --2.5mv_2.5mv (some return unsigned values are shifted by adding 127). Figure 1(a) is a small audio sample that has obvious patterns, although the exact similarities are not nearly as clear.

I searched the literature for real-time, high-ratio audio-compression techniques, and I found that the best rely on linear predictive coding (LPC). The LPC algorithm uses preceding audio samples to predict the next sample. This algorithm can be implemented as lossless compression by encoding the difference between the expected and actual values. If the predictor is good, these differences should be small and should compress well. Researchers have spent much time attempting to determine an optimum prediction strategy, but you can achieve reasonable results if you simply use the preceding two samples to define a line and assume the current sample will be near that line (thus the term "linear prediction").

Unfortunately, my application required that the (de)compression be fast enough to perform in real time on a 286-class PC. Since any LPC algorithm is very CPU intensive, this requirement immediately ruled them out. On a machine this slow, about the best I can do is use a lookup table and a few bit manipulations, but I was still determined to achieve a ratio of at least 4:1.

I then explored other methods: reducing the sample rate by a factor of 4 (bad idea), ignoring the low-order bits (equally bad--more than 70 percent of the information is actually stored here), and ignoring the high-order bits (this works, but decreases the volume, and when I later increased the volume, the resulting static was unbearable).

Finally, I turned to an algorithm described by Mark Nelson in The Data Compression Book (M&T Publishing, 1991), which plots an exponential graph and bands the samples. In other words, samples that fall in a certain band are all mapped to a single value; see Figure 1(b). The band weighted the lower samples heavier, and since most samples fall in the lower magnitudes, it worked okay. Still, as with the other techniques, I continued to have a problem with static. Finally, it occurred to me that I really wanted to concentrate on modeling the waveform as closely as possible. When I did this, the static virtually disappeared.

Deltas

It was obvious that banding was never going to be sufficient because it changed the waveforms too drastically. It was also apparent, when I magnified part of the graph, that the magnitude of the differences between samples tended to be small, so my next attempt was simply to create a file which contained the differences between the samples of my input file. When I looked at the frequencies, I found that 40 percent of the differences were between --8 and 8. So a Huffman compressor would work much better on this.

Next, I looked at quantizing the deltas. My approach to quantizing is not linear but rather exponential; see Figure 2. Compression from this is immediately 2:1, and is simple enough to be done in a lookup table; thus real time can be achieved even on the slowest machine. When I graphed the results, 96 percent of the samples were within +/-1 percent; see Figure 1(c).

Since I was working with voice data (not music), up to about 1/3 of the file was silence. By running the compressed data through a specialized run length encoding (RLE) filter that detected the silence, I further reduced the sample size. I defined silence as any length of 8 nybbles that reduced to +/-1 when summed. This length was chosen to remove any spikes that might appear in a sample, such as line noise. The block is short enough to not cause any serious degradation in sound quality.

I implemented this approach in C. Listing One (listings begin on page 138) is the include file that accompanies Listing Two which, in turn, converts .WAV format files to .DQ format. Listing Three converts back from .DQ to .WAV. Sample data files (one, an original output of 999 samples from a test file; another, 999 samples from after being DQed and unDQed; and a third that included 999 samples after being compressed/expanded using the algorithm in Nelson's book) are available electronically; see "Availability," page 3.

Future Enhancements

If the sound sample is music instead of voice, then the RLE filter will have little effect. I've tinkered with a Huffman compressor, which increases compression somewhat. It would probably be better to use an arithmetic compressor, which I have estimated could reduce the results by 25 percent or more, but this begins to reach beyond the capabilities of my platform.

Sound is surprisingly difficult to compress, which may explain the relatively little work done in this area. It is much more subtle than video compression and less obvious than text. But with virtually every new program adding uncompressed sound files to our machines, the time has come to look for ways to ease the load.

Figure 1 (a) Audio sample of approximately 400 data points; (b) straightforward exponential compression can result in "banding;" (c) exponential compression of the deltas better model the waveform.

Figure 2: Quantizing deltas.

Start an accumulator at 128 (the 0 voltage value for my DAC). For each sample
     set [magnitude] = abs(sample - accumulator),
     choose the power 2^n (where 0 <= n <= 7) closest to [magnitude]
          if (sample >= accumulator)
               store [n] as 8+[n],
               add 2^n to [accumulator]
          otherwise
               store [n] as 7-[n]
               subtract 2^n from [accumulator]
     done.

Listing One


/* DQ header file -- kyle a. york -- contains best guess for WAV format */
#ifndef dq_h
#define dq_h

typedef struct {
  char RIFFsig[4];
  long junk1;
  char WAVEsig[4];
  char fmtsig[4];
  long len;
  short style;
  short channels;
  long  rate;
  long  avg_Bps;
  short align;
  short bitsize;
} RIFFHEADER;
typedef struct {
  char datasig[4];
  long length;
} RIFFDATAHEADER;
/* this should be a 16-byte structure */
typedef struct {
  char   descr[4];
  short  bitSize;
  short  sampleRate;
  long   length;
  char   misc[4];
} DQHEADER;
#endif

Listing Two


/* kyle a. york. convert WAV --> DQ. Notes: only handles a small subset of 
   WAV files (8-bit/1-channel/unsigned) drops the last sample if the number 
   of samples is odd
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <conio.h>
#include <assert.h>

#include "dq.h"

#define NOTRIFF      "Not a RIFF file."
#define NOTWAVE      "Not a WAVE file."
#define MISSINGFMT   "Missing format info"
#define UNKNOWNSTYLE "style is not unsigned"
#define NODATA       "no data"
#define NOTEIGHTBIT  "data is not 8-bit"
#define INFILEERROR  "input file error"
#define OUTFILEERROR "output file error"
#define OUTOFMEMORY  "out of memory"
#define ERRORFI      "cannot open source file"
#define ERRORFO      "cannot open destination file"
#define TOOMANYCHANNELS "too many channels (only 1 is allowed)"

void EXIT(char *tmp)
{
  fprintf(stderr, "%s\n", tmp);
  exit(1);
}
/* add default extension to a filename [ext] must include the preceding '.'
  returns NULL on out of memory */
#define EXTFLAG_DEFAULT  0
#define EXTFLAG_CHANGE   1
char *AddExtension(const char *src, const char *ext, const int flags)
{
  char *tmp = malloc(strlen(src)+strlen(ext)+1);
  if (tmp) {
    char *tPtr;
    strcpy(tmp, src);
    tPtr = strrchr(tmp, '.');
    if (tPtr && strchr(tPtr, '\\'))
      tPtr = NULL;
    if (tPtr) {
      if (flags & EXTFLAG_CHANGE)
        strcpy(tPtr, ext);
    } else
      strcat(tmp, ext);
  }
  return tmp;
}
/* eliminate lot's of shifts with a simple lookup buffer */
static int expand[16] =
  {
    -0x80, -0x40, -0x20, -0x10, -0x08, -0x04, -0x02, -0x01,
     0x01,  0x02,  0x04,  0x08,  0x10,  0x20,  0x40,  0x80
  };
/* Format: dq source dest   --   return 0 if no error */
int main(int argc, char **argv)
{
  RIFFHEADER     riffHeader;
  RIFFDATAHEADER riffDataHeader;
  DQHEADER       dqHeader;
  char *srcName,
       *dstName;
  FILE *fi,
       *fo;
  long  len;
  int   accum,
        delta,
        sign,
        nybble,
        prev,
        next,
        byte,
        sample;
  if ((argc != 2) && (argc != 3)) {
    printf("Format: %s srcfile {dstfile}\n"
           "  where srcfile is a '.wav' file\n",
           argv[0]);
    exit(1);
  }
  srcName = AddExtension(argv[1], ".wav", EXTFLAG_DEFAULT);
  if (argc == 3)
    dstName = AddExtension(argv[2], ".dq", EXTFLAG_DEFAULT);
  else
    dstName = AddExtension(argv[1], ".dq", EXTFLAG_CHANGE);
  if (!srcName || !dstName)
    EXIT(OUTOFMEMORY);

  fi = fopen(srcName, "rb");
  if (!fi)
    EXIT(ERRORFI);
  fo = fopen(dstName, "wb");
  if (!fo)
    EXIT(ERRORFO);
  /* read & verify WAV header */
  if (fread(&riffHeader, sizeof(riffHeader), 1, fi) != 1)
    EXIT(INFILEERROR);
  else if (strncmp(riffHeader.RIFFsig, "RIFF", 4))
    EXIT(NOTRIFF);
  else if (strncmp(riffHeader.WAVEsig, "WAVE", 4))
    EXIT(NOTWAVE);
  else if (strncmp(riffHeader.fmtsig, "fmt ", 4))
    EXIT(MISSINGFMT);
  else if (riffHeader.style != 1)
    EXIT(UNKNOWNSTYLE);
  else if (riffHeader.channels != 1)
    EXIT(TOOMANYCHANNELS);
  else if (riffHeader.bitsize != 8)
    EXIT(NOTEIGHTBIT);
  /* skip any. misc bytes */
  for (len=riffHeader.len-16; len; len--)
    fgetc(fi);
  /* data header should follow. read & verify */
  if (fread(&riffDataHeader, sizeof(riffDataHeader), 1, fi) != 1)
    EXIT(INFILEERROR);
  else if (strncmp(riffDataHeader.datasig, "data", 4))
    EXIT(NODATA);
  /* setup my file header */
  memset(&dqHeader, 0, sizeof(dqHeader));
  memcpy(dqHeader.descr, "DQ  ", 4);
  dqHeader.bitSize = 8;
  dqHeader.sampleRate = riffHeader.rate;
  dqHeader.length = (riffDataHeader.length/2)*2;
  if (fwrite(&dqHeader, sizeof(dqHeader), 1, fo) != 1)
    EXIT(OUTFILEERROR);
  /* init accumulator to zero value (0x80)  */
  accum = 0x80;
  len = 0;
  while ((sample = fgetc(fi)) != EOF) {
    delta = abs(sample-accum);       /* absolute difference */
    sign  = (sample >= accum);       /* used later          */
    /* quantize [delta] to a power of 2 */
    if (delta >= 128)
      nybble=7;
    else if (delta >= 64)
      nybble=6;
    else if (delta >= 32)
      nybble=5;
    else if (delta >= 16)
      nybble=4;
    else if (delta >=  8)
      nybble=3;
    else if (delta >= 4)
      nybble=2;
    else if (delta >= 2)
      nybble=1;
    else
      nybble=0;
    /* check nybble-1...nybble+1 for closest
      eg. minimize (abs((accum +/- delta) - sample))
    */
    prev = max(nybble-1, 0);
    next = min(nybble+1, 7);
    if (sign) {
      nybble = 7-nybble;
      prev   = 7-prev;
      next   = 7-next;
    } else {
      nybble += 8;
      prev   += 8;
      next   += 8;
    }
    if (abs(accum-expand[prev]-sample) < abs(accum-expand[nybble]-sample))
      nybble = prev;
    else if (abs(accum-expand[next]-sample) < abs(accum-expand[nybble]-sample))
      nybble = next;
    /* pack 2-1 using the last bit in length as a flag. 1st nybble goes high */
    if ((len & 0x01) == 0)
      byte = nybble << 4;
    else if (fputc(byte | nybble, fo) == -1)
      EXIT(OUTFILEERROR);
    len++;
    /* adjust accumulator & repeat */
    accum -= expand[nybble];
  }
  fclose(fi);
  fclose(fo);
  free(srcName);
  free(dstName);
  return 0;
}

Listing Three


/* kyle a. york.  convert DQ --> WAV. Notes: * does no data smoothing */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <conio.h>
#include <assert.h>

#include "dq.h"

#define NOTADQFILE   "not a DQ file"
#define INFILEERROR  "input file error"
#define OUTFILEERROR "output file error"
#define OUTOFMEMORY  "out of memory"
#define ERRORFI      "cannot open source file"
#define ERRORFO      "cannot open destination file"

void EXIT(char *tmp)
{
  fprintf(stderr, "%s\n", tmp);
  exit(1);
}
/* add default extension to a filename [ext] must include the preceding '.'
  returns NULL on out of memory
*/
#define EXTFLAG_DEFAULT  0
#define EXTFLAG_CHANGE   1

char *AddExtension(const char *src, const char *ext, const int flags)
{
  char *tmp = malloc(strlen(src)+strlen(ext)+1);
  if (tmp) {
    char *tPtr;
    strcpy(tmp, src);
    tPtr = strrchr(tmp, '.');
    if (tPtr && strchr(tPtr, '\\'))
      tPtr = NULL;
    if (tPtr) {
      if (flags & EXTFLAG_CHANGE)
        strcpy(tPtr, ext);
    } else
      strcat(tmp, ext);
  }
  return tmp;
}
/* eliminate lot's of shifts with a simple lookup buffer */
static int expand[16] =
  {
    -0x80, -0x40, -0x20, -0x10, -0x08, -0x04, -0x02, -0x01,
     0x01,  0x02,  0x04,  0x08,  0x10,  0x20,  0x40,  0x80
  };

#define adjust(a) (((a) < 0) ? 0 : ((a) > 255) ? 255 : a)
/* Format: dq source dest  -- return 0 if no error */
int main(int argc, char **argv)
{
  RIFFHEADER     riffHeader;
  RIFFDATAHEADER riffDataHeader;
  DQHEADER       dqHeader;
  char *srcName,
       *dstName;
  FILE *fi,
       *fo;
  int   accum,
        byte;
  if ((argc != 2) && (argc != 3)) {
    printf("Format: %s srcfile {dstfile}\n"
           "  where srcfile is a '.dq' file\n", argv[0]);
    exit(1);
  }
  srcName = AddExtension(argv[1], ".dq", EXTFLAG_DEFAULT);
  if (argc == 3)
    dstName = AddExtension(argv[2], ".wav", EXTFLAG_DEFAULT);
  else
    dstName = AddExtension(argv[1], ".wav", EXTFLAG_CHANGE);

  if (!srcName || !dstName)
    EXIT(OUTOFMEMORY);

  fi = fopen(srcName, "rb");
  if (!fi)
    EXIT(ERRORFI);
  fo = fopen(dstName, "wb");
  if (!fo)
    EXIT(ERRORFO);
  /* read & verify DQ header   */
  if (fread(&dqHeader, sizeof(dqHeader), 1, fi) != 1)
    EXIT(INFILEERROR);
  else if (strncmp(dqHeader.descr, "DQ  ", 4))
    EXIT(NOTADQFILE);
  /* create necessary RIFF/WAVE headers */
  memset(&riffHeader, 0, sizeof(riffHeader));
  memcpy(&riffHeader.RIFFsig, "RIFF", 4);
  memcpy(&riffHeader.WAVEsig, "WAVE", 4);
  memcpy(&riffHeader.fmtsig, "fmt ", 4);
  riffHeader.style = 1;
  riffHeader.bitsize = dqHeader.bitSize;
  riffHeader.len = 16;
  riffHeader.channels = 1;
  riffHeader.rate = dqHeader.sampleRate;
  riffHeader.avg_Bps = dqHeader.sampleRate;
  riffHeader.align = 1;

  memset(&riffDataHeader, 0, sizeof(riffDataHeader));
  memcpy(&riffDataHeader.datasig, "data", 4);
  riffDataHeader.length = dqHeader.length;

  if (fwrite(&riffHeader, sizeof(riffHeader), 1, fo) != 1)
    EXIT(OUTFILEERROR);
  else if (fwrite(&riffDataHeader, sizeof(riffDataHeader), 1, fo) != 1)
    EXIT(OUTFILEERROR);
  /* init accumulator to zero value (0x80)  */
  accum = 0x80;
  while ((byte = fgetc(fi)) != EOF) {
    int sample1 = expand[(byte >> 4) & 0x0f],
        sample2 = expand[byte & 0x0f];
    accum -= sample1;
    fputc(adjust(accum), fo);
    accum -= sample2;
    fputc(adjust(accum), fo);
  }
  fclose(fi);
  fclose(fo);
  free(srcName);
  free(dstName);
  return 0;
}