Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Design

Reading GIF Files


FEB95: Reading GIF Files

Reading GIF Files

Manipulating graphics files

Wilson MacGyver Liaw

Wilson, who holds a computer-science degree from Ohio State University, can be reached at [email protected].


The Graphics Interchange Format (GIF) has become one of the more popular formats for storing images. First developed by CompuServe in 1987 as a way of exchanging images across different platforms, it has since become the de facto graphics interchange standard for the Internet as well.

The original GIF87a format supported 256 colors and compressed images with a variant of the LZW algorithm. Although limited by today's measures, GIF was still an instant success. This is somewhat surprising since GIF, unlike other graphics file formats, is protected by CompuServe copyrights and built upon the patented LZW compression scheme. However, the only restriction on using GIF is that you acknowledge the CompuServe copyright.

The GIF standard was revised in 1989, resulting in the newer standard known as "GIF89a." GIF87a/GIF89a-compliant encoders and readers are available for most platforms. In this article, I'll focus on the reading process. For information on writing and otherwise manipulating GIF files, I recommend Bitmapped Graphics Programming in C++, by Marv Luse (Addison Wesley, 1993), or Programming for Graphics Files in C and C++, by John Levine (John Wiley & Sons, 1994).

The GIF Format

Every GIF file starts with a header block identifying the file as a GIF file. The header block is always six bytes long, and the value is either GIF87a or GIF89a.

Following the header block is the logical screen-descriptor block (see Figure 1) containing:

  • Logical screen width, a 2-byte value.
  • Logical screen height, a 2-byte value.
  • A 1-byte packed field, containing the global color-table flag, color resolution, sort flag, and size of global color table.
  • Background-color index, storing the index value to the global color table for drawing any area not covered by an image.
  • Pixel aspect ratio, a 1-byte value used to approximate the original image's aspect ratio (aspect ratio=(pixel aspect ratio+15)/64). This allows for a range of pixel widths, from 4:1 for the widest pixel, to 1:4 for the tallest pixel.
If the 1-bit global color-table flag is set to 1, the global color table will follow the logical screen-descriptor block. The 3-bit color-resolution value indicates the number of bits per primary color available, minus one.

The sort flag indicates whether the global color table has been sorted. If the value is 1, then the table is sorted in order of decreasing importance. The size of the global color table (three bits) is 2(value+1), which yields the maximum of 256 colors.

A global color-table entry is a triple in the form of red, green, and blue. Each color value occupies one byte. Thus, each entry is three bytes.

After the program has processed these structures, the logical drawing space is ready to read in the images. Every image starts with the image descriptor block, followed by an optional local color table and the LZW-compressed image data.

The image descriptor block (see Figure 2), which is similar to the logical screen descriptor block, consists of:

  • Image separator (one byte), which contains the value 0x2C. This indicates that a new image descriptor is starting.
  • Image left position (two bytes), which contains the initial X position of the image. The left-most X is 0.
  • Image top position (two bytes), which contains the initial Y position of the image. The top-most Y is 0.
  • Image width (two bytes), which contains the width of the image.
  • Image height (two bytes), which contains the height of the image.
  • A 1-byte packed field, with the following values: local color-table flag, interlace flag, sort flag, and size of local color table.
The local color-table flag (one bit) indicates the presence of the local color- table block. If present, the local color table is read and used instead of the global color table for this image only. The local color table follows the same triple format as the global color table.

The interlace flag (one bit) indicates whether the image is stored in the four-pass interlace pattern. If the value is 1, the image data is stored in the interlace form. The data is then arranged in the following manner: The first group of data gives pixels of every eighth row, starting with row 0. The second group gives pixels of every eighth row, starting with row number four. The third group gives pixels of every fourth row, starting with row number two. The last group gives every second row, starting with row number one. Table 1 illustrates this grouping.

The sort flag indicates if the table is sorted in order of decreasing importance, as it does in the global color table. The size of the local color table is determined in the same way as the global color table, by computing 2(value+1).

The first byte of the compressed image data is the LZW minimum code size, followed by a series of bytes. The compressed image data is terminated with an end-of-information code.

GIF compresses the data using the LZW algorithm, with two major differences: First, it uses a variable-length code size. The value from the LZW minimum code size is the initial number of bits used for the compression codes. When the number of patterns detected by the encoder exceeds the maximum number of patterns allowed by the current bit size, the number of bits per code is increased by one. GIF allows up to 12 bits per code, thus the maximum is 4096 codes.

Second, GIF adds two special codes. The first is a clear code, defined to be 2(code size). For example, if the code size is 2, then the clear code would be 4. The clear code resets and initializes the table back to the startup state. The second code is an end-of-information code. The value is defined as (Clear Code+1). This marks the end of the LZW-compressed image data. All other aspects of the GIF-variant LZW algorithm are the same as the standard LZW algorithm. For more information on the LZW algorithm, refer to the article, "LZW Data Compression," by Mark R. Nelson (DDJ, October 1989). Listing One contains C code, written by Steven A. Bennett, that deals with the GIF-variant LZW algorithm.

Extension and Trailer Blocks

GIF89a introduces extension blocks that add extra functions in GIF without requiring massive changes in the format itself. There are many types of extension blocks. They all start with a byte called the "extension introducer" which always contains the value 0x21. Every extension block is also terminated with a block terminator (a byte containing the value 0x00). This allows the reader to skip the extension block.

For example, the comment extension block starts with the extension introducer, followed by the comment label, comment data, and the block terminator. The 1-byte comment label has the value of 0xFE and identifies the extension block as a comment extension block. The comment data are 7-bit ASCII characters. The comment extension block is used to comment the image file. For information on other extension blocks, see Graphics Interchange Format Version 89a, available on the CompuServe Graphics Support Forum, Library 17.

All GIF files conclude with the trailer block, a single byte containing the value 0x3B, which indicates the end of the GIF file.

Table 1: Interlace groupings.

    Row Number   1st Group   2nd Group   3rd Group   4th Group   
    0            x
    1                                                x
    2                                    x
    3                                                x
    4                        x
    5                                                x
    6                                    x
    7                                                x
    8            x
    .
    .
    .
    

Figure 1 Logical-screen-descriptor layout. Figure 2 Image-descriptor-block layout.

Listing One


/* decode.c is Steven Bennett's code with some minor changes */
/* the original copyright from him follows */
/* Wilson MacGyver Liaw */

/* DECODE.C - An LZW decoder for GIF
 * Copyright (C) 1987, by Steven A. Bennett
 * Permission is given by the author to freely redistribute and include
 * this code in any program as long as this credit is given where due.
 * In accordance with the above, I want to credit Steve Wilhite who wrote
 * the code which this is heavily inspired by...
 * GIF and 'Graphics Interchange Format' are trademarks (tm) of
 * Compuserve, Incorporated, an H&R Block Company.
 * Release Notes: This file contains a decoder routine for GIF images
 * which is similar, structurally, to the original routine by Steve Wilhite.
 * It is, however, somewhat noticably faster in most cases.
 */

/* the defined ERRS */
#define OUT_OF_MEMORY -10
#define BAD_CODE_SIZE -20
#define READ_ERROR -1
#define WRITE_ERROR -2
#define OPEN_ERROR -3
#define CREATE_ERROR -4

extern char *malloc();                 /* Standard C library allocation */

/* extern int get_byte() - This external (machine specific) function is 
 * expected to return either the next byte from the GIF file, or a negative 
 * number, as defined.  */
extern int get_byte();

/* extern int out_line(pixels, linelen)
 *     unsigned char pixels[];
 *     int linelen;
 * This function takes a full line of pixels (one byte per pixel) and
 * displays them (or does whatever your program wants with them...).  It
 * should return zero, or negative if an error or some other event occurs
 * which would require aborting the decode process...  Note that the length
 * passed will almost always be equal to the line length passed to the
 * decoder function, with the sole exception occurring when an ending code
 * occurs in an odd place in the GIF file...  In any case, linelen will be
 * equal to the number of pixels passed... */
extern int out_line();

/* extern int bad_code_count; - This value is the only other global required 
 * by the using program, and is incremented each time an out of range code is 
 * read by the decoder. When this value is non-zero after a decode, your GIF 
 * file is probably corrupt in some way... */
extern int bad_code_count;

#define NULL   0L
#define MAX_CODES   4095

/* Static variables */
static short curr_size;                     /* The current code size */
static short clear;                         /* Value for a clear code */
static short ending;                        /* Value for a ending code */
static short newcodes;                      /* First available code */
static short top_slot;                      /* Highest code for current size */
static short slot;                          /* Last read code */

/* The following static variables are used for seperating out codes  */
static short navail_bytes = 0;              /* # bytes left in block */
static short nbits_left = 0;                /* # bits left in current byte */
static unsigned char b1;                    /* Current byte */
static unsigned char byte_buff[257];        /* Current block */
static unsigned char *pbytes;               /* Pointer to next byte in block */

static long code_mask[13] = {
     0,
     0x0001, 0x0003,
     0x0007, 0x000F,
     0x001F, 0x003F,
     0x007F, 0x00FF,
     0x01FF, 0x03FF,
     0x07FF, 0x0FFF
     };
/* This function initializes the decoder for reading a new image. */
static short init_exp(size)
   short size;
   {
   curr_size = size + 1;
   top_slot = 1 << curr_size;
   clear = 1 << size;
   ending = clear + 1;
   slot = newcodes = ending + 1;
   navail_bytes = nbits_left = 0;
   return(0);
   }
/* get_next_code() - gets the next code from the GIF file. Returns the code, or
 * else a negative number in case of file errors... */
static short get_next_code()
   {
   short i, x;
   unsigned long ret;
   if (nbits_left == 0)
      {
      if (navail_bytes <= 0)
         {
         /* Out of bytes in current block, so read next block */
         pbytes = byte_buff;
         if ((navail_bytes = get_byte()) < 0)
            return(navail_bytes);
         else if (navail_bytes)
            {
            for (i = 0; i < navail_bytes; ++i)
               {
               if ((x = get_byte()) < 0)
                  return(x);
               byte_buff[i] = x;
               }
            }
         }
      b1 = *pbytes++;
      nbits_left = 8;
      --navail_bytes;
      }
   ret = b1 >> (8 - nbits_left);
   while (curr_size > nbits_left)
      {
      if (navail_bytes <= 0)
         {
         /* Out of bytes in current block, so read next block */
         pbytes = byte_buff;
         if ((navail_bytes = get_byte()) < 0)
            return(navail_bytes);
         else if (navail_bytes)
            {
            for (i = 0; i < navail_bytes; ++i)
               {
               if ((x = get_byte()) < 0)
                  return(x);
               byte_buff[i] = x;
               }
            }
         }
      b1 = *pbytes++;
      ret |= b1 << nbits_left;
      nbits_left += 8;
      --navail_bytes;
      }
   nbits_left -= curr_size;
   ret &= code_mask[curr_size];
   return((short)(ret));
   }
/* The reason we have these seperated like this instead of using a structure
 * like the original Wilhite code did, is because this stuff generally produces
 * significantly faster code when compiled. This code is full of similar
 * speedups. (For a good book on writing C for speed or for space optimization,
 * see Efficient C by Tom Plum, published by Plum-Hall Associates.) */
static unsigned char stack[MAX_CODES + 1];    /* Stack for storing pixels */
static unsigned char suffix[MAX_CODES + 1];   /* Suffix table */
static unsigned short prefix[MAX_CODES + 1];  /* Prefix linked list */

/* short decoder(linewidth)
 *    short linewidth;               * Pixels per line of image *
 * - This function decodes an LZW image, according to the method used
 * in the GIF spec.  Every *linewidth* "characters" (ie. pixels) decoded
 * will generate a call to out_line(), which is a user specific function
 * to display a line of pixels.  The function gets it's codes from
 * get_next_code() which is responsible for reading blocks of data and
 * seperating them into the proper size codes.  Finally, get_byte() is
 * the global routine to read the next byte from the GIF file.
 * It is generally a good idea to have linewidth correspond to the actual
 * width of a line (as specified in the Image header) to make your own
 * code a bit simpler, but it isn't absolutely necessary.
 * Returns: 0 if successful, else negative.  (See ERRS defined) */
short decoder(linewidth)
   short linewidth;
   {
   register unsigned char *sp, *bufptr;
   unsigned char *buf;
   register short code, fc, oc, bufcnt;
   short c, size, ret;
   /* Initialize for decoding a new image... */
   if ((size = get_byte()) < 0)
      return(size);
   if (size < 2 || 9 < size)
      return(BAD_CODE_SIZE);
   init_exp(size);
   /* Initialize in case they forgot to put in a clear code.
    * (This shouldn't happen, but we'll try and decode it anyway...) */
   oc = fc = 0;
   /* Allocate space for the decode buffer */
   if ((buf = (unsigned char *)malloc(linewidth + 1)) == NULL)
      return(OUT_OF_MEMORY);
   /* Set up the stack pointer and decode buffer pointer */
   sp = stack;
   bufptr = buf;
   bufcnt = linewidth;
   /* This is the main loop.  For each code we get we pass through the
    * linked list of prefix codes, pushing the corresponding "character" for
    * each code onto the stack.  When the list reaches a single "character"
    * we push that on the stack too, and then start unstacking each character
    * for output in the correct order.  Special handling is included for the
    * clear code, and the whole thing ends when we get an ending code. */
   while ((c = get_next_code()) != ending)
      {
      /* If we had a file error, return without completing the decode */
      if (c < 0)
         {
         free(buf);
         return(0);
         }
      /* If the code is a clear code, reinitialize all necessary items. */
      if (c == clear)
         {
         curr_size = size + 1;
         slot = newcodes;
         top_slot = 1 << curr_size;
         /* Continue reading codes until we get a non-clear code
          * (Another unlikely, but possible case...) */
         while ((c = get_next_code()) == clear)
            ;
         /* If we get an ending code immediately after a clear code
          * (Yet another unlikely case), then break out of the loop. */
         if (c == ending)
            break;
         /* Finally, if the code is beyond the range of already set codes, 
          * (this had better NOT happen. I have no idea what will result from
          * this, but I doubt it will look good) then set it to color zero. */
         if (c >= slot)
            c = 0;
         oc = fc = c;
         /* And let us not forget to put the char into the buffer. And if, on
          * the off chance, we were exactly one pixel from the end of the line,
          * we have to send the buffer to the out_line() routine... */
         *bufptr++ = c;
         if (--bufcnt == 0)
            {
            if ((ret = out_line(buf, linewidth)) < 0)
               {
               free(buf);
               return(ret);
               }
            bufptr = buf;
            bufcnt = linewidth;
            }
         }
      else
         {
         /* In this case, it's not a clear code or an ending code, so
          * it must be a code code...  So we can now decode the code into
          * a stack of character codes. (Clear as mud, right?) */
         code = c;
         /* Here we go again with one of those off chances...  If, on the
          * off chance, the code we got is beyond the range of those already
          * set up (Another thing which had better NOT happen...) we trick
          * the decoder into thinking it actually got the last code read.
          * (Hmmn... I'm not sure why this works...  But it does...) */
         if (code >= slot)
            {
            if (code > slot)
               ++bad_code_count;
            code = oc;
            *sp++ = fc;
            }
         /* Here we scan back along the linked list of prefixes, pushing
          * helpless characters (ie. suffixes) onto the stack as we do so. */
         while (code >= newcodes)
            {
            *sp++ = suffix[code];
            code = prefix[code];
            }
         /* Push the last character on the stack, and set up the new
          * prefix and suffix, and if the required slot number is greater
          * than that allowed by the current bit size, increase the bit
          * size.  (NOTE - If we are all full, we *don't* save the new
          * suffix and prefix...  I'm not certain if this is correct...
          * it might be more proper to overwrite the last code... */
         *sp++ = code;
         if (slot < top_slot)
            {
            suffix[slot] = fc = code;
            prefix[slot++] = oc;
            oc = c;
            }
         if (slot >= top_slot)
            if (curr_size < 12)
               {
               top_slot <<= 1;
               ++curr_size;
               } 
         /* Now that we've pushed the decoded string (in reverse order) onto
          * the stack, lets pop it off and put it into our decode buffer.
          * And when the decode buffer is full, write another line... */
         while (sp > stack)
            {
            *bufptr++ = *(--sp);
            if (--bufcnt == 0)
               {
               if ((ret = out_line(buf, linewidth)) < 0)
                  {
                  free(buf);
                  return(ret);
                  }
               bufptr = buf;
               bufcnt = linewidth;
               }
            }
         }
      }
   ret = 0;
   if (bufcnt != linewidth)
      ret = out_line(buf, (linewidth - bufcnt));
   free(buf);
   return(ret);
   }


Copyright © 1995, Dr. Dobb's Journal


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.