Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Database

Cross-platform Compression


DEC93: Cross-platform Compression

Pierre is an operation support-system manager for the provincial government of Alberta (Canada) and private consultant. He can be reached on CompuServe at 73740,3452.


As the volume of data transferred between disparate computer systems increases, so does the demand for efficient data-transmission tools--and data compression is a proven approach to transfer efficiency. The problem is that compression utilities for transferring data between mainframe, workstation, and PC environments are hard to come by. While numerous compression programs exist for PCs and UNIX workstations, there are few for mainframe environments such as IBM MVS--and even fewer cross-platform ones. In this article, I'll address this void by presenting a cross-platform implementation of an existing Lempel, Ziv, and Huffman (LZH) compression program. My approach to cross-platform compression is based on the LZHUF.C program mentioned in Mark Nelson's The Data Compression Book (M&T Books, 1991). The program, originally written by Haruyasu Yoshizaki and modified by Paul Edwards and Mark Nelson, provides encoding/decoding functions in a single ANSI C source file.

I had no problems compiling LZHUF.C using MVS SAS/C 5.00 for IBM MVS and Turbo C++ Professional for the PC. However, in the MVS version, the characters "[" and "]", which aren't supported by MVS EBCDIC must be replaced with the digraphs ("|" and "|"), respectively. ANSI C does offer trigraphs as substitutes, but this only makes C code even more difficult to read, especially if arrays are used extensively.

During my MVS testing, LZHUF would initially hang when encoding file greater than 250K. (The CPU would time out.) In an e-mail exchange, Paul Edwards told me he had the same problem with the LZHUF MVS-compiled code. Instead of pursuing a solution to the LZHUF code, Paul opted to work with AR.C, an alternate compression source file written by Okumura. After considering the rules of portable C, I noticed that LZHUF.C integer declarations which operated properly in the 16-bit integer PC environment caused problems in the MVS 32-bit environment. Following standard C guidelines, I converted all declarations, and the code worked fine. Furthermore, by adjusting LZHUF lookahead buffer from the original setting of 60 to 15, performance improved while maintaining over 80 percent compression on data file. More specifically, MVS LZHUF encodes a 1-Mbyte data file in about 18 host-CPU seconds and decodes the compressed file in 3 host-CPU seconds. On a DOS-based 486/33 with 64K internal cache, LZHUF encodes a 1-Mbyte data file in about 82 CPU seconds and decodes the compressed file in 45 CPU seconds. In a single VAX session, preliminary encoding and decoding tests were carried out with success, but the session was interrupted for maintenance, leaving no time to establish performance estimates.

Although LZHUF.C for MS-DOS doesn't offer performance equal to PKWARE, LHARC, or ARJ, it had the potential for considerable savings in storage space and transfer time, while minimizing communication risks when used as the basis for developing a cross-platform compression program. On the other hand, MVS LZHUF is resource intensive. During primetime, some sites can charge $1.00 or more per host CPU second. These charges may motivate you to take advantage of site discounts for night runs.

Even though I got LZHUF to work properly in each environment, I still needed to address the task of converting ASCII to EBCDIC and the tricky issue of MVS record formats.

ASCII and EBCDIC Translation

IBM mainframe environments such as MVS and VM rely on a character set called "EBCDIC." In addition, any mainframe installation can implement specific versions of EBCDIC defined by a code-page numbering system. The IBM 3270 terminal emulation software user's guide provides translation table for EBCDIC code which (I assume) is generally used for North American installations. However, the code page should be verified with site installation to avoid translation errors.

Most EBCDIC characters are represented in ASCII but with different decimal values that can be translated by cross-references. For example, the character A is equal to decimal 193 in EDCDIC and 65 in ASCII. As such, in the MVS version of LZHUF, ASCII character values are used to reference the subscript of an array of EBCDIC characters. The inverse is used in the MS-DOS version of LZHUF. This approach to character translation is shown in Listing One (page 98). Since encoding is a slower process, translation is done when decoding and just before the byte gets written to the output file.

To further enhance the translation function and minimize the use of command-line arguments, I modified the MVS and PC versions of LZHUF to indicate the origin of the data file by writing either a 1Eh or 1Fh, respectively, as the first byte of the encoded file. Thus, this first header byte is verified before decoding and if the encoded file origin is different from the decoding environment, translation occurs accordingly; see Listing Two (page 98).

The values 1Eh and 1Fh were chosen because they are the same in ASCII and EDCDIC. Also, 1Eh in ASCII represents an arrow pointing upward (MVS) and 1Fh points downward (MS-DOS). Interestingly enough, 18h and 19h could have been used for the exact same reasons.

Because of the change in the encoded file header, LZHUF was renamed LZH. Thus, the LZH-encoded file has a header in which the first byte designates origin and the next four bytes are an unsigned long integer representing the text size.

Handling MVS and MS-DOS Files

MVS file allocation and handling is special. Unlike MS-DOS and UNIX, the unit of file processing on MVS is a record, not a character. This makes relative access difficult to attain. In MS-DOS and UNIX, you can move a file pointer back and forth with ease. In MVS, you can move a file pointer, but with certain restrictions and only on files allocated with particular record formats. To use standard C functions such as fseek and rewind, your file must be allocated with a record-format fixed-block sequential (FBS). However, most sites predominantly use other record formats such as fixed block (FB) and variable block (V), which do not support relative access. Even if FBS is highly recommended as an efficient record format, changing existing practice is problematic. Currently, IBM is rewriting its mainframe operating systems in C, which may increase usage of record formats that support relative file access. Until then, some changes to LZHUF.C are needed to support various types of record format, by removing functions requiring relative access.

In LZH, the relative access functions fseek and rewind are used to determine text size before encoding (fseek reaches the end of file and returns the file size, while rewind rewinds.) Alternatively, MVS LZH goes through the whole file and counts each byte, then closes and reopens the file as shown in Listing Three (page 98). In the original LZHUF, all files are open and closed in the main function using command-line arguments and relative file access is performed in the function Encode(). Because MVS LZH closes and reopens the data file, the procedure to determine text size was moved into main. The alternative is to write some Job Control Language or CList that creates a temporary file allocated with a record format of FBS and copy the original data into it, leaving LZH to encode the temporary file. But this option consumes too much system overhead when compared to the simple changes I made to MVS LZH. On DOS and UNIX, LZH should maintain the relative access functions.

Okumura's AR.C, as chosen by Edwards, makes extensive use of relative access methods including fwrite and fread. Applying this program as is to MVS would require a standardized usage of the file record format FBS--a change justified by the benefits.

Under MVS, the LZH-encoded file should be allocated with record format variable since the size of the encoded file is not determined until compression is completed. If the record format is fixed, MVS by default returns an error for the last incomplete record and pads the record. This isn't a critical error since LZH decodes every byte until it reaches the original text-file size specified in the encoded file header. What is critical here is decoding to the right record size.

When encoding an DOS file, each byte is processed including carriage return (CR) and linefeed (LF). These two sequential bytes are significant to DOS since they established the end of a record and allow the length to vary. In MVS, you can decode to a variable-block record format and translate and write all DOS-legible characters to that file. However, on MVS, the proper way to decode a file encoded under DOS is to allocate the MVS file with record format fixed, preferably fixed-block sequential, complying to the largest record length of the DOS data file. LZH must then be informed by a command-line argument of the length of that record. LZH will refer to this value whenever it encounters a CRLF sequence and pad accordingly as shown in Listing Four (page 98). This will be different for UNIX-encoded files and should be modified as needed. Moreover, when decoding files from other environments, control and extended characters are often irrelevant. My experiments in MVS concluded that these characters should not be translated and written to the decoded output data set, as they often disrupt simple file-manipulation tasks such as browsing or editing. If the record-length command-line argument is omitted, then MVS LZH will, by default, decode according to record-format size specified during file allocation. Finally, before closing the MVS data file, the Decode() function pushes a CRLF sequence through Translate(), allowing a verification of the line position at the DOS end of file and performing any required padding.

Since MVS provides no characters to indicate the end of a record, decoding MVS LZH compressed file in DOS requires knowledge of the record length, as given by the command-line argument, in order to insert a CRLF at the end of each record of the DOS output file. However, if the command-line argument for the record length is not specified, then the CRLF insertion will be omitted, and the DOS output file will be a data stream of fixed-size records with no end-of-record marker. LZH DOS Translate() version is shown in Listing Five (page 98). For practical purposes, record size could be noted in encoded filenames.

The complete C source code for MVS LZH and DOS LZH is available electronically; see "Availability," page 3.

LZH: An Applied Solution

LZH for MS-DOS and MVS is being used by the Alberta Government for bulk data transfers in health-care billing. In this context, compressed data transferred between private operators and the government provides transmission-cost savings. In addition, data compression plays a crucial role in capacity planning. A compression factor of 80 percent or better relaxes serious constraints associated with transfer schedules and bulk data transfers crowding the allotted time frame. For example, in the same transfer window, data compression can offer sites the ability to increase data bulk and implement procedures to resend on error; alternatively, the government can revise the schedule to absorb growth in the number of participating sites. LZH is also made available for usage within Alberta government departments, where other potential savings can be exploited.

The richness and abundance of system utilities documented in portable C source files should provide enough justification to promote implementation and usage of C development tools on every platform. Upon critical review of this worldwide C source library, it would be very easy to find many utilities that can generate savings of various kinds once they're adapted to serve a purpose. The LZH cross-platform compression project is an example of this potential. However, my personal experience indicates that C is under-utilized in IBM mainframe environments. Opening access to C tools would invite a greater following of C programmers and a related proliferation of development. In the spirit of open systems, this would represent a significant contribution.

References

IBM PC 3270 Terminal Emulation Software V1.21 User's Guide (1989). Austin, Texas: IBM.

Nelson, Mark. The Data Compression Book. San Mateo, CA: M&T Books, 1991.

Plauger P.J. and Jim Brodie. Standard C: Programmer's Quick Reference Series. Redmond, WA: Microsoft Press, (1989).



[LISTING ONE]

          /* ascii to ebcdic code page 437 - MVS version */
unsigned char ebcdic[256] = {
       0x00, 0x01, 0x02, 0x03, 0x37, 0x2D, 0x2E, 0x2F,
       0x16, 0x05, 0x25, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F,
       ...
       0xEE, 0xEF, 0xFA, 0xFB, 0xFC, 0xFD, 0xFE, 0xFF
   };
          /* ebcdic to ascii code page 437 - DOS version */
unsigned char ascii[256] = {
       0x00, 0x01, 0x02, 0x03, 0x9C, 0x09, 0x86, 0x7F,
       0x97, 0x8D, 0x8E, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F,
       ...
       0x38, 0x39, 0xFA, 0xFB, 0xFC, 0xFD, 0xFE, 0xFF
   };
static void Translate( unsigned char c)
{
 ... for MVS version
     putc(ebcdic[c],outfile);
     /* write corresponding ebcdic character to output file by
        referring to subscipt c (ascii char) in ebcdic table */
 ... for DOS version
     putc(ascii[c],outfile);
     /* write corresponding ascii character to output file by
        referring to subscipt c (ebcdic char) in ascii table */
 ...
}



[LISTING TWO]
<a name="0369_000b">

unsigned char origin = 0x1E; /* initializing to MVS origin */
                          /* ascii to ebcdic code page 437 */
unsigned char ebcdic[256] = {
       0x00, 0x01, 0x02, 0x03, 0x37, 0x2D, 0x2E, 0x2F,
       0x16, 0x05, 0x25, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F,
       ...
       0xEE, 0xEF, 0xFA, 0xFB, 0xFC, 0xFD, 0xFE, 0xFF
   };
/* character translation */
static void Translate( unsigned short c )
{
 ...
    if ( origin == 0x1F ) {
        /* DOS file origin decoding on MVS
           convert ascii to ebcdic from table */
        if ( putc( ebcdic[c], outfile ) == EOF ) Error(wterr);
        ...
    } else {
        /* MVS file origin decoding on MVS */
        if ( putc( c, outfile ) == EOF ) Error(wterr);
    }
}
/* compress */
static void Encode( void )
{
 ...
    /* write first byte to compressed file */
    fputc(origin,outfile);
 ...
}
/* decode */
static void Decode( void )
{
    unsigned short c;
    ...
       Translate(c);
    ...
}
short main( short argc, char *argv[] )
{
  ...
    if (toupper(*argv[1]) == 'E') {
        ...
        Encode();
    } else {
        ...
        /* read first byte of compressed file
            and verify MVS or PC DOS origin */
        origin = fgetc(infile);
        if ( origin == 0x1E | origin == 0x1F )
             Decode();
        ...
    }
  ...
}


<a name="0369_000c"><a name="0369_000d">
[LISTING THREE]
<a name="0369_000d">

Original relative file access used in LZHUF function Encode()

        fseek(infile, 0L, 2);     /* go to end of file */
        textsize = ftell(infile); /* get file size */
        rewind(infile);           /* go to beginning of file */


Determining text size without relative access.
Procedure moved to the main function of MVS LZH.

        /* count bytes to textsize */
        while ( getc( infile ) != EOF ) textsize++;
        /* close date file */
        fclose( infile );
        /* re-open data file, position at beginning of file */
        if (s = argv[2], (infile = fopen(s, "rb")) == NULL) {
              printf("Cannot re-open %s\n", s);
              return EXIT_FAILURE;
        }
        Encode();




<a name="0369_000e"><a name="0369_000f">
[LISTING FOUR]
<a name="0369_000f">

unsigned long llen = 0;  /* initializing record line length */
 /* translate ascii to ebcdic and pad record length  */
static void Translate(unsigned short c)
{
    unsigned short l = 0;
    static unsigned short lpos = 0, b = 0x40;
    if ( origin == 0x1F ) {  /* DOS origin - MVS decoding */
        if ( b == 0x0D && c == 0x0A && lpos > 0 ) {
            /* 0x0D CR and 0x0A LF indicating DOS end of record
               with line position (lpos) active */
            for ( l = lpos; l < llen; l++)
                 /* pad with EBCDIC spaces */
                 if ( putc(0x40, outfile ) == EOF ) Error(wterr);
            lpos = 0; /* reset line position */
        }
        if ( c > 0x1F ) {  /* skip DOS control characters */
            /* translate ascii to ebcdic */
            if ( putc(ebcdic[c], outfile ) == EOF ) Error(wterr);
            /* keep track of line position if required */
            if ( llen > 0 ) lpos++ ;
        }
    } else { /* MVS origin - MVS decoding */
        if ( putc(c, outfile ) == EOF ) Error(wterr);
    }
    b = c; /* since CR LF is sequential, b must retain c */
}
short main( short argc, char *argv[] )
{
  ...
        /* convert argument to llen*/
        if (argv[4] != NULL) llen = atol(argv[4]);
  ...
}



<a name="0369_0010"><a name="0369_0011">
[LISTING FIVE]
<a name="0369_0011">

unsigned long llen = 0;  /* initializing record line length */
/* translate ebcdic to ascii and control record length */
static void Translate( unsigned short c )
{
    static unsigned short lpos = 0;
    if ( origin == 0x1E ) {   /* MVS origin - DOS decoding */
         /* convert ebcdic to ascii from table */
        if ( putc( ascii[c], outfile ) == EOF ) Error(wterr);
         /* perform record length control as requested */
        if (llen > 0) {
             if (lpos == llen) {
                  /* insert end of record DOS CR LF */
                 if ( ( putc( 0x0D, outfile ) == EOF ) ||
                      ( putc( 0x0A, outfile ) == EOF ) ) Error(wterr);
                  /* reset line position */
                 lpos = 0;
             } else {
                  /* keep track of line position */
                 lpos++;
             }
        }
    } else {                /* DOS origin - MVS decoding */
        if ( putc( c, outfile ) == EOF ) Error(wterr);
    }
}
short main( short argc, char *argv[] )
{
  ...
                      /* convert argument to llen*/
        if (argv[4] != NULL) llen = atol(argv[4]);
  ...
}


<a name="0369_0012"><a name="0369_0013">
[LISTING 6]
 MVS LZH<a name="0369_0013">

/**************************************************************
    MVS LZH based on:
    lzhuf.c
    written by Haruyasu Yoshizaki 1988/11/20
    some minor changes 1989/04/06
    comments translated by Haruhiko Okumura 1989/04/07
    getbit and getbyte modified 1990/03/23 by Paul Edwards
      so that they would work on machines where integers are
      not necessarily 16 bits (although ANSI guarantees a
      minimum of 16).  This program has compiled and run with
      no errors under Turbo C 2.0, Power C, and SAS/C 4.5
      (running on an IBM mainframe under MVS/XA 2.2).  Could
      people please use YYYY/MM/DD date format so that everyone
      in the world can know what format the date is in?
    external storage of filesize changed 1990/04/18 by Paul Edwards to
      Intel's "little endian" rather than a machine-dependant style so
      that files produced on one machine with lzhuf can be decoded on
      any other.  "little endian" style was chosen since lzhuf
      originated on PC's, and therefore they should dictate the
      standard.
    initialization of something predicting spaces changed 1990/04/22 by
      Paul Edwards so that when the compressed file is taken somewhere
      else, it will decode properly, without changing ascii spaces to
      ebcdic spaces.  This was done by changing the ' ' (space literal)
      to 0x20 (which is the far most likely character to occur, if you
      don't know what environment it will be running on.
    storage of filesize modified 1990/06/02 by Mark Nelson.
      When reading in the file size, I was getting sign extension
      when reading in bytes greater than or equal to 0x80, which
      messed everything up.
    MVS version modified 1993/03/05 by Pierre Dion.
      Added Translate() for ASCII to EBCDIC translation and padding
      to record size as defined by arguments. Also, 0x0D and 0x0A
      (DOS CR LF) and other DOS control characters are held back.
      Encoded file header modified to designate origin 0x1E for
      MVS and 0x1F for DOS.
**************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

FILE  *infile, *outfile;
static unsigned long  textsize = 0, codesize = 0, printcount = 0;
char wterr[] = "Can't write.";

unsigned long llen = 0;
unsigned char origin = 0x1E; /* initializing MVS origin */

                              /* ascii to ebcdic code page 437 */
unsigned char ebcdic[256] = {
       0x00, 0x01, 0x02, 0x03, 0x37, 0x2D, 0x2E, 0x2F,
       0x16, 0x05, 0x25, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F,
       0x10, 0x11, 0x12, 0x13, 0x3C, 0x3D, 0x32, 0x26,
       0x18, 0x19, 0x3F, 0x27, 0x1C, 0x1D, 0x1E, 0x1F,
       0x40, 0x5A, 0x7F, 0x7B, 0x5B, 0x6C, 0x50, 0x7D,
       0x4D, 0x5D, 0x5C, 0x4E, 0x6B, 0x60, 0x4B, 0x61,
       0xF0, 0xF1, 0xF2, 0xF3, 0xF4, 0xF5, 0xF6, 0xF7,
       0xF8, 0xF9, 0x7A, 0x5E, 0x4C, 0x7E, 0x6E, 0x6F,
       0x7C, 0xC1, 0xC2, 0xC3, 0xC4, 0xC5, 0xC6, 0xC7,
       0xC8, 0xC9, 0xD1, 0xD2, 0xD3, 0xD4, 0xD5, 0xD6,
       0xD7, 0xD8, 0xD9, 0xE2, 0xE3, 0xE4, 0xE5, 0xE6,
       0xE7, 0xE8, 0xE9, 0x4A, 0xE0, 0x4F, 0x5F, 0x6D,
       0x79, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
       0x88, 0x89, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96,
       0x97, 0x98, 0x99, 0xA2, 0xA3, 0xA4, 0xA5, 0xA6,
       0xA7, 0xA8, 0xA9, 0xC0, 0x6A, 0xD0, 0xA1, 0x07,
       0x20, 0x21, 0x22, 0x23, 0x24, 0x15, 0x06, 0x17,
       0x28, 0x29, 0x2A, 0x2B, 0x2C, 0x09, 0x0A, 0x1B,
       0x30, 0x31, 0x1A, 0x33, 0x34, 0x35, 0x36, 0x08,
       0x38, 0x39, 0x3A, 0x3B, 0x04, 0x14, 0x3E, 0xE1,
       0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x47, 0x48,
       0x49, 0x51, 0x52, 0x53, 0x54, 0x55, 0x56, 0x57,
       0x58, 0x59, 0x62, 0x63, 0x64, 0x65, 0x66, 0x67,
       0x68, 0x69, 0x70, 0x71, 0x72, 0x73, 0x74, 0x75,
       0x76, 0x77, 0x78, 0x80, 0x8A, 0x8B, 0x8C, 0x8D,
       0x8E, 0x8F, 0x90, 0x9A, 0x9B, 0x9C, 0x9D, 0x9E,
       0x9F, 0xA0, 0xAA, 0xAB, 0xAC, 0xAD, 0xAE, 0xAF,
       0xB0, 0xB1, 0xB2, 0xB3, 0xB4, 0xB5, 0xB6, 0xB7,
       0xB8, 0xB9, 0xBA, 0xBB, 0xBC, 0xBD, 0xBE, 0xBF,
       0xCA, 0xCB, 0xCC, 0xCD, 0xCE, 0xCF, 0xDA, 0xDB,
       0xDC, 0xDD, 0xDE, 0xDF, 0xEA, 0xEB, 0xEC, 0xED,
       0xEE, 0xEF, 0xFA, 0xFB, 0xFC, 0xFD, 0xFE, 0xFF
   };


/***************  Start of LZHUF unchanged code section ***************/


static void Error(char *message)
{
    fprintf(stderr,"\n%s\n", message);
    exit(EXIT_FAILURE);
}

/********** LZSS compression **********/

#define N       4096    /* buffer size */
#define F       15  /* lookahead buffer size */
#define THRESHOLD   2
#define NIL     N   /* leaf of tree */

unsigned char  text_buf[N + F - 1];

static unsigned short  match_position, match_length,
                       lson[N + 1], rson[N + 257], dad[N + 1];

static void InitTree(void)  /* initialize trees */
{
    short i;

    for (i = N + 1; i <= N + 256; i++)
        rson[i] = NIL;        /* root */
    for (i = 0; i < N; i++)
        dad[i] = NIL;         /* node */
}

static void InsertNode(short r)  /* insert to tree */
{
    short  i, p, cmp;
    unsigned char  *key;
    unsigned short c;

    cmp = 1;
    key = &text_buf[r];
    p = N + 1 + key[0];
    rson[r] = lson[r] = NIL;
    match_length = 0;
    for ( ; ; ) {
        if (cmp >= 0) {
            if (rson[p] != NIL)
                p = rson[p];
            else {
                rson[p] = r;
                dad[r] = p;
                return;
            }
        } else {
            if (lson[p] != NIL)
                p = lson[p];
            else {
                lson[p] = r;
                dad[r] = p;
                return;
            }
        }
        for (i = 1; i < F; i++)
            if ((cmp = key[i] - text_buf[p + i]) != 0)
                break;
        if (i > THRESHOLD) {
            if (i > match_length) {
                match_position = ((r - p) & (N - 1)) - 1;
                if ((match_length = i) >= F)
                    break;
            }
            if (i == match_length) {
                if ((c = ((r - p) & (N - 1)) - 1) < match_position) {
                    match_position = c;
                }
            }
        }
    }
    dad[r] = dad[p];
    lson[r] = lson[p];
    rson[r] = rson[p];
    dad[lson[p]] = r;
    dad[rson[p]] = r;
    if (rson[dad[p]] == p)
        rson[dad[p]] = r;
    else
        lson[dad[p]] = r;
    dad[p] = NIL; /* remove p */
}

static void DeleteNode(short p)  /* remove from tree */
{
    short  q;

    if (dad[p] == NIL)
        return;         /* not registered */
    if (rson[p] == NIL)
        q = lson[p];
    else
    if (lson[p] == NIL)
        q = rson[p];
    else {
        q = lson[p];
        if (rson[q] != NIL) {
            do {
                q = rson[q];
            } while (rson[q] != NIL);
            rson[dad[q]] = lson[q];
            dad[lson[q]] = dad[q];
            lson[q] = lson[p];
            dad[lson[p]] = q;
        }
        rson[q] = rson[p];
        dad[rson[p]] = q;
    }
    dad[q] = dad[p];
    if (rson[dad[p]] == p)
        rson[dad[p]] = q;
    else
        lson[dad[p]] = q;
    dad[p] = NIL;
}

/* Huffman coding */

#define N_CHAR      (256 - THRESHOLD + F)
                /* kinds of characters (character code = 0..N_CHAR-1) */
#define T       (N_CHAR * 2 - 1)    /* size of table */
#define R       (T - 1)         /* position of root */
#define MAX_FREQ    0x8000      /* updates tree when the */
                    /* root frequency comes to this value. */

typedef unsigned char uchar;


/* table for encoding and decoding the upper 6 bits of position */

/* for encoding */
uchar p_len[64] = {
    0x03, 0x04, 0x04, 0x04, 0x05, 0x05, 0x05, 0x05,
    0x05, 0x05, 0x05, 0x05, 0x06, 0x06, 0x06, 0x06,
    0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06,
    0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
    0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
    0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
    0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08,
    0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08
};
uchar p_code[64] = {
    0x00, 0x20, 0x30, 0x40, 0x50, 0x58, 0x60, 0x68,
    0x70, 0x78, 0x80, 0x88, 0x90, 0x94, 0x98, 0x9C,
    0xA0, 0xA4, 0xA8, 0xAC, 0xB0, 0xB4, 0xB8, 0xBC,
    0xC0, 0xC2, 0xC4, 0xC6, 0xC8, 0xCA, 0xCC, 0xCE,
    0xD0, 0xD2, 0xD4, 0xD6, 0xD8, 0xDA, 0xDC, 0xDE,
    0xE0, 0xE2, 0xE4, 0xE6, 0xE8, 0xEA, 0xEC, 0xEE,
    0xF0, 0xF1, 0xF2, 0xF3, 0xF4, 0xF5, 0xF6, 0xF7,
    0xF8, 0xF9, 0xFA, 0xFB, 0xFC, 0xFD, 0xFE, 0xFF
};

/* for decoding */
uchar d_code[256] = {
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
    0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01,
    0x02, 0x02, 0x02, 0x02, 0x02, 0x02, 0x02, 0x02,
    0x02, 0x02, 0x02, 0x02, 0x02, 0x02, 0x02, 0x02,
    0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03,
    0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03,
    0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04,
    0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05,
    0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06,
    0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
    0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08,
    0x09, 0x09, 0x09, 0x09, 0x09, 0x09, 0x09, 0x09,
    0x0A, 0x0A, 0x0A, 0x0A, 0x0A, 0x0A, 0x0A, 0x0A,
    0x0B, 0x0B, 0x0B, 0x0B, 0x0B, 0x0B, 0x0B, 0x0B,
    0x0C, 0x0C, 0x0C, 0x0C, 0x0D, 0x0D, 0x0D, 0x0D,
    0x0E, 0x0E, 0x0E, 0x0E, 0x0F, 0x0F, 0x0F, 0x0F,
    0x10, 0x10, 0x10, 0x10, 0x11, 0x11, 0x11, 0x11,
    0x12, 0x12, 0x12, 0x12, 0x13, 0x13, 0x13, 0x13,
    0x14, 0x14, 0x14, 0x14, 0x15, 0x15, 0x15, 0x15,
    0x16, 0x16, 0x16, 0x16, 0x17, 0x17, 0x17, 0x17,
    0x18, 0x18, 0x19, 0x19, 0x1A, 0x1A, 0x1B, 0x1B,
    0x1C, 0x1C, 0x1D, 0x1D, 0x1E, 0x1E, 0x1F, 0x1F,
    0x20, 0x20, 0x21, 0x21, 0x22, 0x22, 0x23, 0x23,
    0x24, 0x24, 0x25, 0x25, 0x26, 0x26, 0x27, 0x27,
    0x28, 0x28, 0x29, 0x29, 0x2A, 0x2A, 0x2B, 0x2B,
    0x2C, 0x2C, 0x2D, 0x2D, 0x2E, 0x2E, 0x2F, 0x2F,
    0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37,
    0x38, 0x39, 0x3A, 0x3B, 0x3C, 0x3D, 0x3E, 0x3F,
};

uchar d_len[256] = {
    0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03,
    0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03,
    0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03,
    0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03, 0x03,
    0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04,
    0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04,
    0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04,
    0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04,
    0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04,
    0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04, 0x04,
    0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05,
    0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05,
    0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05,
    0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05,
    0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05,
    0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05,
    0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05,
    0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05, 0x05,
    0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06,
    0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06,
    0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06,
    0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06,
    0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06,
    0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06, 0x06,
    0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
    0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
    0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
    0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
    0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
    0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07, 0x07,
    0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08,
    0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08, 0x08,
};

unsigned short freq[T + 1]; /* frequency table */

short prnt[T + N_CHAR];/* pointers to parent nodes, except for the */
            /* elements [T..T + N_CHAR - 1] which are used to get */
            /* the positions of leaves corresponding to the codes. */


short son[T];   /* pointers to child nodes (son[], son[] + 1) */

unsigned short getbuf = 0;
uchar getlen = 0;

static short GetBit(void)    /* get one bit */
{
    unsigned short i;

    while (getlen <= 8) {
        if ((short)(i = getc(infile)) < 0) i = 0;
        getbuf |= i << (8 - getlen);
        getlen += 8;
    }
    i = getbuf;
    getbuf <<= 1;
    getlen--;
    return (short)((i & 0x8000) >> 15);
}

static short GetByte(void)   /* get one byte */
{
    unsigned short i;

    while (getlen <= 8) {
        if ((short)(i = getc(infile)) < 0) i = 0;
        getbuf |= i << (8 - getlen);
        getlen += 8;
    }
    i = getbuf;
    getbuf <<= 8;
    getlen -= 8;
    return (short)((i & 0xff00) >> 8);
}

unsigned short putbuf = 0;
uchar putlen = 0;


/* output c bits of code */

static void Putcode(short l, unsigned long c)
{
    putbuf |= c >> putlen;
    if ((putlen += l) >= 8) {
        if (putc(putbuf >> 8, outfile) == EOF) {

            Error(wterr);
        }
        if ((putlen -= 8) >= 8) {
            if (putc(putbuf, outfile) == EOF) {
               Error(wterr);
            }
            codesize += 2;
            putlen -= 8;
            putbuf = c << (l - putlen);
        } else {
            putbuf <<= 8;
            codesize++;
        }
    }
}


/* initialization of tree */

static void StartHuff(void)
{
    short i, j;

    for (i = 0; i < N_CHAR; i++) {
        freq[i] = 1;
        son[i] = i + T;
        prnt[i + T] = i;
    }
    i = 0; j = N_CHAR;
    while (j <= R) {
        freq[j] = freq[i] + freq[i + 1];
        son[j] = i;
        prnt[i] = prnt[i + 1] = j;
        i += 2; j++;
    }
    freq[T] = 0xffff;
    prnt[R] = 0;
}


/* reconstruction of tree */

static void reconst(void)
{
    short i, j, k;
    unsigned short f, l;

    /* collect leaf nodes in the first half of the table */
    /* and replace the freq by (freq + 1) / 2. */
    j = 0;
    for (i = 0; i < T; i++) {
        if (son[i] >= T) {
            freq[j] = (freq[i] + 1) / 2;
            son[j] = son[i];
            j++;
        }
    }
    /* begin constructing tree by connecting sons */
    for (i = 0, j = N_CHAR; j < T; i += 2, j++) {
        k = i + 1;
        f = freq[j] = freq[i] + freq[k];
        for (k = j - 1; f < freq[k]; k--);
        k++;
        l = (j - k) * 2;
        memmove(&freq[k + 1], &freq[k], l);
        freq[k] = f;
        memmove(&son[k + 1], &son[k], l);
        son[k] = i;
    }
    /* connect prnt */
    for (i = 0; i < T; i++) {
        if ( (k = son[i]) >= T ) {
             prnt[k] = i;
        } else {
             prnt[k] = prnt[ k + 1 ] = i;
        }
    }
}


/* increment frequency of given code by one, and update tree */
static void update(unsigned short c)
{
    short i, j, k, l;

    if (freq[R] == MAX_FREQ) {
        reconst();
    }
    c = prnt[c + T];
    do {
        k = ++freq[c];

        /* if the order is disturbed, exchange nodes */
        if (k > freq[l = c + 1]) {
            while (k > freq[++l]);
            l--;
            freq[c] = freq[l];
            freq[l] = k;

            i = son[c];
            prnt[i] = l;
            if (i < T) prnt[i + 1] = l;

            j = son[l];
            son[l] = i;

            prnt[j] = c;
            if (j < T) prnt[j + 1] = c;
            son[c] = j;

            c = l;
        }
    } while ((c = prnt[c]) != 0); /* repeat up to root */
}
unsigned short code, len;

static void EncodeChar(unsigned short c)
{
    unsigned short i;
    short j, k;
    i = 0;
    j = 0;
    k = prnt[c + T];

    /* travel from leaf to root */
    do {
        i >>= 1;

        /* if node's address is odd-numbered, choose bigger brother node */
        if (k & 1) i += 0x8000;

        j++;
    } while ((k = prnt[k]) != R);
    Putcode(j, i);
    code = i;
    len = j;
    update(c);
}

static void EncodePosition(unsigned short c)
{
    unsigned short i;

    /* output upper 6 bits by table lookup */
    i = c >> 6;
    Putcode((short) p_len[i],(unsigned long) p_code[i] << 8);
    /* output lower 6 bits verbatim */
    Putcode(6, (c & 0x3f) << 10);
}

static void EncodeEnd(void)
{
    if (putlen) {
        if (putc(putbuf >> 8, outfile) == EOF) {
            Error(wterr);
        }
        codesize++;
    }
}

static short DecodeChar(void)
{
   unsigned short c;

    c = son[R];

    /* travel from root to leaf, */
    /* choosing the smaller child node (son[]) if the read bit is 0, */
    /* the bigger (son[]+1) if 1 */
    while (c < T) {
        c += GetBit();
        c = son[c];

    }
    c -= T;
    update(c);
    return (short)c;
}

static short DecodePosition(void)
{
    unsigned short i, j, c;

    /* recover upper 6 bits from table */
    i = GetByte();
    c = (unsigned short)d_code[i] << 6;
    j = d_len[i];

    /* read lower 6 bits verbatim */
    j -= 2;
    while (j--) {
        i = (i << 1) + GetBit();
    }
    return (short)(c | (i & 0x3f));
}

/******************* End of LZHUF unchanged code section ***************/


/* translate ascii to ebcdic and pad record length  */

static void Translate(unsigned short c)
{
    unsigned short l = 0;
    static unsigned short lpos = 0, b = 0x40;

    if ( origin == 0x1F ) {  /* DOS origin - MVS decoding */
        if ( b == 0x0D && c == 0x0A && lpos > 0 ) {
            /* 0x0D CR and 0x0A LF indicating DOS end of record
               with line position (lpos) active */
            for ( l = lpos; l < llen; l++)
                 /* pad with EBCDIC spaces */
                 if ( putc(0x40, outfile ) == EOF ) Error(wterr);
            lpos = 0; /* reset line position */
        }
        if ( c > 0x1F ) {  /* skip DOS control characters */
            /* translate ascii to ebcdic */
            if ( putc(ebcdic[c], outfile ) == EOF ) Error(wterr);
            /* keep track of line position if required */
            if ( llen > 0 ) lpos++ ;
        }
    } else { /* MVS origin - MVS decoding */
        if ( putc(c, outfile ) == EOF ) Error(wterr);
    }
    b = c; /* since CR LF is sequential, b must retain c */
}


/* compression */

static void Encode(void)
{
    short  i, c, len, r, s, last_match_length;

    fputc(0x1E,outfile); /* designate MVS origin */

    fputc((short)((textsize & 0xff000000L) >> 24),outfile);
    fputc((short)((textsize & 0xff0000L) >> 16),outfile);
    fputc((short)((textsize & 0xff00) >> 8),outfile);
    fputc((short)((textsize & 0xff)),outfile);
    if (ferror(outfile))
        Error(wterr);   /* output size of text */
    if (textsize == 0)
        return;

    printf("In : %ld bytes\n", textsize);
    textsize = 0;           /* rewind and re-read */
    StartHuff();
    InitTree();
    s = 0;
    r = N - F;
    for (i = s; i < r; i++)
        text_buf[i] = 0x20;
    for (len = 0; len < F && (c = getc(infile)) != EOF; len++)
        text_buf[r + len] = c;
    textsize = len;
    for (i = 1; i <= F; i++)
        InsertNode((short) (r - i));
    InsertNode(r);
    do {
        if (match_length > len)
            match_length = len;
        if (match_length <= THRESHOLD) {
            match_length = 1;
            EncodeChar((unsigned short) text_buf[r]);
        } else {
         EncodeChar((unsigned short) (255 - THRESHOLD + match_length));
            EncodePosition(match_position);
        }
        last_match_length = match_length;
        for (i = 0; i < last_match_length &&
                (c = getc(infile)) != EOF; i++) {
            DeleteNode(s);
            text_buf[s] = c;
            if (s < F - 1)
                text_buf[s + N] = c;
            s = (s + 1) & (N - 1);
            r = (r + 1) & (N - 1);
            InsertNode(r);
        }
        if ((textsize += i) > printcount) {
            printcount += 4096;
        }
        while (i++ < last_match_length) {
            DeleteNode(s);
            s = (s + 1) & (N - 1);
            r = (r + 1) & (N - 1);
            if (--len) InsertNode(r);
        }
    } while (len > 0);
    EncodeEnd();
    printf("\nOut: %ld bytes\n", codesize);
    printf("Out/In: %.3f\n", (double)codesize / textsize);
}

static void Decode(void)  /* recover */
{
    short  i, j, k, r, c;
    unsigned long count;

    textsize = (unsigned char) fgetc(infile);
    textsize <<= 8;
    textsize |= (unsigned char) fgetc(infile);
    textsize <<= 8;
    textsize |= (unsigned char) fgetc(infile);
    textsize <<= 8;
    textsize |= (unsigned char) fgetc(infile);
    if (ferror(infile))
        Error("Can't read");  /* read size of text */
    if (textsize == 0)
        return;
    printf("Original Text size = %ld\n", textsize );
    StartHuff();
    for (i = 0; i < N - F; i++)
        text_buf[i] = 0x20;
    r = N - F;
    for (count = 0; count < textsize; ) {
        c = DecodeChar();
        if (c < 256) {
            Translate(c);
            text_buf[r++] = c;
            r &= (N - 1);
            count++;
        } else {
            i = (r - DecodePosition() - 1) & (N - 1);
            j = c - 255 + THRESHOLD;
            for (k = 0; k < j; k++) {
                c = text_buf[(i + k) & (N - 1)];
                Translate(c);
                text_buf[r++] = c;
                r &= (N - 1);
                count++;
            }
        }
        if (count > printcount) {
            printcount += 4096;
        }
    }
    /* forcing DOS CR LF on Translate ensures proper closing of MVS dataset */
    if ( origin == 0x1F ) {
           Translate(0x0D);
           Translate(0x0A);
    }
    printf("\nRestored\n");
}

short main(short argc, char *argv[])
{
    char  *s;

    printf("\nLZH 1.02 - Multi-Platform Compression/Decoding,\n"
       " Based on LZHUF written by Haruyasu Yoshizaki 1988 (Japan),\n"
       "               modified by Paul Edwards 1990 (Australia),\n"
       "                       and Mark Nelson 1990 (USA),\n"
       " IBM/MVS<>PC/DOS by Pierre Dion 1993 (Canada),\n");

    if (argc < 4) {
        printf("'lzh e file1 file2'   encodes file1 into file2.\n"
               "'lzh d file2 file1 l' decodes file2 into file1.\n"
               "    'l'  specifies record length,\n"
               "         LZH will pad original ascii file to record length\n");
        return EXIT_FAILURE;
    }
    if ((s = argv[1], s[1] || strpbrk(s, "DEde") == NULL)
     || (s = argv[2], (infile = fopen(s, "rb")) == NULL)
     || (s = argv[3], (outfile = fopen(s, "wb")) == NULL)) {
        printf("??? %s\n", s);
        return EXIT_FAILURE;
    }
    if (toupper(*argv[1]) == 'E') {   /* a work around relative file access */
        while ( getc( infile ) != EOF ) textsize++;
        fclose( infile );
        if (s = argv[2], (infile = fopen(s, "rb")) == NULL) {
              printf("Cannot re-open %s\n", s);
              return EXIT_FAILURE;
        }
        Encode();
    } else {
        if (argv[4] != NULL) llen = atol(argv[4]);
        origin = fgetc(infile);                 /* determine file origin */
        if ( origin == 0x1E || origin == 0x1F ) {   /* MVS or PC DOS */
             Decode();
        } else
             printf("\nError: %s is not a LZH compressed file\n",argv[2]);
    }
    fclose(infile);
    fclose(outfile);
    return EXIT_SUCCESS;
}


<a name="0369_0014"><a name="0369_0015">
[LISTING 7]
 DOS LZH<a name="0369_0015">

/**********************************************************************
    DOS LZH based on lzhuf.c
    written by Haruyasu Yoshizaki 1988/11/20
    see MVS LZH listing for comments
    DOS version modified 1993/03/05 by Pierre Dion.
      Added Translate() for ASCII to EBCDIC translation and inserting
      0x0D and 0x0A (DOS CR LF) for decoding MVS record format.
      Encoded file header modified to designate origin 0x1E for
      MVS and 0x1F for DOS.
***********************************************************************/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

FILE  *infile, *outfile;
static unsigned long int  textsize = 0, codesize = 0, printcount = 0;
char wterr[] = "Can't write.";

unsigned long int llen = 0;
unsigned char origin = 0x1F;  /* ascii origin  */
            /* ebcdic to ascii translation code page 437 */
unsigned char ascii[256] = {
           0x00, 0x01, 0x02, 0x03, 0x9C, 0x09, 0x86, 0x7F,
           0x97, 0x8D, 0x8E, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F,
           0x10, 0x11, 0x12, 0x13, 0x9D, 0x85, 0x08, 0x87,
           0x18, 0x19, 0x92, 0x8F, 0x1C, 0x1D, 0x1E, 0x1F,
           0x80, 0x81, 0x82, 0x83, 0x84, 0x0A, 0x17, 0x1B,
           0x88, 0x89, 0x8A, 0x8B, 0x8C, 0x05, 0x06, 0x07,
           0x90, 0x91, 0x16, 0x93, 0x94, 0x95, 0x96, 0x04,
           0x98, 0x99, 0x9A, 0x9B, 0x14, 0x15, 0x9E, 0x1A,
           0x20, 0xA0, 0xA1, 0xA2, 0xA3, 0xA4, 0xA5, 0xA6,
           0xA7, 0xA8, 0x5B, 0x2E, 0x3C, 0x28, 0x2B, 0x5D,
           0x26, 0xA9, 0xAA, 0xAB, 0xAC, 0xAD, 0xAE, 0xAF,
           0xB0, 0xB1, 0x21, 0x24, 0x2A, 0x29, 0x3B, 0x5E,
           0x2D, 0x2F, 0xB2, 0xB3, 0xB4, 0xB5, 0xB6, 0xB7,
           0xB8, 0xB9, 0x7C, 0x2C, 0x25, 0x5F, 0x3E, 0x3F,
           0xBA, 0xBB, 0xBC, 0xBD, 0xBE, 0xBF, 0xC0, 0xC1,
           0xC2, 0x60, 0x3A, 0x23, 0x40, 0x27, 0x3D, 0x22,
           0xC3, 0x61, 0x62, 0x63, 0x64, 0x65, 0x66, 0x67,
           0x68, 0x69, 0xC4, 0xC5, 0xC6, 0xC7, 0xC8, 0xC9,
           0xCA, 0x6A, 0x6B, 0x6C, 0x6D, 0x6E, 0x6F, 0x70,
           0x71, 0x72, 0xCB, 0xCC, 0xCD, 0xCE, 0xCF, 0xD0,
           0xD1, 0x7E, 0x73, 0x74, 0x75, 0x76, 0x77, 0x78,
           0x79, 0x7A, 0xD2, 0xD3, 0xD4, 0xD5, 0xD6, 0xD7,
           0xD8, 0xD9, 0xDA, 0xDB, 0xDC, 0xDD, 0xDE, 0xDF,
           0xE0, 0xE1, 0xE2, 0xE3, 0xE4, 0xE5, 0xE6, 0xE7,
           0x7B, 0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x47,
           0x48, 0x49, 0xE8, 0xE9, 0xEA, 0xEB, 0xEC, 0xED,
           0x7D, 0x4A, 0x4B, 0x4C, 0x4D, 0x4E, 0x4F, 0x50,
           0x51, 0x52, 0xEE, 0xEF, 0xF0, 0xF1, 0xF2, 0xF3,
           0x5C, 0x9F, 0x53, 0x54, 0x55, 0x56, 0x57, 0x58,
           0x59, 0x5A, 0xF4, 0xF5, 0xF6, 0xF7, 0xF8, 0xF9,
           0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37,
           0x38, 0x39, 0xFA, 0xFB, 0xFC, 0xFD, 0xFE, 0xFF
   };
/*************  Start of LZHUF unchanged code section ***************/
/*  see MVS LZH listing 6 for section source */
/*************  End of LZHUF unchanged code section ******************/
/* translate ebcdic to ascii and record length control */
static void Translate( unsigned short c )
{
    static unsigned short lpos = 0;
    if ( origin == 0x1E ) {   /* MVS origin - DOS decoding */
         /* convert ebcdic to ascii from table */
        if ( putc( ascii[c], outfile ) == EOF ) Error(wterr);
         /* perform record length control as requested */
        if (llen > 0) {
             if (lpos == llen) {
                  /* insert end of record DOS CR LF */
                 if ( ( putc( 0x0D, outfile ) == EOF ) ||
                      ( putc( 0x0A, outfile ) == EOF ) ) Error(wterr);
                  /* reset line position */
                 lpos = 0;
             } else {
                  /* keep track of line position */
                 lpos++;
             }
        }
    } else {                /* DOS origin - MVS decoding */
        if ( putc( c, outfile ) == EOF ) Error(wterr);
    }
}
/* compression */
static void Encode(void)  /* compression */
{
    short  i, c, len, r, s, last_match_length;
    unsigned long bar = 0;
    fseek(infile, 0L, 2);
    textsize = ftell(infile);
    fputc(0x1F,outfile);  /* output ascii origin */
    fputc((short)((textsize & 0xff000000L) >> 24),outfile);
    fputc((short)((textsize & 0xff0000L) >> 16),outfile);
    fputc((short)((textsize & 0xff00) >> 8),outfile);
    fputc((short)((textsize & 0xff)),outfile);
    if (ferror(outfile))
        Error(wterr);   /* output size of text */
    if (textsize == 0)
        return;
    bar = textsize / 39;  /* encode status bar */
    printf("In : %ld bytes\n"
              "--10--20--30--40--50--60--70--80--90--100\%\r", textsize);
    rewind(infile);
    textsize = 0;           /* rewind and re-read */
    StartHuff();
    InitTree();
    s = 0;
    r = N - F;
    for (i = s; i < r; i++)
        text_buf[i] = 0x20;
    for (len = 0; len < F && (c = getc(infile)) != EOF; len++)
        text_buf[r + len] = c;
    textsize = len;
    for (i = 1; i <= F; i++)
        InsertNode(r - i);
    InsertNode(r);
    do {
        if (match_length > len)
            match_length = len;
        if (match_length <= THRESHOLD) {
            match_length = 1;
            EncodeChar(text_buf[r]);
        } else {
            EncodeChar(255 - THRESHOLD + match_length);
            EncodePosition(match_position);
        }
        last_match_length = match_length;
        for (i = 0; i < last_match_length &&
                (c = getc(infile)) != EOF; i++) {
            DeleteNode(s);
            text_buf[s] = c;
            if (s < F - 1)
                text_buf[s + N] = c;
            s = (s + 1) & (N - 1);
            r = (r + 1) & (N - 1);
            InsertNode(r);
        }
        if ((textsize += i) > printcount) {
            printcount += bar;
            printf("%c",0xFE);
        }
        while (i++ < last_match_length) {
            DeleteNode(s);
            s = (s + 1) & (N - 1);
            r = (r + 1) & (N - 1);
            if (--len) InsertNode(r);
        }
    } while (len > 0);
    EncodeEnd();
    printf("    \nOut: %ld bytes\n", codesize);
    printf("Out/In: %.3f\n", (double)codesize / textsize);
}
static void Decode(void)  /* recover */
{
    unsigned short i, j, k, r, c;
    unsigned long count, bar = 0;

    textsize = (unsigned char) fgetc(infile);
    textsize <<= 8;
    textsize |= (unsigned char) fgetc(infile);
    textsize <<= 8;
    textsize |= (unsigned char) fgetc(infile);
    textsize <<= 8;
    textsize |= (unsigned char) fgetc(infile);
    if (ferror(infile))
        Error("Can't read");  /* read size of text */
    if (textsize == 0)
        return;
    bar = textsize / 40;  /* decode status bar */
    printf("In : %ld bytes\n"
              "--10--20--30--40--50--60--70--80--90--100\%\r", textsize);
    StartHuff();
    for (i = 0; i < N - F; i++)
        text_buf[i] = 0x20;
    r = N - F;
    for (count = 0; count < textsize; ) {
        c = DecodeChar();
        if (c < 256) {
            Translate(c);
            text_buf[r++] = c;
            r &= (N - 1);
            count++;
        } else {
            i = (r - DecodePosition() - 1) & (N - 1);
            j = c - 255 + THRESHOLD;
            for (k = 0; k < j; k++) {
                 c = text_buf[(i + k) & (N - 1)];
                 Translate(c);
                 text_buf[r++] = c;
                 r &= (N - 1);
                 count++;
            }
        }
        if (count > printcount) {
            printcount += bar;
            printf("%c",0xFE);
        }
    }
    printf("   \nRestored\n");
}
short main(short argc, char *argv[])
{
    char  *s;
    printf("\nLZH ver 1.02 - Multi-platform Data Compression/Decoding,\n"
           "Based on LZHUF written by Haruyasu Yoshizaki 1988 (Japan),\n"
           "               modified by Paul Edwards 1990 (Australia),\n"
           "                       and Mark Nelson 1990  (USA),\n"
           "IBM/MVS__PC/DOS by Pierre Dion 1993 (Canada),\n\n");

    if (argc < 4) {
        printf("'lzh e file1 file2' encodes file1 into file2.\n"
               "'lzh d file2 file1 l' decodes file2 into file1.\n"
               "      'l' (optional) specifies line length for crlf\n"
               "          as defined by the original ebcdic file and\n"
               "          must be expressed as an integer.\n"
               " Examples: lzh e datafile compfile\n"
               "           lzh d compfile datafile 150\n");
        return EXIT_FAILURE;
    }
    if ((s = argv[1], s[1] || strpbrk(s, "DEde") == NULL)
     || (s = argv[2], (infile = fopen(s, "rb")) == NULL)
     || (s = argv[3], (outfile = fopen(s, "wb")) == NULL)) {
        printf("??? %s\n", s);
        return EXIT_FAILURE;
    }
    if (toupper(*argv[1]) == 'E') {
        Encode();
    } else {
        if (argv[4] != NULL) llen = atol(argv[4]);

        origin = fgetc(infile);  /* read infile origin */

        if ( origin == 0x1E || origin == 0x1F )  /* MVS or DOS */
            Decode();
        else
            printf("\nError: %s is not a LZH compressed file.\n",argv[2]);
    }
    fclose(infile);
    fclose(outfile);
    return EXIT_SUCCESS;
}



Copyright © 1993, Dr. Dobb's Journal


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.