Robert Zigon is a senior software engineer and can be reached at 4505 Candletree Circle, Apt. 7, Indianapolis, IN 46254.
Run Length Encoding (RLE) is one of several techniques that can be used to reduce the storage requirements of text files, databases, and digital images. The algorithm is very simple to implement, it produces output files that, on the average, require 80 percent of the input file space, and executes quickly. Because the compression factor is of prime interest, it should be pointed out that the worst case performance of the algorithm causes the output file to double its size. My use of the algorithm, however, for compressing and saving the digitized output of a frame-grabber board has never resulted in this degeneracy.
The algorithm works as follows: An input buffer is scanned for sequences of identical characters. When a character transition occurs the repetition count of the previous character (along with the character itself) is sent to an output buffer.
This continues until the end of the input buffer is reached. My implementation currently uses 1 byte for the repetition count. This allows for sequences of identical characters to be reduced to exactly 2 bytes. The output for the input buffer looks like this:
Input Buffer: AAAA BBBBBB CC D EEEEE Output Buffer: 4A 6B 2C 1D 5E
In this example, the 18 characters in the input buffer were reduced to 10 characters in the output buffer (the spaces in the input and output buffers are there to make it easier to read). Notice that no space savings was gained by compressing the CC and that the space requirements of the D doubled.
Listing One contains a routine called PACK that can be used to compress an input buffer. The code is written in the assembly language of the Intel 80X86 family. The routine is designed so that multiple calls will result in having the output concatenated to the contents of the previous call.
After the information is packed and saved to disk, the day will come when you will need to unpack it. Unpacking is significantly easier. A repetition count is read from the packed buffer, and the corresponding character is sent to the output buffer that many times. The string load (LODSB) mnemonic permits the efficient reading of the pairs, while the string store (STOSB) and REP prefix perform the duplication and restoration of the input file. Listing One also contains the necessary UNPACK routine to accomplish this.
Run Length Encoding is a fast and efficient technique for reducing the space requirements of an input file. Though more complex algorithms exist with better compression factors (and worse compression times), its elegance lies in its simplicity.
_Run Length Encoding_ by Robert Zigon
[LISTING ONE]
Copyright © 1989, Dr. Dobb's Journal<a name="0078_0005">
;--------------------------------------------------------------------
; RLE.asm Run Length Encoding Routines
; Author : Bob Zigon
;--------------------------------------------------------------------
; -------------------------------------------------------------------
; PACK
; This routine will pack a source buffer into a destination buffer
; by reducing sequences of identical characters down to a 1 byte
; repetition count and a 1 byte character.
; INPUT : DS:SI -- Points to source buffer to pack
; ES:DI -- Points to destination
; DX -- Number of characters in source buffer to pack
; OUTPUT: DI -- Is updated to allow for multiple calls.
; -------------------------------------------------------------------
pack proc near
push ax
push bx
push cx
lodsb
mov bl,al
xor cx,cx ; Counts number of characters packed
cld ; All moves are forward
p10: lodsb ; Get chara into AL
inc cx ; Inc chara count
sub dx,1 ;
je p30 ; Exit when DX = 0
cmp cx,255 ; 255 characters in this block yet?
jne p20 ; If not then jump
mov [di],cl ; output the length
inc di
xor cx,cx
mov [di],bl ; output the character
inc di
p20: cmp al,bl ; Has there been a character transition?
je p10 ; NOPE ... so go back for more
mov [di],cl ; output the length
inc di
xor cx,cx
mov [di],bl ; output the character
inc di
mov bl,al ; Move Latest chara into BL
jmp p10 ; Loop back for more
;
; We will get here when DX finally goes to zero.
;
p30: mov al,cl ; Output the length
stosb
mov al,bl ; Output the character
stosb
pop cx
pop bx
pop ax
ret
pack endp
; -------------------------------------------------------------------
; UNPACK
; This routine will unpack a sequence of characters that were
; previously PACKed.
; NOTE : Source Region must be terminated with a NULL.
; INPUT : DS:SI -- Source buffer to unpack
; ES:DI -- Destination buffer to unpack into
; OUTPUT:
; -------------------------------------------------------------------
unpack proc near
push ax
push cx
xor cx,cx ; Make sure CH is zero
cld ; All moves are forward
unp10: lodsb ; Load into AL a character from DS:[SI]
cmp al,0 ; Length of zero is end of region
je unp40 ;
mov cl,al ; Length goes into CL
lodsb ; AL has chara to repeat
rep stosb ; Store AL to [DI] and repeat while CX <> 0
jmp unp10 ; Loop back forever
unp40: pop cx
pop ax
ret
unpack endp