Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Design

Run Length Encoding


FEB89: RUN LENGTH ENCODING

Robert Zigon is a senior software engineer and can be reached at 4505 Candletree Circle, Apt. 7, Indianapolis, IN 46254.


Run Length Encoding (RLE) is one of several techniques that can be used to reduce the storage requirements of text files, databases, and digital images. The algorithm is very simple to implement, it produces output files that, on the average, require 80 percent of the input file space, and executes quickly. Because the compression factor is of prime interest, it should be pointed out that the worst case performance of the algorithm causes the output file to double its size. My use of the algorithm, however, for compressing and saving the digitized output of a frame-grabber board has never resulted in this degeneracy.

The algorithm works as follows: An input buffer is scanned for sequences of identical characters. When a character transition occurs the repetition count of the previous character (along with the character itself) is sent to an output buffer.

This continues until the end of the input buffer is reached. My implementation currently uses 1 byte for the repetition count. This allows for sequences of identical characters to be reduced to exactly 2 bytes. The output for the input buffer looks like this:

  Input Buffer: AAAA BBBBBB CC D
                                EEEEE   
  Output Buffer: 4A 6B 2C 1D 5E

In this example, the 18 characters in the input buffer were reduced to 10 characters in the output buffer (the spaces in the input and output buffers are there to make it easier to read). Notice that no space savings was gained by compressing the CC and that the space requirements of the D doubled.

Listing One contains a routine called PACK that can be used to compress an input buffer. The code is written in the assembly language of the Intel 80X86 family. The routine is designed so that multiple calls will result in having the output concatenated to the contents of the previous call.

After the information is packed and saved to disk, the day will come when you will need to unpack it. Unpacking is significantly easier. A repetition count is read from the packed buffer, and the corresponding character is sent to the output buffer that many times. The string load (LODSB) mnemonic permits the efficient reading of the pairs, while the string store (STOSB) and REP prefix perform the duplication and restoration of the input file. Listing One also contains the necessary UNPACK routine to accomplish this.

Run Length Encoding is a fast and efficient technique for reducing the space requirements of an input file. Though more complex algorithms exist with better compression factors (and worse compression times), its elegance lies in its simplicity.

_Run Length Encoding_ by Robert Zigon

[LISTING ONE]

<a name="0078_0005">

;--------------------------------------------------------------------
; RLE.asm   Run Length Encoding Routines
; Author : Bob Zigon
;--------------------------------------------------------------------
; -------------------------------------------------------------------
; PACK
;    This routine will pack a source buffer into a destination buffer
;    by reducing sequences of identical characters down to a 1 byte
;    repetition count and a 1 byte character.
;    INPUT : DS:SI -- Points to source buffer to pack
;            ES:DI -- Points to destination
;            DX    -- Number of characters in source buffer to pack
;    OUTPUT: DI    -- Is updated to allow for multiple calls.
; -------------------------------------------------------------------
pack           proc   near
               push   ax
               push   bx
               push   cx
               lodsb
               mov    bl,al
               xor    cx,cx            ; Counts number of characters packed
               cld                     ; All moves are forward

p10:           lodsb                   ; Get chara into AL
               inc   cx                ; Inc chara count
               sub   dx,1              ;
               je    p30               ; Exit when DX = 0
               cmp   cx,255            ; 255 characters in this block yet?
               jne   p20               ; If not then jump

               mov   [di],cl           ; output the length
               inc   di
               xor   cx,cx
               mov   [di],bl           ; output the character
               inc   di

p20:           cmp   al,bl             ; Has there been a character transition?
               je    p10               ; NOPE ... so go back for more

               mov   [di],cl           ; output the length
               inc   di
               xor   cx,cx
               mov   [di],bl           ; output the character
               inc   di

               mov   bl,al             ; Move Latest chara into BL
               jmp   p10               ; Loop back for more
;
;  We will get here when DX finally goes to zero.
;
p30:           mov   al,cl             ; Output the length
               stosb
               mov   al,bl             ; Output the character
               stosb

               pop   cx
               pop   bx
               pop   ax
               ret
pack           endp

; -------------------------------------------------------------------
; UNPACK
;    This routine will unpack a sequence of characters that were
;    previously PACKed.
;    NOTE  : Source Region must be terminated with a NULL.
;    INPUT : DS:SI -- Source buffer to unpack
;            ES:DI -- Destination buffer to unpack into
;    OUTPUT:
; -------------------------------------------------------------------
unpack         proc   near
               push   ax
               push   cx
               xor    cx,cx    ; Make sure CH is zero
               cld             ; All moves are forward

unp10:         lodsb           ; Load into AL a character from DS:[SI]
               cmp  al,0       ; Length of zero is end of region
               je   unp40      ;
               mov  cl,al      ; Length goes into CL
               lodsb           ; AL has chara to repeat

               rep stosb       ; Store AL to [DI] and repeat while CX <> 0
               jmp unp10       ; Loop back forever

unp40:         pop  cx
               pop  ax
               ret
unpack         endp













Copyright © 1989, Dr. Dobb's Journal


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.