PDQ: Less Baggage, Faster Code

DEC89: PDQ: LESS BAGGAGE, FASTER CODE

Bruce develops and sells software for TRS-80 and MS-DOS/PC-DOS computers. You may reach him at T.N.T. Software Inc., 34069 Hainesville Rd., Round Lake, IL 60073.

To a user, the biggest difference between a compiled program written in a high-level language and a compiled program written in assembly language is the size of the executable (EXE) file. The majority of compilers, regardless of language, produce large EXE files even when the size of the original program is trivial. Most serious programmers would like to have an alternative to this problem of bloated code.

QuickBasic programmers, take note: Crescent Software has found a way to put those fat EXE files on a severe diet, shrinking some of them to less than 1K bytes. As a bonus, Crescent's PDQ library allows Basic TSR programs, and permits Basic programs to return an error level to MS-DOS.

The large size of most EXE files partially result from the way in which a language's libraries are arranged. According to Crescent Software, the Microsoft QuickBasic libraries are typical: If a program contains a single CLS command (which clears the screen), the linker also includes routines for COLOR, CSRLIN, POS(O), LOCATE, and SCREEN (x,y). Most of these routines also include code that can manage screens in graphics mode, whether or not that code is needed. The addition of other simple commands and functions adds even more overhead. This library "clumping" factor is called "granularity."

I used LIB.EXE to investigate Crescent's claims. A simple BEEP command in QuickBasic 4 costs 923 bytes of overhead, and a single ASC( ) function adds 554 bytes to the size of an EXE file! That's granularity with a vengeance.

Another part of a standard EXE file is taken up by routines that start a program and determine the type of video adapter attached, the keyboard, the current screen and graphics mode, and so on. Still more overhead is added in order to handle the process of program termination. This overhead is easily seen when a null program is compiled: In QuickBasic 4.5, a program that contains only an END statement compiles to an EXE of 9842 bytes in size!

Crescent claims that the QuickBasic compiler actually produces very efficient code. If this is true, it should be possible to produce very small executables by making the library more efficient and by reducing the size of default start-up and termination code. This method seems to be exactly what Crescent has done.

Crescent's PDQ is a substitute library for those who use the Microsoft QuickBasic compiler, Version 4.0, or higher. The PDQ library is intended to replace the BCOM4x.LIB file with a subset library that has smaller granularity and less sophistication.

"Less sophistication" means that no floating-point routines are available. There are no routines to draw circles. Error handling is greatly simplified; only eight error codes are supported. Disk files must be either sequential or binary. All arrays must be static, however, an allocMEM function is provided to simulate dynamic arrays. PDQ provides no random-number generator, no ON anything, no DATA statement or READ/RESTORE commands, TAB, PRINT USING, or WRITE.

Is this too much to give up? I don't think so, and I'll bet you won't, either. I wouldn't use the PDQ library to write a billing system, but that's not PDQ's intended use. Before the introduction of PDQ, I thought that only C or assembly would be appropriate for the inevitable menus, filters, translators, and miscellany that are part of most systems. Now, I intend to use PDQ to develop most of them.

In fact, Crescent has done what many programmers have been requesting. For years, compiler vendors have been increasing the size of the default code that is automatically linked to each program. To handle the proliferation of hardware options, vendors supply many library routines in the most general form possible -- and that only makes matters worse.

It's hard to argue with the apparently sensible decision to supply general routines. If I write a program, I want to be sure that it will run whether the user has VGA, MDA, or anything in-between. On the other hand, many programs are strictly character-mode and need nothing more than the most elementary TTY functions. Some file utilities may require only the simplest I/O. In those cases, why should the developer have to accept an extra 25K (or more) of unnecessary run-time code?

The availability of different (individual selectable) versions of library routines would provide a real advantage for developers. (I've long argued that developers would be willing to pay for such libraries.) Even better, I'd like the option to use some sort of meta-command to specify which libraries will be used for which parts of a program. Given the current levels of complication and sophistication in compilers and libraries, this may be a lot to ask.

At the very least, the libraries should have finer granularity. It's probably true that the process of linking libraries that have a very fine granularity takes more time, and that these libraries may even require the use of multiple link passes. I'm willing to make that sacrifice, and I think that most developers would be willing to do so too -- at least for final distribution versions of software.

Another potential problem exists with respect to reduced granularity. Most modules are aligned on paragraph boundaries, meaning that each module wastes an average of half a paragraph -- 8 bytes. The more modules, the more wasted memory. For speed reasons, modules should at least be aligned on word boundaries, this technique causes an average waste of one byte per module. This waste factor doesn't sound like much of a problem, but it can become one if a program contains several thousand very small modules. The use of a more intelligent linker or an integrated compiler/linker might cut at least some of this waste.

I've told several compiler vendors that I would be willing to accept compile times of an hour or longer on a 386 machine -- provided that the compiler produced the tightest, most highly optimized code possible, and that it linked only the routines that were absolutely necessary. I would be willing to pay a good deal of money for such a Basic compiler and linker. (For whatever language, I'm not the only developer with those opinions.) I would also be willing to debug with something faster but less efficient.

From that viewpoint, Crescent's PDQ may signal the start of something big. If PDQ becomes a success, it may spur the major compiler vendors to offer something equivalent to PDQ or better. Most developers are not reconciled to working with large executables that need a lot of memory. Speed and efficiency will always have appeal.

To see how well Crescent has succeeded with PDQ, let's consider some examples of simple programs written in QuickBasic and modified for the PDQ library. Example 1lists the (hoary) standard SIEVE program, and Example 2 shows the equivalent program modified for PDQ.

Example 1: Standard Sieve program

  DEFINT A-Z
  DIM flags (8190)
  CLS
  PRINT "25 iterations"
  x# = TIMER
  FOR j = 1 TO 25
         count = 0
         FOR i = 0 TO 8190
                flags(i) = 1
         NEXT i
         FOR i = 0 TO 8190
                IF flags (i) THEN
                   prime = i + i + 3
                   k = i + prime
                   WHILE k <= 8190
                          flags (k) = 0
                          k = k + prime
                   WEND
                   count = count + 1
                END IF
          NEXT i
  NEXT j
  xx# = TIMER
  PRINT USING "#### primes in ##.###
  seconds";count;xx#-x#
  END

Example 2: PDQ version of the Sieve program

  rem $include: 'pdqdecl.bas'
  DIM flags (8190)
  CLS
  PRINT "25 iterations"
  start&=pdqtimer&
  FOR j = 1 TO 25
         count = 0
         FOR i = 1 TO 8190
                flags(i) = 1
         NEXT i
         FOR i = 0 TO 8190
                IF flags (i) THEN
                   prime = i + i + 3
                   k = i + prime
                   WHILE K <= 8190
                          flags (k) = 0
                          k = k + prime
                   WEND
                   count = count + 1
                END IF
         NEXT i
  NEXT j
  done&=pdqtimer&
  tot&=(10*(done&-start&))\182
  'convert timer ticks to seconds.
  frac&=(10000&*(10* (done&-start&)
  mod 182))\182 'and decimal.
  PRINT count; "primes in ";rtrim$
       (str$(tot&));".";frac&;"seconds."
  END

I also compiled a modified version of the program (under PDQ and QB 4.5) with all PRINT and TIMER statements removed. Table 1 shows the file sizes after this program is compiled.

Table 1: File sizes after PDQ compilation

                    --PDQ--                  --QB 4.5--
                Std.        No Print       Std.        No Print

---------------------------------------------------------------


  SIEVE.BAS      590          352           452          352
  SIEVE.OBJ     1585          951          1226          951
  SIEVE.EXE     2620         1282         27544        12522

In each case, the EXE file created by PDQ is about a tenth of the size of the EXE file created by the standard QuickBasic library. Few changes were necessary, and the program ran just as quickly -- it ran in about 3.24 seconds on my computer.

Notice the effects of library granularity. Removing the PRINT, TIMER, and double-precision subtraction routines sliced the size of the QB 4.5 EXE file by 15K bytes. A simple TTY print routine, plus the bytes contained in the actual strings, might use a total of 150 bytes. TIMER ought to not use more than a dozen or so bytes. Double-precision subtraction routines and the numeric-to-ASCII conversions would surely require less than 2K bytes. The rest of the 15K overhead in the QB 4.5 EXE file comprises all of the unnecessary linked code.

A Small Utility Program

Consider a small filter utility. Most programmers would write such a utility in C or in assembly language in order to generate the smallest EXE possible. The filter utility simply inspects the text in the file named on the command line and converts all lowercase letters to uppercase. This utility looks like the program in Example 3. The filter utility is straightforward and the task is trivial, but this type of program is the core of most file-filter utilities. The file sizes produced with PDQ were:

                   Std.         No Print

   UCASE.BAS       592           459
   UCASE.OBJ       1723          1553
   UCASE.EXE       2578          2348

Note: I added some remarks for publication, so the source code size will not match the numbers shown above.

UCASE.BAS ran with impressive speed, converting a 104K-text file to uppercase letters in about three seconds on my Tandy 4000 (16-MHz 80386, no math coprocessor, Plus Development Hard Card 40, with 64K cache enabled). Further, on this small program, PDQ was one-third the size of the equivalent program written in QuickC and Turbo C.

Example 3: Sample filter utility

     defint a-z                     'by default, all variables are integers
     call initstr(a$, 128)          'initialize a$ to 128 characters
     call initstr (b$, 8192)        'buffer for file is 8K bytes
     a$=command$                    'look at the command line
     if rtrim$(a$)="" then          'no file named; give help.
         print"Syntax:"
         print "UCASE filename"
         print "The named file will be converted in place to upper-case."
         end
         end if
     open rtrim$(a$) for binary as #1       'no random files in PDQ
     size&=lof(1)       'file size
     if size&=0 then close 1:kill rtrim$(a$): end      'no such file
     where&=1           'start with the first byte
     while where&<=size&           'as long as there are bytes to read
            remains&=size&-where&+1              'how much is left?
            if remains&<8192 then call setlength (b$, remains&)   '< 8192 bytes
            get 1,where&, b$      'grab the next chunk to convert
            b$=ucase$(b$)         'upper-case it
              put 1, where&, b$     'write it back out
            where&=where&+len(b$) 'update the file position
     wend
       end

PDQ also contains several alternate libraries. These libraries provide special versions of various modules, optimized for the 80286 and 80386 processors. The PDQ documentation mentions that the 80386 library is especially suitable for use with long-integer calculations, most notably long-integer divisions.

To test PDQ's speed during various operations, I used a modification of the benchmark programs presented in my earlier QuickBasic review (DDJ, November 1988). The results are shown in Table 2.

Table 2: Benchmark results

                  Times per Million Long Integer Operations

                              8088      80286     80386     QB4.5
  Raw loop:                   2.76      2.75      2.24     13.63
  Assignments:                1.86      1.86      1.37      1.75
  Additions:                  16.20     16.20     13.73     14.34
  Subtractions:               16.20     16.20     13.73     14.40
  Multiplications:            28.28     27.40     22.95     25.33
  Divisions:                  32.46     32.46     25.04     32.20
  Comparisons:                12.02     12.08     11.25     10.54

                             Times per Million Operations

                                   8088      80286     80386     QB4.5
  Fixed string assignments:        35.81     36.52     36.63     51.43
  Fixed string MID$ operations:    36.03     36.24     35.59     23.67
  Fixed string "concatenations":   36.03     36.24     35.59     24.26

                             Miscellaneous Times and Size

                                           8088    80286    80386   QB4.5
  Print 1,000 70-byte strings to screen:   27.02   27.02    27.02    9.93
  Executable size (bytes):                 7,346   7,344    7,126  46,139

Generally speaking, the PDQ routines are equivalent to the routines in the standard QB 4.5 library. The PDQ 80386 routines are appreciably faster at long-integer operations, as advertised.

Regardless of the library used, the long-integer assignment operations are much faster with PDQ. I know no reason for this difference and few programs use enough assignment operations to make this difference in speed noticeable to the programmer.

TestHotkey          StuffBuf
IntEntry 1          ResetKeyboard
PopUpHere           PopDown
PointIntHere        IntEntry2
GotoOldInt          EndTSR
CallOldInt

Rather than explaining each of these commands in detail, I'll present a sample TSR program that uses them.

I altered one of Crescent's many demonstration programs to output keyboard directly to screen memory. This routine's output was not used because it cannot be redirected. (The ability to direct output is of great use in utility programs.) Also the routine does not automatically scroll.

The slower print speeds occur because PDQ's standard PRINT commands go through DOS services. PDQ also contains a PDQPRINT routine that prints macros. The sample program intercepts calls to interrupt 9 (the standard keyboard interrupt). If the key combination pressed is ALT-A, ALT-S, or ALT-D, the TSR program clears the keyboard buffer and inserts the defined sequence of keystrokes into the same buffer. Non-trapped keys are passed to the standard Int 9 handler. You may not stuff more than 15 characters into the keyboard buffer (as mentioned in the PDQ documentation).

PDQPRINT is useful when a programmer needs to force screen printing (even when the standard output has been redirected), or when greater speed is necessary. In addition, PDQPRINT supports color attributes; the PRINT routine in PDQ doesn't.

I also compared the speeds of the integer operations, and found little difference between PDQ and QB 4.5. Realistically, integer operations are so fast in nearly any compiler that much greater performance increases result from the use of optimization techniques rather than from the use of tricks for integer arithmetic. Because PDQ uses the code generation of QB 4, any optimization (or lack of it) in QB's code generation is reflected in PDQ. The performance changes from PDQ result only from changes in library routines.

TSR Power

The PDQ library does more than just remove functions and commands -- it also adds some new ones. The most notable added commands and functions make TSR programs possible. Some of these new commands and functions are:

The sample programs furnished with PDQ do work. If you have any doubts about the correct way to do something with PDQ, look at those examples carefully. I accidentally omitted the final call ReturnFromInt for ALT-D, and was sure I'd found a bug. When my program was changed to a TSR program, all attempts to use macros failed after the first time that ALT-D was pressed. I'm not exactly sure why this happened, but the error was certainly mine and not PDQ's.

Even Basic's executables would make such TSRs rather wasteful. With PDQ, TSRs are small enough to be practical. The sample program that I just presented results in an EXE file of only 2404 bytes in size.

The Manual

The PDQ documentation supplied with the first release of the library was in preliminary form, and will be changed soon. Ethan Winer of Crescent Software told me that only 100 manuals were printed for the first release of PDQ, so criticisms of the documentation may be moot.

It is important to note that if your program uses large static arrays, you should use the /ex option on the link command line. (This option is not mentioned in the first version of the manual.) The /ex option packs space used by static arrays and results in much smaller EXE files. Without the /ex option, the SIEVE program compiled to approximately 18K under PDQ. I also used the /ex option when I compiled with QB 4.5.

This first version of the manual was disorganized in places, but was fairly easy to use. The information about the new commands and the changes from QB 4 were well organized, but the discussion of TSR programs was incomplete. In some places, the manual didn't match the syntax of the first release. Fortunately, the distribution disk contains a generous number of helpful example programs, all of which work. If you purchase PDQ, I recommend that you read the manual and then pay very close attention to the examples on the disk.

From previous experience, I knew that Crescent Software's manuals are chatty and easy to read. The manual for PDQ is typical. Ethan Winer clearly knows a lot about the internals of QuickBasic, and he's not shy about sharing his knowledge. The manual contains a number of performance hints, tips for reducing the size of typical PDQ and QuickBasic programs, and even ways to call QuickBasic library routines directly. One such example shows how to force QuickBasic to do the equivalent of MAT READ.

On balance, I liked the manual very much despite its early flaws; later versions of the manual should be better.

Overall Utility

PDQ is aimed at QuickBasic programmers, but it has clear applicability in other areas. In the past, I resorted to the use of assembly language or C to write small utilities. I resented the general unreadability of C and the more complicated compilation process that it requires. Also, because of the low-level nature of assembly and C, I found the debugging process to be far more tedious than the process of debugging Basic, even with the new versions of C and assembly from Borland and Microsoft. As long as I benefited from efficiency and speed, I was willing to put up with those disadvantages.

Granted, there will still be times when C or assembly are the most suitable tools for the job. PDQ simply makes those times less frequent, and makes my work easier.

Programmers who are unfamiliar with either C or assembly can now attempt more projects. With the advent of PDQ, Basic can only gain stature as a readable and efficient language that is suitable for both small and large projects. As a long-time Basic enthusiast, I can't pretend to be unbiased about the prospect.

PDQ is a watershed product. I think it is the first in a series of efficient compilers and alternative libraries. In the next few years, we'll almost certainly see similar attempts for C, Pascal, and other popular languages. Subsequent products will surely include ever more efficient compilers and linkers, benefiting us all.

Whether or not you find PDQ appealing, I feel sure that the approach taken by Crescent Software is the approach that the whole industry will follow in the future. Other developments are doubtful -- this one is not. Bloated code may finally be doomed.

Product Information

P.D.Q. Crescent Software, Inc.
11 Grandview Ave.,
Stamford, CT 06905
Includes full source code
For IBM PC, XT, AT, or compatible
$99

Tools

PDQ: Less Baggage, Faster Code

Example 1: Standard Sieve program

Example 2: PDQ version of the Sieve program

Table 1: File sizes after PDQ compilation

A Small Utility Program

Example 3: Sample filter utility

Table 2: Benchmark results

TSR Power

The Manual

Overall Utility

Product Information

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Tools Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

Tools

PDQ: Less Baggage, Faster Code

Related Reading

More Insights

White Papers

Reports

Webcasts

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Tools Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content