Assembly Language Lives!

By Michael Abrash, March 01, 1990

Assembly language isn't the be-all and end-all of PC programming, but as Michael states, it's sometimes the only game in town when performance or program size is important.

Michael works on high-performance graphics software at Metagraphics in Scotts Valley, Calif. He is also the author of Zen of Assembly Language published by Scott, Foresman & Co., and Power Graphics Programming, from Que.

There's an old joke that goes something like this:

Person #1: Help! My brother thinks he's a chicken, and I don't know what I should do.

Person #2: Have you told him the truth?

Person #1: I would, but I need the eggs.

Updated for the modern age of structured languages and object-oriented programming, that joke would read:

Manager #1: Help! My programmers think assembly language is a viable programming language, and I don't know what I should do.

Manager #2: Have you told them the truth?

Manager #1: I would, but I need the speed.

Assembly language beats everything else hands down when it comes to performance -- especially when programming for the 80x86, where assembly language is wild, woolly, and wondrous -- yet it gets no respect. When you flat-out need performance, there simply are no substitutes for assembly language -- so why doesn't anyone seem to love it?

Assembly Language Isn't Cheap

Experts, pundits, and management types have been beating the drums for the demise of assembly language for years. There are many good reasons for wishing it dead. Compared to compiled code, good assembly-language code is harder to write, is more bug prone, takes more time to create, is harder to maintain, is harder to port to other platforms, and is more difficult to use for complex, multiprogrammer projects. That makes assembly language an expensive, demanding, and time-consuming development language. Given the realities of time to market, the relative costs of good assembly language and high-level language programmers, programmer turnover, and ever-increasing software complexity, it's neither surprising nor unreasonable that most of the industry wishes assembly language would go away.

Assembly language lives, though, for one simple reason: Properly applied, it produces the best code of any language. By far.

Assembly Language Lives

Don't believe me? Consider this. If the carbon-based computer between your ears were programmed with as good a compiler as Microsoft's, then you'd generate much better code in assembly language than does Microsoft C, because you know vastly more about what you want your program to do and are marvelously effective at integrating that knowledge into a working whole. High-level languages are artificially constrained programming environments, able to pass relatively little of what you know along to the ultimate machine code. There are good reasons for that: High-level languages have to be compilable and comprehensible by humans. Nonetheless, there's no way for a high-level language to know where to focus its efforts, or which way to bias code.

For example, how can a Pascal compiler know that one loop repeats twice, on average, while another repeats 32,767 times? How can a C compiler know that one subroutine is time critical, deserving of all possible optimization, while another subroutine executes in the background while waiting for the next key to be pressed, so speed matters not at all? The answer is: No way. (Actually, #pragma can do a little of that, but it's no more than a tiny step in the right direction.)

Just as significantly, no compiler can globally organize your data structures and the code that manipulates those structures to maximum advantage, nor take advantage of the vast number of potential optimizations as flexibly as you can. (Space forbids even a partial listing of optimization techniques for the 80x86 family: The list is astonishingly long and varied. See Tim Paterson's article in this issue for a small but potent sample.) When it comes to integrating all the information about a particular aspect of a program and implementing the code as efficiently as possible given the capabilities of a particular processor, it's not even close: Humans are much better optimizers than compilers are.

Almost any processor can benefit from hand-tuned assembly language, but assembly language lives most vibrantly in the 80x86 family. The 80x86 instruction set is irregular; the register set is small, with most registers dedicated to specific purposes; segments complicate everything; and the prefetching nature of the 80x86 renders actual execution time non-quantifiable -- and optimization at best an art and at worst black magic -- making the 80x86 family a nightmare for optimizing-compiler writers. The quirky (and highly assembly language amenable) instructions of the 8086 live on in the latest 80x86-family processors, the 80386 and 80486, and will undoubtedly do the same for many generations to come. Other processors may lend themselves better to compilers, but the 80x86 family is and always will be a wonderland for assembly language.

Consider this: Well-written assembly language provides a 50 to 300 percent boost in performance over compiled code (more sometimes, less others, but that's a conservative range). An 8-MHz AT is about three times faster than a PC, a 16-MHz 80386 machine is about twice as fast as an AT, and a 25-MHz 80386 is about three times as fast as an AT. There are a lot of PCs and ATs out there -- 20 to 30 million, I'd guess -- and there is a horde of users contemplating the expenditure of thousands of dollars to upgrade.

Now consider this. Those users don't have to upgrade -- they just need to buy better-written software. The performance boost good assembly language provides is about the same as stepping up to the next hardware platform, but the assembly language route is one heck of a lot cheaper.

In other words, better software can eliminate the need for expensive hardware, giving the developer the opportunity to realize a healthy profit for his extra development efforts. Just as important is the fact that good assembly language runs perfectly well on slower computers, making the market for such software considerably larger than the market for average software. If you make your software snappy on an 8088, your potential market doubles instantly and the competition thins.

Finally, it's on the slower computers -- the PC and AT -- that assembly language optimization has the most effect (see the example later in this article), and that's precisely where improved performance is most needed.

Enter the User

So assembly language produces the best code. What of it? If high-level languages make it easier and faster to create programs, who cares if those programs are slower?

The user, that's who. Users care about perceived performance -- how well a program seems to run. Perceived performance includes lack of bugs, ease of use, and, right at the top of the list, responsiveness. Hand users a whizbang program that makes them wait at frequent intervals, and they'll leave it on the shelf after trying it once. Give users a program that never gets in their way, and they may love it without ever knowing quite why. In these days of all-too-sluggish graphical interfaces, the performance issue is central to the usability of almost every program.

What users don't care about is how a program was made. Do you care how your car was designed? You care that it's safe, that it's reliable, and that it performs adequately, but you certainly don't care whether the manufacturer used just-in-time manufacturing, or whether mainframe or micro-computer CAD was used in the design process. Likewise, users don't care whether a programmer used OOP or C or Pascal, or COBOL, for that matter; they care that a program does what they need and performs responsively. That's not purely a matter of speed, but without speed the user will never be fully satisfied. And when it comes to speed, assembly language is king.

Use Only as Directed

When you need it, there's no substitute for assembly language, but it can be a drag when you don't need it -- so know when to use it. Humans are better large-scale designers and small-scale optimizers than compilers, but they're not very good at the grunt work of compiling, such as setting up stack frames, handling 32-bit values, allocating and accessing automatic variables, and the like. Moreover, humans are much slower at generating code, so it's a good idea to avoid being a "human compiler." Some people create complex macros and assembly language programming conventions and do all their programming in assembly language. That works -- but what those macros and conventions do is make assembly language function much like a high-level language, so there's no great benefit, especially given that you can drop into assembly language from a high-level language at any time just by calling an assembly language subroutine (or, better yet, by using in-line assembly language in a compiler that offers that feature, such as Turbo C). Unless you're a masochist, let your favorite compiler do what it's best at -- compiling -- and save assembly language for those small, well-defined portions of your software where your efforts and unique skills pay off handsomely.

A relevant point is that assembly language alone is not the path to performance. If you have a program that takes as long as a second to update the screen, you have problems that assembly language alone won't solve: Proper overall design and algorithm selection are also essential. However, most software designers consider the job done when the design and algorithm phases are complete, leaving the low-level optimization to the compiler. I repeat: No compiler can match a good assembly language programmer at low-level optimization. Given the irregular nature of the 80x86 family and the huge PC software market, it's well worth the time required to hand-optimize the few critical portions that control perceived performance. Only in assembly language can you take full responsibility for the performance of your code.

Don't Spit into the Wind

While I can't offer a cut-and-dried dictum on when to use assembly language, the practice of using it when the user would notice if you didn't is a good rule of thumb. While some programmers would take this rule too far and use assembly language too often, the vast majority of programmers will lean over backwards the other way, in the face of all evidence to the contrary. Hal Hardenbergh's late, lamented DTACK Grounded reveled in the folly of the AT&T programmers who implemented the floating-point routines for a super-micro in C rather than assembly language -- with the result that the computer performed floating-point arithmetic not quite so fast as a Commodore VIC-20!

Likewise, I once wrote an article in which I measured the performance of an assembly-language line-drawing implementation at four to five times that of an equivalent C implementation. One reader rewrote the C code for greater efficiency, ran it through Microsoft C rather than Turbo C, and wrote to inform me that I had shortchanged C; assembly language was actually "only" 70 percent faster than C. As it happens, the assembly-language code wasn't fully optimized, but that's not the important point: What really matters is that when programmers go out of their way to produce code that's nearly twice as slow (and in an important user-interface component, no less) in order to use a high-level language rather than assembly language, it's the user who's getting shortchanged. Commercial developers in particular can't afford to ignore this, and I suspect that most such developers are DDJ readers. If you're aiming to sell hundreds of thousands of copies of a program, you're guaranteed to have stiff competition. If you don't go the extra mile to provide snappy response, someone else will -- and you'll be left out in the cold.

On the other hand, assembly language code is harder and slower to write, and pays off only in the few most critical portions of any program. There are limits to the levels of complexity humans can handle in assembly language, and limits to the development time that can be taken before a product must come to market. Identify the parts of your programs that significantly affect the performance perceived by the user (a code profiler can help greatly here), and focus your efforts on that code, with especially close attention to oft-repeated loops.

80x86 Assembly Language in Action

Enough talk. Let's look at an example of assembly language in action. Listing One, page 94, shows a C subroutine, CopyUppercase, that copies the contents of one far zero-terminated string to another far zero-terminated string, converting all lowercase characters to uppercase in the process. The subroutine consists of a single, extremely compact loop that should be ideal for compiler optimization. In fact, I organized the loop for the best results with Microsoft C 5.0, the test compiler, and used the intermediate variable UpperSourceTemp in order to allow for more efficient compiled code. There may be a more efficient way to code this subroutine, but if you're going to go to the trouble of being compiler-specific and knowing compiler code generation that intimately, why not use assembly language, which provides direct control and gives you the freedom to create the best possible code?

Microsoft C 5.0 generates the code shown in Figure 1 from the version of CopyUppercase in Listing One when maximum optimization is selected with the /Ox switch. It's not bad code, but neither is it great. The far pointers are stored in memory and must be loaded each time through the loop, and a considerable amount of work is expended on determining whether each character is lowercase, although the case check is done with a table look-up, which is generally one of the most desirable 80x86 programming techniques. A serious failing is that none of the 80x86 family's best instructions -- the string instructions -- are used. The upshot is that Listing One runs in the times listed in Figure 2 on various PC-compatible computers. (All times discussed in this article were measured with the Zen timer described in my book Zen of Assembly Language, from Scott, Foresman & Company, modified slightly to work with Microsoft C.)

Figure 1: The code generated for CopyUppercase by Microsoft C 5.0 when Listing One is compiled with the /Ox switch (maximum optimization)

  _CopyUppercase   proc                   near
            push   bp
            mov    bp,sp
            sub    sp,0002
  Label1:
            les    bx,[bp+08]
            mov    cl,es:[bx]
            inc    word ptr [bp+08]
            mov    ax,cx
            cbw
            mov    bx,ax
            test   byte ptr [bx+0115],02
            je     Label2
            mov    ax,cx
            sub    al,20
            jmp    Label3
  Label2:
            mov    ax,cx
  Label3:
            les    bx,[bp+04]
            mov    es:[bx],al
            inc    word ptr [bp+04]
            or     cl,cl
            jne    Label1
            mov    [bp-02],cl
            mov    sp,bp
            pop    bp
            ret

  _CopyUppercase   endp
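
To make the table look-up in Figure 1 concrete, here is a rough C sketch of what a macro-based toupper along those lines might look like. It is an illustration only: ChTypeTable, CH_LOWER, InitChTypeTable, and TO_UPPER are hypothetical names, not the contents of Microsoft's ctype.h, and the sketch assumes an ASCII character set. The 02h flag value and the subtraction of 20h are taken from the test and sub instructions in Figure 1.

```c
#include <assert.h>

#define CH_LOWER 0x02   /* hypothetical flag bit: character is lower case */

/* Hypothetical per-character flag table, indexed by character code */
static unsigned char ChTypeTable[256];

/* Marks 'a' through 'z' as lower case. Call once before using TO_UPPER. */
static void InitChTypeTable(void) {
   int c;

   assert('z' - 'a' == 25);   /* sanity check of the ASCII assumption */
   for (c = 'a'; c <= 'z'; c++)
      ChTypeTable[c] |= CH_LOWER;
}

/* One table look-up and one test decide whether to subtract 20h.
   The argument appears twice, so it must have no side effects. */
#define TO_UPPER(c) \
   ((ChTypeTable[(unsigned char)(c)] & CH_LOWER) ? (c) - 0x20 : (c))
```

After InitChTypeTable() has been called once, TO_UPPER('q') yields 'Q' and TO_UPPER('7') yields '7'. Because the macro mentions its argument twice, passing *SourcePtr++ directly would increment the pointer twice, which is one more reason to go through a temporary such as UpperSourceTemp.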

Figure 2: The execution times of the various C and assembly language implementations of CopyUppercase shown in Listings One through Five. For a given listing running on a given processor, the number in parentheses represents the performance of that listing relative to the performance of Listing One on that processor; the higher the value, the better the performance. 8088 timings were performed on an IBM XT; 80286 timings were performed on a 10-MHz one-wait-state AT clone; and 80386 timings were performed on a 20-MHz zero-wait-state 32K-cache Toshiba T5200

                                             Execution time in microseconds on
  String type/Language (Listing)             8088         80286       80386
  -----------------------------------------------------------------------------
  Far strings/C (Listing One)                2258 (1.0)   466 (1.0)   140 (1.0)
  Far strings/ASM (Listing Two)               662 (3.4)   150 (3.1)    62 (2.3)
  Near strings/C (Listing Three)             1183 (1.9)   282 (1.7)    95 (1.5)
  Near strings/ASM (Listing Four)             574 (3.9)   115 (4.1)    50 (2.8)
  Near strings/optimized ASM (Listing Five)   410 (5.5)    85 (5.5)    46 (3.0)

Can we do better in assembly language? Indeed we can, as Listing Two (page 94), which replaces the C version of CopyUppercase in Listing One with an assembly language version, illustrates. Listing Two simply keeps both far pointers in registers and uses string instructions to access both strings; the return for the 21 assembly-language instructions that do that is a performance improvement ranging from two to three-plus times, as shown in Figure 2. If this code happens to be in a performance-sensitive portion of a program, that's quite a return for a little assembly language.

Now, you may well think that the above example is biased in favor of assembly language, what with the far pointers, which assembly language tends to handle much better than do compilers. I would disagree: Almost every PC program now takes advantage of the full 640K of memory, and most of that memory must be accessed via far pointers, so access to far data is a most important issue to PC developers, and the ability of assembly language to handle far data just about as fast as near data is a substantial point in favor of assembly language. In fact, this example is representative of a large class of problems developers face, involving data copying, data transformation, data checking, pointers, and segments. Nonetheless, let's see what happens if we alter CopyUppercase to use near pointers.

Listing Three (page 94) shows Listing One changed to use near pointers. Listing Three, which generates the code shown in Figure 3, is indeed much faster than Listing One; it still takes at least half again as long as Listing Two, but it's closing the gap. By contrast, Listing Two wouldn't much benefit from near pointers, because it already keeps the pointers in the registers. Does that mean that for near data C almost matches assembly language?

Figure 3: The code generated for CopyUppercase by Microsoft C 5.0 when Listing Three is compiled with the /Ox switch (maximum optimization)

  _CopyUppercase   proc                   near
            push   bp
            mov    bp,sp
            sub    sp,0002
            push   di
            push   si
            mov    di,[bp+04]
            mov    si,[bp+06]
  Label1:
            mov    cl,[si]
            inc    si
            mov    ax,cx
            cbw
            mov    bx,ax
            test   byte ptr [bx+0115],02
            je     Label2
            mov    ax,cx
            sub    al,20
            jmp    Label3
            nop
  Label2:
            mov    ax,cx
  Label3:
            mov    [di],al
            inc    di
            or     cl,cl
            jne    Label1
            mov    [bp+04],di
            mov    [bp+06],si
            mov    [bp-02],cl
            pop    si
            pop    di
            mov    sp,bp
            pop    bp
            ret
  _CopyUppercase   endp

Not a chance. We haven't optimized the assembly language implementation yet; Listing Two is just a straight port of Listing One from C to assembly language. Listing Four (page 94) shows Listing Two converted to use near pointers, plus a couple of twists. First, two bytes are loaded, converted to uppercase, and stored at once, cutting the number of memory-accessing instructions in half. Second, the value used to convert characters to uppercase and the upper- and lowercase bounds are stored in registers outside the loop, so that they can be used more efficiently inside the loop. These are simple optimizations, but ones that I doubt you'll find a compiler using -- and they're highly effective. As Figure 2 indicates, Listing Four is approximately 20 percent faster than Listing Two and about two times faster than the near C implementation of Listing Three.

We're not done optimizing yet, though. We've focused so far on relatively simple, linear optimization. Let's pull out all the stops, throw some unorthodox techniques at the problem, and see what comes of it.

On most PC compatibles, the key is this: The processor is slow at fetching instruction bytes and branching (in fact, all 80x86 processors are relatively slow at branching). If we can keep one or the other of those aspects from dragging the processor down, we can often improve performance considerably. As it happens, we can attack both bottlenecks. Look-up tables shrink code size, thereby easing the instruction fetching problem, and avoid branches as well. Well then, why not simply look up the uppercase version of each character? While we're at it, why not look it up with the remarkably compact and efficient xlat instruction? In this way we can convert the five instructions used to convert to uppercase in Listing Four to a single xlat. We can also improve performance by repeating multiple instances of the contents of the loop in-line, one after the other; doing this allows virtually all of the conditional jumps to fall through, eliminating branching almost entirely. Both changes appear in Listing Five, page 94. As Figure 2 indicates, those two changes improve performance by 8 to 40 percent -- and the improvement is greatest on the slower 8088 and 80286 machines, which is surely where speed matters most. (Nor is this code maxed out even yet; I simply had to draw the line somewhere in the interests of keeping the code readily comprehensible and this article to a reasonable length. For example, we could use lodsw to speed up Listing Five much as we did in Listing Four. Never assume that your code is fully optimized!)
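
The two ideas in the preceding paragraph, translating through a look-up table and repeating the loop body in-line, can be sketched in C as well. This is not Listing Five, which does the job with xlat in assembly language; it is a minimal illustration under stated assumptions (an ASCII character set, and the hypothetical names UpperTable, InitUpperTable, and CopyUppercaseTable). A compiler is also free to arrange the branches however it likes, so the fall-through behavior described in the comments is the intent rather than a guarantee.

```c
#include <assert.h>
#include <string.h>

/* 256-entry translation table: UpperTable[c] is the upper case form
   of character code c (hypothetical name, built for this sketch) */
static unsigned char UpperTable[256];

/* Fills the table once, outside any time-critical loop */
static void InitUpperTable(void) {
   int c;

   for (c = 0; c < 256; c++)
      UpperTable[c] = (unsigned char)((c >= 'a' && c <= 'z') ? c - 0x20 : c);
}

/* Copies a zero-terminated string, converting to upper case with one
   table look-up per character and no range compares. The loop body is
   repeated four times in-line, so the exit tests normally fall through
   and the jump back to the top happens once per four characters. */
static void CopyUppercaseTable(char *DestPtr, const char *SourcePtr) {
   unsigned char Ch;

   assert(DestPtr != NULL && SourcePtr != NULL);
   for (;;) {
      Ch = UpperTable[(unsigned char)*SourcePtr++];
      *DestPtr++ = (char)Ch;
      if (Ch == 0) break;       /* terminator has been copied */

      Ch = UpperTable[(unsigned char)*SourcePtr++];
      *DestPtr++ = (char)Ch;
      if (Ch == 0) break;

      Ch = UpperTable[(unsigned char)*SourcePtr++];
      *DestPtr++ = (char)Ch;
      if (Ch == 0) break;

      Ch = UpperTable[(unsigned char)*SourcePtr++];
      *DestPtr++ = (char)Ch;
      if (Ch == 0) break;
   }
}
```

Call InitUpperTable() once at start-up, then call CopyUppercaseTable(Dest, Source) wherever the original CopyUppercase was used. The table costs 256 bytes of data, which is the price paid for removing the compares and conditional jumps from the conversion itself.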

Bear in mind, too, that the code in Listing Five can handle far pointers as easily as near if the look-up table is moved into the code or stack segment and accessed with a segment override, a change that would scarcely affect performance at all. When it comes to handling far strings, then, we've improved performance by three to five and one-half times. To put that in perspective, the performance improvement gained by running the original C code on a 20-MHz zero-wait-state 32K-cache 80386 computer rather than a run-of-the-mill 10-MHz one-wait-state 80286 computer was only a little over three times. I think it's obvious which is the cheaper solution to improving performance.

(It's worth noting that carefully crafted assembly language was required to produce the massive performance improvement measured earlier. Assembly language by itself guarantees nothing, and bad assembly language, which is easy to write, brings new meaning to the word bad.)

Don't think I've picked an example that stacks the deck in favor of assembly language. In fact, assembly language would do considerably better if we worked with arrays or fixed-length Pascal-style strings, and would do better than compiled code in cases where there were more variables to keep in the registers. We also weren't able to use repeated string instructions in the earlier example; when such instructions can be used, as is often the case when an entire program's data structures are organized with efficient assembly language code in mind, the performance advantage of assembly language can approach an order of magnitude. In short, we looked at a simple, limited example (and actually one that lends itself relatively well to compiler optimization), and in optimizing it we've scarcely begun to tap the treasure trove of assembly-language tools and techniques.

Yes, compiler library functions can use string instructions and other assembly-language tricks as readily as your own assembly language code can, but there's a great deal that library functions can't do. Don't assume that library functions are well written, either -- some are, but many aren't. And remember that the author of the library knows no more than the author of the compiler about when you most need performance, and so must design code for adequate performance under all circumstances. You, on the other hand, can precision-craft your code for best performance exactly when and where you need it. Also, keep in mind that library functions can work only within the current model. When you're working with data on the far heap in a program compiled with the small model (an efficient arrangement for programs that must handle a great deal of data), library functions can't help you.

Finally, Microsoft C is a very good optimizing compiler, considerably better than most of the compilers out there. There are a few compilers that generate somewhat better code than Microsoft C, but I'm willing to bet that most of the C programmers reading this use either Microsoft or Turbo C. (Turbo C did not match Microsoft C on this particular example, so I used Microsoft C in order to give C every advantage.) The C code was written to allow for maximum optimization (the loop is only four lines long, for goodness' sake) and uses a macro -- not a function call -- that expands to a table look-up. In other words, the cream of the C crop, given readily optimized code and using a look-up table, went head-to-head with a few dozen hand-optimized assembly-language lines -- and proved to be about two to five times slower.

Size Matters Too

I've focused on performance so far because the primary use of assembly language lies in making software faster. Assembly language can make for far more compact programs as well, although that's less often important because the PC has a large amount of memory available relative to processing power and because saving space is a diffuse effort, requiring attention throughout the program, while enhancing performance is a localized phenomenon, and so offers a better return on programming time.

There are cases where program size is crucial -- memory-resident programs, device drivers, utilities, for example -- and assembly language can work wonders. Of course, good assembly language code is very tight, and hence very small, but there's more to it than that. It's easy to drive programs with compact data strings in assembly language (see "Roll your Own Minilanguages with Mini-Interpreters" which I co-authored with Dan Illowsky, DDJ, September 1989). It's also easy to map in code sections from disk as needed; assembly language can be far more flexible than any overlay manager. Finally, assembly language eliminates the need for non-essential start-up and library code. Co-workers tell me of the time they needed to distribute a program to accept a keypress from the user and return a corresponding error level to a batch file. Written in C, the program was 8K in size; unfortunately, the distribution disk didn't have that much free space. Rewritten in assembly language, the same program was a mere 50 bytes long.

When you absolutely, positively need to keep program size to a minimum, assembly language is the way to go.

Can Live with It, Can't Live without It

Assembly language isn't the be-all and end-all of PC programming, but it is the only game in town when either performance or program size is paramount. Assembly language should be used only when needed and, used wisely, offers unparalleled code quality and an excellent return for programming time invested.

For all the drawbacks of assembly language, eight-plus years of PC software development have proven that developers can live with it; programs containing assembly language have been written in an expeditious manner and work very well, indeed. Those same years have shown that developers can't afford to live without assembly language. I suspect you'd be hard pressed to find any important PC software that contains no assembly language at all, and I can assure you that any application with a graphical user interface either contains assembly language or is a dog. (Sure, Windows applications and applications that link in third-party libraries may not contain assembly language, but that's because they've passed that responsibility off to other developers. And just who are those developers? DDJ readers, that's who. Somebody has to create the good code that top-notch software requires.)

For all the wishing, 80x86 assembly language isn't going away soon; in fact, it's not going to go away at all. The 80x86 architecture lends itself beautifully to assembly language, and performance will always be at a premium, no matter how fast processors get. Back when I used a PC, I thought that if I had a computer that was ten times faster, all my software would run so fast that I'd never have to wait. Well, now I use just such a computer, and much of the software I use is faster as well (MASM, for example, is about ten times faster than it used to be, and TASM is even faster) -- and still I spend a lot of time waiting. Software is never fast enough, and better software is one heck of a lot cheaper than better hardware.

[LISTING ONE]

/* Sample program to copy one far string to another far string,
 * converting lower case letters to upper case letters in the process. */

#include <ctype.h>

char Source[] = "AbCdEfGhIjKlMnOpQrStUvWxYz0123456789!";
char Dest[100];

/*
 * Copies one far string to another far string, converting all lower
 * case letters to upper case before storing them.
 */
void CopyUppercase(char far *DestPtr, char far *SourcePtr) {
   char UpperSourceTemp;

   do {
      /* Using UpperSourceTemp avoids a second load of the far pointer
         SourcePtr as the toupper macro is expanded */
      UpperSourceTemp = *SourcePtr++;
      *DestPtr++ = toupper(UpperSourceTemp);
   } while (UpperSourceTemp);
}

main() {
   CopyUppercase((char far *)Dest,(char far *)Source);
}





[LISTING TWO]

; C near-callable subroutine, callable as:
;       void CopyUppercase(char far *DestPtr, char far *SourcePtr);
;
; Copies one far string to another far string, converting all lower
; case letters to upper case before storing them. Strings must be
; zero-terminated.
;
parms   struc
        dw      ?       ;pushed BP
        dw      ?       ;return address
DestPtr dd      ?       ;destination string
SourcePtr dd    ?       ;source string
parms   ends
;
        .model small
        .code
        public _CopyUppercase
_CopyUppercase  proc    near
        push    bp
        mov     bp,sp                   ;set up stack frame
        push    si                      ;preserve C's register vars
        push    di
;
        push    ds                      ;we'll point DS to source
                                        ; segment for the duration
                                        ; of the loop
        les     di,[bp+DestPtr]         ;point ES:DI to destination
        lds     si,[bp+SourcePtr]       ;point DS:SI to source
CopyAndConvertLoop:
        lodsb                           ;get next source byte
        cmp     al,'a'                  ;is it lower case?
        jb      SaveUpper               ;no
        cmp     al,'z'                  ;is it lower case?
        ja      SaveUpper               ;no
        and     al,not 20h              ;convert to upper case
SaveUpper:
        stosb                           ;store the byte to the dest
        and     al,al                   ;is this the terminating 0?
        jnz     CopyAndConvertLoop      ;if not, repeat loop
;
        pop     ds                      ;restore caller's DS
;
        pop     di                      ;restore C's register vars
        pop     si
        pop     bp                      ;restore caller's stack frame
        ret
_CopyUppercase  endp
        end




[LISTING THREE]


/* Sample program to copy one near string to another near string,
 * converting lower case letters to upper case letters in the process.
 */
#include <ctype.h>

char Source[] = "AbCdEfGhIjKlMnOpQrStUvWxYz0123456789!";
char Dest[100];

/*
 * Copies one near string to another near string, converting all lower
 * case letters to upper case before storing them.
 */
void CopyUppercase(char *DestPtr, char *SourcePtr) {
   char UpperSourceTemp;

   do {
      /* Using UpperSourceTemp allows slightly better optimization
         than using *SourcePtr directly */
      UpperSourceTemp = *SourcePtr++;
      *DestPtr++ = toupper(UpperSourceTemp);
   } while (UpperSourceTemp);
}

int main(void) {
   CopyUppercase(Dest,Source);
   return 0;
}





[LISTING FOUR]


; C near-callable subroutine, callable as:
;       void CopyUppercase(char *DestPtr, char *SourcePtr);
;
; Copies one near string to another near string, converting all lower
; case letters to upper case before storing them. Strings must be
; zero-terminated.
;
parms   struc
        dw      ?       ;pushed BP
        dw      ?       ;return address
DestPtr dw      ?       ;destination string
SourcePtr dw    ?       ;source string
parms   ends
;
        .model small
        .code
        public _CopyUppercase
_CopyUppercase  proc    near
        push    bp
        mov     bp,sp                   ;set up stack frame
        push    si                      ;preserve C's register vars
        push    di
;
        mov     di,[bp+DestPtr]         ;point DI to destination
        mov     si,[bp+SourcePtr]       ;point SI to source
        mov     cx,('a' shl 8) + 'z'    ;preload CH with lower end of
                                        ; lower case range and CL with
                                        ; upper end of that range
        mov     bl,not 20h              ;preload BL with value used to
                                        ; convert to upper case
CopyAndConvertLoop:
        lodsw                           ;get next two source bytes
        cmp     al,ch                   ;is the 1st byte lower case?
        jb      SaveUpper               ;no
        cmp     al,cl                   ;is the 1st byte lower case?
        ja      SaveUpper               ;no
        and     al,bl                   ;convert 1st byte to upper case
SaveUpper:
        and     al,al                   ;is the 1st byte the
                                        ; terminating 0?
        jz      SaveLastAndDone         ;yes, save it & done
        cmp     ah,ch                   ;is the 2nd byte lower case?
        jb      SaveUpper2              ;no
        cmp     ah,cl                   ;is the 2nd byte lower case?
        ja      SaveUpper2              ;no
        and     ah,bl                   ;convert 2nd byte to upper case
SaveUpper2:
        stosw                           ;store both bytes to the dest
        and     ah,ah                   ;is the 2nd byte the
                                        ; terminating 0?
        jnz     CopyAndConvertLoop      ;if not, repeat loop
        jmp     short Done              ;if so, we're done
SaveLastAndDone:
        stosb                           ;store the final 0 to the dest
Done:
        pop     di                      ;restore C's register vars
        pop     si
        pop     bp                      ;restore caller's stack frame
        ret
_CopyUppercase  endp
        end
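
The control flow of Listing Four can be hard to follow because the two bytes of each word take different exits. Below is a rough C rendering of that flow, offered only as a reading aid under stated assumptions: the name `copy_uppercase_pairs` and the local variable names are hypothetical, and the sketch fetches the second byte only after the first is known to be nonzero, whereas `lodsw` always fetches both bytes at once.

```c
#include <stddef.h>

/* Hypothetical C rendering of Listing Four's control flow: two source
 * bytes are handled per trip through the loop. */
static void copy_uppercase_pairs(char *dest, const char *src)
{
    const unsigned char lo = 'a';     /* CH in the listing */
    const unsigned char hi = 'z';     /* CL in the listing */
    const unsigned char mask = 0xDF;  /* BL in the listing: not 20h */

    for (;;) {
        unsigned char first = (unsigned char)*src++;
        unsigned char second;

        if (first >= lo && first <= hi)
            first &= mask;
        if (first == 0) {
            *dest = 0;                /* SaveLastAndDone: store the final 0 */
            return;
        }

        second = (unsigned char)*src++;
        if (second >= lo && second <= hi)
            second &= mask;

        *dest++ = (char)first;        /* stosw: store both bytes */
        *dest++ = (char)second;
        if (second == 0)
            return;                   /* the 2nd byte was the terminator */
    }
}
```

The two exits correspond to the terminator landing on the first byte of a pair (even-length strings, including the empty string) or on the second byte (odd-length strings).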





[LISTING FIVE]


; C near-callable subroutine, callable as:
;       void CopyUppercase(char *DestPtr, char *SourcePtr);
;
; Copies one near string to another near string, converting all lower
; case letters to upper case before storing them. Strings must be
; zero-terminated. Uses extensive optimization for enhanced
; performance.
;
parms   struc
        dw      ?       ;pushed BP
        dw      ?       ;return address
DestPtr dw      ?       ;destination string
SourcePtr dw    ?       ;source string
parms   ends
;
        .model small
        .data
; Table of mappings to uppercase for all 256 ASCII characters.
UppercaseConversionTable        label   byte
ASCII_VALUE=0
        rept    256
if (ASCII_VALUE lt 'a') or (ASCII_VALUE gt 'z')
        db      ASCII_VALUE             ;non-lower-case characters
                                        ; map to themselves
else
        db      ASCII_VALUE and not 20h ;lower-case characters map
                                        ; to upper-case equivalents
endif
ASCII_VALUE=ASCII_VALUE+1
        endm
;
        .code
        public _CopyUppercase
_CopyUppercase  proc    near
        push    bp
        mov     bp,sp                   ;set up stack frame
        push    si                      ;preserve C's register vars
        push    di
;
        mov     di,[bp+DestPtr]         ;point DI to destination
        mov     si,[bp+SourcePtr]       ;point SI to source
        mov     bx,offset UppercaseConversionTable
                                        ;point BX to lower-case to
                                        ; upper-case mapping table
; This loop processes up to 16 bytes from the source string at a time,
; branching only every 16 bytes or after the terminating 0 is copied.
CopyAndConvertLoop:
        rept    15         ;for up to 15 bytes in a row...
        lodsb                           ;get the next source byte
        xlat                            ;make sure it's upper case
        stosb                           ;save it to the destination
        and     al,al                   ;is this the terminating 0?
        jz      Done                    ;if so, then we're done
        endm

        lodsb                           ;get the next source byte
        xlat                            ;make sure it's upper case
        stosb                           ;save it to the destination
        and     al,al                   ;is this the terminating 0?
        jnz     CopyAndConvertLoop      ;if not, repeat loop
Done:
        pop     di                      ;restore C's register vars
        pop     si
        pop     bp                      ;restore caller's stack frame
        ret
_CopyUppercase  endp
        end
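
Listing Five combines two techniques: a 256-entry table that replaces the range checks with a single lookup (the job `xlat` does), and an unrolled loop that branches back only once per group of bytes. The C sketch below illustrates both ideas under stated assumptions. The names `upper_table`, `init_upper_table`, `COPY_ONE_OR_RETURN`, and `copy_uppercase_table` are hypothetical, the table is filled at run time rather than at assembly time, and the loop is unrolled 4 times rather than 16 to keep the example short.

```c
#include <stddef.h>

/* Hypothetical table: maps every byte value to its upper-case form. */
static unsigned char upper_table[256];

/* Fills the table once. This plays the role of the REPT/IF block that
 * builds UppercaseConversionTable when Listing Five is assembled. */
static void init_upper_table(void)
{
    int i;

    for (i = 0; i < 256; i++)
        upper_table[i] =
            (unsigned char)((i >= 'a' && i <= 'z') ? (i & ~0x20) : i);
}

/* One copy step: look up, store, and leave the function after storing
 * the terminating 0 (table entry 0 maps to 0). */
#define COPY_ONE_OR_RETURN(d, s)                                   \
    do {                                                           \
        unsigned char c_ = upper_table[(unsigned char)*(s)++];     \
        *(d)++ = (char)c_;                                         \
        if (c_ == 0)                                               \
            return;                                                \
    } while (0)

/* Copies src to dest through the table, four bytes per loop trip. */
static void copy_uppercase_table(char *dest, const char *src)
{
    for (;;) {
        COPY_ONE_OR_RETURN(dest, src);
        COPY_ONE_OR_RETURN(dest, src);
        COPY_ONE_OR_RETURN(dest, src);
        COPY_ONE_OR_RETURN(dest, src);
    }
}
```

`init_upper_table()` must be called once before the first copy. Whether a C compiler turns this into anything as compact as the `lodsb`/`xlat`/`stosb` sequence is a separate question; the sketch shows only the structure of the technique.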
