Microsoft's recently released version of Macro Assembler -- MASM 6.0 -- embodies the most ambitious changes in the life of the product. The most noticeable change is that MASM is now fundamentally intended to support C programmers. Consequently, the language is more C-like, easier for C programmers to learn, and easier for programmers who code in both MASM and C to switch from one to the other. For example, the EXTRN and STRUC directives have new alias spellings to match C's extern and struct. Also, a new utility is provided to convert C header (.H) files into MASM compatible include (.INC) files.
Another view is that MASM has changed to make programming in assembly language more convenient, allowing programmers to concentrate on the structure of programs and on choosing the best instructions for the problem at hand.
MASM 6.0 includes a number of other updates, such as the CodeView debugger, Programmer's WorkBench 1.1, and a new make facility (NMAKE). This article, however, will primarily discuss the changes to the language itself.
The first change you'll notice when switching to MASM 6.0 is that the program name has been changed! The new program is ML.EXE and works in a fashion similar to the MSC compiler's CL command. ML assembles and links multiple modules. Most of the command line options have also changed. Fortunately, a small driver command named MASM.EXE is supplied that accepts most of the old command line options, converts them to their equivalent MASM 6.0 options, and then automatically runs the new ML.EXE program. This allows old batch and make files to work as before. Note, too, that the new command line interface (ML) does not prompt for parameters. Another new feature is the addition of MLX.EXE, a DOS-extended front end to ML. MLX will take advantage of DPMI, VCPI, or XMS (in that order). You should only use MLX if you are having capacity problems, because it runs slower.
Simplified segmentation directives were introduced in MASM 5.0. These directives handle all the details of setting up segments with naming conventions that match the segments generated by many high-level language compilers. MASM 6.0 adds several new enhancements that support 32-bit segments and flat model (for OS/2 2.0), additional calling conventions (SYSCALL and STDCALL), and startup and exit code.
Table 1 shows the memory models supported by MASM 6.0. Note here that the tiny model is now fully supported. The syntax for the .MODEL directive has also changed, adding options to specify the language (for calling and naming conventions), the operating system (DOS or OS/2), and the stack distance (near or far). Listing One (page 96) shows a complete "hello world" program using the simplified segmentation directives and related model-independent directives.
Table 1: Memory models supported by MASM 6.0

Model     Code   Data   Operating System(s)   Notes
------------------------------------------------------------------------
Tiny      Near   Near   DOS                   code & data combined
Small     Near   Near   DOS, OS/2 1.x
Medium    Far    Near   DOS, OS/2 1.x
Compact   Near   Far    DOS, OS/2 1.x
Large     Far    Far    DOS, OS/2 1.x
Huge      Far    Far    DOS, OS/2 1.x
Flat      Near   Near   OS/2 2.x              code & data combined, 32-bit offsets
One feature that is likely to be popular is the addition of directives that generate loops and decision structures in much the same way as high-level language compilers. For instance, the .IF/.ELSE block in Figure 1 is translated to its corresponding assembly language instructions (shown at the bottom of Figure 1). The generated labels (such as @C0001) are always unique. To see the code generated by the decision and loop structures in the listing file, specify the /Sa option (maximize source listing) in conjunction with the /Fl option (generate listing file).
Figure 1: MASM 6.0 contains decision and loop directives (in this case, an .IF/.ELSE block) that are translated to their corresponding instructions at assembly time.
    .IF     ax < mem_word1
        mov     mem_word2, 2
    .ELSE
        mov     mem_word2, 3
    .ENDIF

The above code is translated to the following:

            cmp     ax, mem_word1
            jnb     @C0001
            mov     mem_word2, 2
            jmp     @C0003
    @C0001: mov     mem_word2, 3
    @C0003:
A .WHILE directive and a .REPEAT/.UNTIL construct are also available. The .BREAK and .CONTINUE directives can be used to terminate a .REPEAT or .WHILE loop, or the current iteration, prematurely. (Note that all of these directives begin with a period [.], to differentiate them from conditional assembly directives.)
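As a rough sketch of how the loop directives fit together, the module below counts the non-blank characters in a string with .WHILE and .CONTINUE and then runs a short .REPEAT/.UNTIL countdown. The names buffer, count, and count_chars are made up for illustration, and the procedure assumes that its caller has already pointed DS at the data segment.

```asm
        .MODEL  small
        .DATA
buffer  BYTE    'hello world', 0        ; hypothetical zero-terminated string
count   WORD    0                       ; receives the number of non-blank characters

        .CODE
count_chars PROC
        mov     si, OFFSET buffer
        .WHILE  BYTE PTR [si] != 0      ; loop until the terminating zero
            .IF     BYTE PTR [si] == ' '
                inc     si
                .CONTINUE               ; skip blanks: go straight back to the loop test
            .ENDIF
            inc     count
            inc     si
        .ENDW

        mov     cx, 3
        .REPEAT                         ; the body always runs at least once
            dec     cx
        .UNTIL  cx == 0
        ret
count_chars ENDP
        END
```

Assembling with /Fl and /Sa shows the compare and jump instructions generated for each directive.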
The range of allowed conditional expressions is quite complete. The relational operators are the same as those used in C (see Table 2). However, you may know from working with the 80x86 instruction set that there are separate conditional jumps for signed and unsigned values, while the C syntax is the same for each data type. C compilers generate the proper type of conditional jumps based on the declared data types of the variables involved.
Table 2: Operators allowed in conditional expressions

Operator   Meaning
---------------------------------
==         equal
!=         not equal
>          greater than
<          less than
>=         greater than or equal to
<=         less than or equal to
&          bit test
!          logical NOT
&&         logical AND
||         logical OR
Until now, the concept of data as signed or unsigned in assembly language has been all in the programmer's mind (and on occasion, in some comments). Signed and unsigned data declarations have been added in MASM 6.0 (discussion follows), as well as the ability to override any declaration. Because the set of relational operators in C does not cover the full range available in assembly language, conditional expressions may also use flag names as operands (ZERO?, CARRY?, OVERFLOW?, SIGN?, and PARITY?).
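The fragment below is a minimal sketch of how declared signedness, a PTR override, and a flag name can each drive a conditional expression. The variables temperature, limit, and status are hypothetical, and the comments describe the kind of jump to expect rather than the exact instructions the assembler emits.

```asm
        .MODEL  small
        .DATA
temperature SWORD   -5                  ; signed word (hypothetical variable)
limit       WORD    100                 ; unsigned word (hypothetical variable)
status      BYTE    0

        .CODE
check   PROC
        mov     ax, 10
        .IF     ax > limit              ; unsigned operands: unsigned conditional jump
            mov     status, 1
        .ENDIF
        .IF     temperature < 0         ; SWORD operand: signed conditional jump
            mov     status, 2
        .ENDIF
        .IF     SWORD PTR ax < 0        ; override: treat AX as signed for this test only
            mov     status, 3
        .ENDIF
        add     ax, limit
        .IF     CARRY?                  ; test the carry flag left by the ADD
            mov     status, 4
        .ENDIF
        ret
check   ENDP
        END
```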
The architects of MASM 6.0 seemed to consider nothing off limits. Even the directives used to declare data, such as DB and DW (define byte and define word), have been changed. You now can use BYTE, WORD, and DWORD to declare data instead of DB, DW, and DD. The old directives are still available, so this is an optional change.
At this point, you may be asking yourself why Microsoft would change something as "unbroken" as DB. There are a number of features (such as the conditional expressions) that require the assembler to "know" whether a byte (or word, and so on) is to be treated as a signed or unsigned value. So there are also SBYTE, SWORD, and SDWORD directives for declaring signed data values. A pleasant side effect is that these directives make the language (somewhat) more self-documenting.
Additionally, there are new directives for declaring floating point data: REAL4, REAL8, and REAL10. Previously, you could declare a 32-bit IEEE floating point number with DD (and you still can). But when using the new (preferred) directives, an error will be generated if you try to declare floating point data with DWORD. Even without using the MASM 5.1 compatibility options, the older directives (DB, DW, DD, and so on) can still be used in exactly the same way as before.
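For comparison, here is a small data segment that uses the new declaration directives next to one old-style DB. All variable names are invented for the example.

```asm
        .MODEL  small
        .DATA
mask_bits   BYTE    0FFh                ; unsigned byte (same storage as DB)
delta       SBYTE   -1                  ; signed byte
total       WORD    65535               ; unsigned word (same storage as DW)
adjust      SWORD   -32768              ; signed word
big         DWORD   12345678h           ; unsigned doubleword (same storage as DD)
neg_one     SDWORD  -1                  ; signed doubleword
ratio       REAL4   1.5                 ; 32-bit IEEE floating point
pi_val      REAL8   3.141592653589793   ; 64-bit IEEE floating point
e_val       REAL10  2.718281828         ; 80-bit extended precision
legacy      DB      'still accepted', 0 ; the old directive continues to work
        END
```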
About three years ago, SLR Systems' OPTASM was introduced, and one of its main selling points (besides speed) was that it automatically generated the shortest and fastest code for short and near unconditional jumps. In addition, it would automatically generate the two-jump sequence required when a conditional jump exceeded the 1-byte range. Later, Borland's Turbo Assembler (TASM) introduced similar capabilities. Now, MASM 6.0 has almost caught up in this category.
Probably the most annoying aspect of assembly-language programming for the 80x86 is the restriction of a 1-byte offset (+127, -128) for conditional jumps. When this limit is exceeded, previous versions of MASM (including 5.1) would generate a "jump out of range" error message. MASM 6.0 automatically translates this code for the programmer. As an example, consider the code fragment in Figure 2 along with its translated version. The only noticeable change (for a jump out of range) is that the generated code is 5 bytes long instead of 2. (There are no new labels or expanded code.) The 5 bytes are a 2-byte conditional jump (an inverse of the original) and a 3-byte unconditional jump to the intended destination.
Figure 2: MASM 6.0 automatically generates a jump fixup when there is a jump out of range. Notice that in this example the generated code is 5 bytes long instead of 2.
            cmp     ax, error_code
            je      exit_error
            db      128 dup(90h)    ; (128 bytes of code, NOP's here)
    exit_error:

MASM 6.0 translates the above code to the following:

            cmp     ax, error_code
            jne     $+3             ; Note: $+3 is a relative
                                    ; jump 3 bytes ahead
            jmp     exit_error
            db      128 dup (90h)
    exit_error:
If you are attempting to craft very compact code and you don't want this automatic action to take place, you can use the OPTION NOLJMP directive. In addition, a level 2 warning is issued when a jump is extended. Note, however, that MASM 6.0 does not generate the required jump fixups when a loop instruction is out of range (while OPTASM and TASM do).
MASM 5.1 introduced several features that simplified the writing of assembly language routines for use by high-level language (HLL) programs. An important aspect of these improvements is that it's easy to use the same code with more than one high-level language. A number of improvements added by MASM 6.0 make programming more convenient, while others appear to be directly related to making code easier to port to future versions of Windows and OS/2.
Two new calling conventions, SYSCALL and STDCALL, have been introduced for OS/2 2.0. SYSCALL is similar to the C calling convention except that no leading underscore is placed on the label and the called routine always restores the stack. STDCALL is likewise similar except that the called routine is responsible for restoring the stack unless a variable number of arguments are specified (using VARARG); in that case, the C convention is used exactly.
The syntax for the PROC directive has been expanded to include a number of new capabilities as shown in Figure 3. Note that the stack frame is automatically set up based upon the various arguments in the PROC directive and defaults based on the .MODEL directive. The concepts are similar to MASM 5.1, but options have been added for overriding the defaults.
Figure 3: Syntax of the extended PROC directive

    label PROC [attributes] [USES reglist] [parameters...]

where:

    label         The name of the procedure
    attributes    Any of distance, langtype, visibility, and prologuearg
    reglist       A list of registers following the USES keyword to be
                  pushed by the prologue code and popped by the epilogue
                  code. Each register must be separated by a space or tab.
    parameters    A list of one or more parameters passed to the procedure
                  on the stack. Each parameter consists of a parameter
                  name, optionally followed by a colon and the parameter's
                  data type. The data type of the last parameter may be
                  VARARG, designating a variable number of remaining
                  arguments. Each parameter must be separated by a comma.

Attributes:

    distance      Any of NEAR, FAR (also NEAR or FAR with 16 or 32,
                  overriding the default segment size for 386, 486)
    langtype      Determines the calling convention
    visibility    PRIVATE, PUBLIC, or EXPORT
    prologuearg   Lists the arguments required for prologue and epilogue
                  code generation (for user-defined prologue/epilogue)
When the PROC directive is used in its extended form, the assembler automatically generates code that sets up the stack frame, pushes and pops registers that must be preserved, and properly cleans up the stack when a RET instruction is encountered. In MASM 5.1, this prologue and epilogue code is fixed based on the model and language. In MASM 6.0, you have the same fixed options, some new options, and the capability to completely define your own prologue and epilogue.
At this point, you may be wondering why you'd want to define your own prologue or epilogue. One example is in stack size checking while debugging; you must add something to every procedure, but remove it later. This makes coding and switching back a simple matter. A code coverage analyzer, for example, could insert itself into the code via user-defined prologues and epilogues.
The new extended PROC directive allows one procedure to be assembled for any memory model and calling convention. The inverse of this is the capability to assemble programs that call procedures having different memory models and calling conventions. This is done with the new INVOKE directive. Instead of pushing arguments on the stack and using the CALL instruction, use the INVOKE directive followed by the list of arguments. This is especially useful when writing code that will be linked to commercial libraries or operating system APIs (such as OS/2 and Windows). The libraries can change models and/or calling conventions and your source only needs to be reassembled and linked. (This is a good idea because OS/2 will be changing calling conventions.)
One problem that arises is that the assembler doesn't know the type of each argument in a procedure. MASM 6.0 rectifies this problem with the new PROTO directive, which defines a procedure prototype. A procedure prototype informs the assembler of the number and type of each argument so it can generate the proper code and check for errors. Listings Two and Three (page 96) demonstrate the differences between the old and new methods when programming for Windows.
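To show the extended PROC, PROTO, and INVOKE working together outside of Windows, here is a minimal sketch that assumes the small model with the C convention. The procedure add_scaled and its parameters are hypothetical, and the comments give only the approximate shape of the generated prologue and epilogue.

```asm
        .MODEL  small, c

add_scaled  PROTO   val1:WORD, val2:WORD, factor:WORD  ; hypothetical prototype

        .CODE
add_scaled  PROC    USES si, val1:WORD, val2:WORD, factor:WORD
        ; generated prologue is roughly: push bp / mov bp, sp / push si
        mov     si, val1                ; parameters are addressed by name
        add     si, val2
        mov     ax, si
        mul     factor                  ; DX:AX = (val1 + val2) * factor
        ret                             ; epilogue restores SI and BP before returning
add_scaled  ENDP

caller  PROC
        INVOKE  add_scaled, 2, 3, 10    ; arguments are pushed and the stack is
        ret                             ; cleaned up according to the C convention
caller  ENDP
        END
```

If the language type in .MODEL changes, neither the procedure body nor the INVOKE line should need to be edited; only a reassembly is required.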
In examining Listings Two and Three, you may think that all of this could be done with macros and conditional assembly. And you're right -- many of you have done this in the past. This mechanism, however, is now a well-defined standard that reduces code clutter, improves readability, and can be easily published in magazine articles without the necessity for printing the macros. Also, the code generated by the INVOKE directive takes into account pushing constants (a two-step process on the 8088) and pointers onto the stack.
Finally, if you use indirect calls (CALL tbl[BX]) and still want to use prototypes for error checking and documentation, there is a mechanism to define a pointer to a prototype.
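One way this can look, sketched here under the assumption that TYPEDEF PROTO and TYPEDEF PTR behave as described in Microsoft's documentation, is a table of handlers called through a typed pointer. The names HANDLER, PHANDLER, on_open, on_close, and handlers are all invented for the example.

```asm
        .MODEL  small, c

HANDLER     TYPEDEF PROTO evt:WORD      ; a prototype used as a type (hypothetical)
PHANDLER    TYPEDEF PTR HANDLER         ; pointer to a procedure with that prototype

on_open     PROTO   evt:WORD
on_close    PROTO   evt:WORD

        .DATA
handlers    PHANDLER OFFSET on_open, OFFSET on_close

        .CODE
on_open PROC    evt:WORD
        mov     ax, evt
        ret
on_open ENDP

on_close PROC   evt:WORD
        mov     ax, evt
        neg     ax
        ret
on_close ENDP

dispatch PROC
        mov     bx, OFFSET handlers     ; BX points to the table
        mov     si, 2                   ; byte offset of the second entry
        INVOKE  PHANDLER PTR [bx+si], 7 ; argument is checked against HANDLER
        ret
dispatch ENDP
        END
```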
MASM 6.0 also adds new instructions to support the 80486 processor. Of course you must use these with the caveat that these instructions make your program processor-specific. Programmers designing 486-specific utilities (or special versions of 386 utilities), operating systems (OS/2), and BIOSs on 486 systems will surely have use for these instructions. The new instructions are listed in Table 3.
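Table 3: New 80486 instructions supported by MASM 6.0

Instruction   Meaning
----------------------------------------------------------------
BSWAP         byte swap
CMPXCHG       compare and exchange
INVD          invalidate data cache
INVLPG        invalidate TLB (Translation Lookaside Buffer) entry
WBINVD        write back and invalidate data cache
XADD          exchange and add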
In addition, many of the more cryptic directives have been changed in MASM 6.0 to have more meaningful names. For example, the .XALL list control directive is now .LISTMACRO. Both the old and new directives are accepted, so old code does not need to be changed, even when the MASM 5.1 compatibility options are used.
Some programmers use macros extensively, having created their own language with macro libraries. Others never use macros because they tend to hide some of the details of assembly language, possibly causing bugs or inefficient code. The changes in MASM 6.0 will please both groups and make it easier for beginners to learn macros. The changes are so substantial that there is an option to use the old macro behavior (OPTION OLDMACROS).
The most interesting new feature is the ability to designate macro parameters as required, or to specify a default value if the parameter is missing. Consider, for instance, the code fragment in Example 1. The REQ keyword specifies that a parameter is required. Its only effect is that of better error reporting. In this case a syntax error would have been generated if a parameter was missing, but in more complex macros these types of errors can be difficult to track down. Also note in Example 1 that any parameter followed by := designates a default value. The default value should be enclosed in angle brackets for proper recognition as a text value.
Example 1: Macro parameters can either be required as designated by the REQ keyword or specify a default value
    set_cursor_pos MACRO row:REQ, col:REQ, page:=<0>
        mov     dh, row
        mov     dl, col
        mov     bh, page
        int     10h
    ENDM
    ...
    set_cursor_pos 5, 10, 1     ; all parameters supplied
    ...
    set_cursor_pos 7, 15        ; page parameter takes default value
    ...
    set_cursor_pos              ; ERROR: required parameters missing
Using EQU, a numeric expression that can be immediately evaluated is a permanent numeric equate. Otherwise, it is treated as a redefinable text equate. The = directive, on the other hand, assigns a numeric value that may be redefined later. But, to achieve a desired result, programmers are often forced to use the two interchangeably. The new TEXTEQU defines a text macro that is evaluated in the same manner as redefinable numeric equates.
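The differences among the three kinds of equates are easier to see side by side. In this sketch the names rows, cols, screen_size, and acc are made up, and the expansion operator (%) is used to turn a numeric expression into the text stored by TEXTEQU.

```asm
        .MODEL  small
rows        EQU     25                  ; permanent numeric equate
cols        =       80                  ; numeric value that may be redefined
cols        =       cols + 52           ; now 132
screen_size TEXTEQU %(rows * cols)      ; text macro holding the digits of 25 * 132
acc         TEXTEQU <ax>                ; text macro standing for a register name
acc         TEXTEQU <bx>                ; text macros may be redefined as well

        .CODE
demo    PROC
        mov     acc, screen_size        ; assembles as: mov bx, 3300
        ret
demo    ENDP
        END
```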
Macro functions provide a mechanism to perform complex text processing at assembly time. A macro function is defined in the same manner as a regular macro (now called a macro procedure), but must return a text value with the EXITM directive. Text values can be returned as numeric or text constants by enclosing the text in angle brackets (<-2> or <mov>, for example), or by prefixing a text equate or numeric expression with the expansion operator (%). Listing Four (page 96) shows a macro function to calculate a factorial.
MASM 6.0 supplies both a command line option and an OPTION directive to provide compatibility with code written in MASM 5.1 (and earlier versions). The /Zm command line option sets all features to be compatible with MASM 5.1. Alternatively, the OPTION M510 statement can be placed at the beginning of your code. If you need to mix new and old features in the same code, use the OPTION directive and selectively enable or disable specific features. Note that the OPTION directive overrides any command line options.
MASM 5.1 introduced the concept of labels being local to a given procedure. Each label in a procedure can be local to just that procedure and cannot be referenced elsewhere. Under 6.0, the default behavior is that all labels are considered local. If you need to jump from one procedure to another, you can declare any label as global in scope by declaring it with two colons instead of one. This allows your code to be more readable since you can reuse the same label names from one procedure to the next. And any label intended to be accessed globally now stands out.
MASM 5.1 worked this way, but only if the .MODEL directive was used with a language type. Otherwise, the operation in MASM 5.1 was the same as OPTION NOSCOPE. Although OPTION SCOPE will help produce better and more readable code, it will also restrict your source code to use with MASM 6.0 (or MASM 5.1 if it uses the .MODEL directive with a language specified).
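A short sketch of the scoping rules: both procedures below reuse the local label done, while common_exit is declared with two colons so that it can be reached from either one. The procedure and label names are hypothetical.

```asm
        .MODEL  small
        OPTION  SCOPE                   ; labels inside a PROC are local (the 6.0 default)

        .CODE
first   PROC
        cmp     ax, 0
        je      done                    ; refers to the "done" inside first
        jmp     common_exit             ; legal: common_exit is global
done:   ret
first   ENDP

second  PROC
        cmp     bx, 0
        je      done                    ; a different, local "done"
        dec     bx
done:
common_exit::                           ; two colons make the label visible everywhere
        xor     ax, ax
        ret
second  ENDP
        END
```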
A structure is a group of related but dissimilar data types. Fields within a structure can have different data types and sizes. An annoying restriction in MASM 5.1 is that field names in a given structure can't be used in any other context. One standard way to get around this is to prefix all field names with the structure name or an abbreviation of the structure name.
MASM 6.0 now allows nested structures and unions. The directive STRUCT is now a synonym for STRUC (to be more like C). Field names do not need to be unique among all identifiers but must be unique within a given nesting level for a particular structure or union. A restriction is that a field name and a text macro may not have the same name. This behavior is so different from previous versions of MASM that OPTION M510 and OPTION OLDSTRUCTS (or the /Zm command-line option) cause the old structure behavior to be in effect.
The STRUCT directive provides two new options, an alignment option and the NONUNIQUE keyword. The alignment can be 1, 2, or 4, with the default being 1. The alignment value can be used to align individual fields on a particular boundary for performance. Care must be taken, however, to align the start of each structure on the same boundary. The command line option /Zp[n] (where n = 1, 2, or 4) sets the alignment for structures whose STRUCT directive does not specify one. The NONUNIQUE keyword requires all field names of the structure or union to be fully qualified every time they are used, regardless of the compatibility options in effect (M510, OLDSTRUCTS, or /Zm).
Unions are new to MASM 6.0. Unions are similar to unions in C, variant records in Pascal, or the EQUIVALENCE statement in Fortran. Another change is that the dot operator is reserved for use by field names and cannot be used as an alternative for the + operator. This is to allow the assembler to check fields and make sure that they match with the declared structures. This makes the code more readable, in that use of the dot operator implies the use of a structure.
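The following sketch pulls these points together: a nested structure, the same field names reused in an unrelated structure, a union, and the dot operator used for field access. POINT, RECT, CURSOR, SPLIT, and the variables are illustrative names only, and the procedure assumes DS already addresses the data segment.

```asm
        .MODEL  small

POINT   STRUCT
x       WORD    ?
y       WORD    ?
POINT   ENDS

RECT    STRUCT
origin  POINT   <>                      ; nested structure fields
extent  POINT   <>
RECT    ENDS

CURSOR  STRUCT
x       BYTE    ?                       ; "x" and "y" reused: field names are
y       BYTE    ?                       ; scoped to their own structure
CURSOR  ENDS

SPLIT   UNION
as_word WORD    ?                       ; both fields share the same storage
as_byte BYTE    ?
SPLIT   ENDS

        .DATA
box     RECT    <>
cur     CURSOR  <>
val     SPLIT   <>

        .CODE
demo    PROC
        mov     ax, box.origin.x        ; fully qualified nested field
        mov     box.extent.y, ax
        mov     cur.x, 5                ; byte-sized field of the same name
        mov     val.as_word, 1234h
        mov     al, val.as_byte         ; low-order byte of the same storage (34h)
        ret
demo    ENDP
        END
```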
A pointer is a combination of a segment and an offset that is the address of, for example, a variable in memory. In various memory models pointers may be thought of as near or far, but all pointers are actually far. Near pointers just have an assumed segment in one of the segment registers. For example, in small model you would normally store only the offset portion of a pointer in memory variables. The segment portion is assumed to be in a segment register (normally DS for data). In a HLL, such as C, it is fairly easy to switch to a new model because the compiler handles all the details for you. Writing assembly-language code that is model-independent tends to be quite complicated, especially when the assembly language code is more than just a few subroutines called from a HLL.
MASM 6.0 introduces the ability to define types for pointer variables using the TYPEDEF directive. Pointer types can simply be NEAR or FAR, or they can be defined as NEAR16, NEAR32, FAR16, or FAR32 to override the current segment size. If not specified, then it defaults based on the .MODEL directive. Pointer types can also be defined in terms of a qualified type, which is any type previously defined with TYPEDEF, a structure, or any intrinsic type (such as BYTE or WORD).
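As an illustration, the declarations below define three pointer types and use them to declare initialized pointer variables. The type and variable names are invented, and the sizes in the comments assume the small model.

```asm
        .MODEL  small
PBYTE   TYPEDEF PTR BYTE                ; near or far according to the memory model
NPWORD  TYPEDEF NEAR PTR WORD           ; always an offset only
FPWORD  TYPEDEF FAR PTR WORD            ; always segment and offset

        .DATA
msg     BYTE    'abc', 0
total   WORD    3
p_msg   PBYTE   msg                     ; 2 bytes in small model
np_tot  NPWORD  total                   ; 2 bytes
fp_tot  FPWORD  total                   ; 4 bytes: offset, then segment
        END
```

Changing the .MODEL line to a model with far data changes the size of p_msg without any edit to its declaration.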
The use of this new feature makes declaring model-independent data with pointers much easier and more readable. However, writing the code that accesses this data requires coding the in-line conditional assembly directives. These conditional directives can be eliminated by using traditional macros or the new text macros.
Macro Assembler 6.0
Microsoft Corporation
One Microsoft Way
Redmond, WA 98052-6399
206-882-8080
Price: $150 (upgrades to registered users $75)
System requirements: DOS 3.0 or later or OS/2, Version 1.1 or later
The ASSUME directive has always been misunderstood by a large number of programmers. Using the simplified segment directives alleviates the need for the ASSUME directive, at least for straightforward code. In the past, the ASSUME directive allowed the program to inform the assembler what assumptions to make about the contents of a segment register. Now you can specify an assumption for a general register. This allows better error detection and allows pointer data types to be assumed.
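A minimal sketch of an assumption on a general register follows. The POINT structure and the variable start_pt are hypothetical, and the procedure assumes DS already addresses the data segment.

```asm
        .MODEL  small

POINT   STRUCT
x       WORD    ?
y       WORD    ?
POINT   ENDS

        .DATA
start_pt POINT  <10, 20>

        .CODE
demo    PROC
        mov     bx, OFFSET start_pt
        ASSUME  bx:PTR POINT            ; BX is known to point at a POINT
        mov     ax, [bx].x              ; field references are checked against POINT
        add     ax, [bx].y
        ASSUME  bx:NOTHING              ; cancel the assumption
        ret
demo    ENDP
        END
```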
Besides being faster than its predecessor in raw performance, MASM 6.0 now allows wildcards to be specified on the command line, which speeds the assembly of many files. I found MASM 6.0 to be 20 to 40 percent faster than MASM 5.1 in assembling source files ranging in size up to 100K. This is still not as fast as Borland's Turbo Assembler (TASM) and SLR Systems' OPTASM. (Note: OPTASM is compatible with MASM 5.0 and earlier and does not assemble 80386 instructions. TASM is compatible with MASM 5.1 and earlier and contains a number of minor extensions and other features.) See Table 4 for a speed comparison.
Table 4: Assembly speed comparison

Assembler      Test1   Test2
----------------------------
MASM 6.0         56      51
MASM 5.10        69      --
TASM 2.0         46      31
OPTASM 1.72      33      18*
Test1: Assemble 20 files (20K to 100K in size, 900K total).
Test2: Make or wildcard assembly of same files.
*Used OPTASM's built-in make file. All times in seconds. All tests run on a 25MHz 80386.
A number of previous MASM updates forced old code to be modified. But this time, some of the changes are so major that Microsoft has added the capability to support MASM 5.1 features selectively, or all at once. But overall, this is an excellent upgrade, primarily because most of the new features help in writing code that is easier to read and maintain.
The upgrade to MASM includes major changes to the internal operation of the assembler as well as a complete facelift to the command line options and many of the assembler directives. MASM can now assemble and link multiple files from the command line, fix up conditional jumps that are out of range, and generate code for looping and decision structures. However, with all these changes, you still must deal with the 80x86 instruction set, just as before, and that is what assembly language programming is really all about.
[LISTING ONE]

        .MODEL  small
        .STACK  100             ; reserves 100 bytes for the stack
        .CODE                   ; start of code segment
main    PROC
        .STARTUP                ; generates startup code
        mov     bx, 1           ; stdout
        mov     cx, msg_len
        mov     dx, offset DGROUP:msg
        mov     ah, 40h         ; write to handle
        int     21h             ; call DOS to write msg
        .EXIT                   ; generates exit code
main    ENDP
        .DATA                   ; start of data segment
msg     BYTE    'Hello world.'
msg_len equ     $ - msg
        END     main            ; end, specify starting address

[LISTING TWO]
EXTRN   GetDC : far
EXTRN   MoveTo : far
EXTRN   LineTo : far
EXTRN   ReleaseDC : far

point_list  struc
    x1      dw  ?
    y1      dw  ?
    x2      dw  ?
    y2      dw  ?
point_list  ends
        .
        .   (assume bx = hWnd)
        .
        push    bx
        call    GetDC           ; returns hDC
        mov     di, ax
        push    di
        push    [si].x1
        push    [si].y1
        call    MoveTo
        push    di
        push    [si].x2
        push    [si].y2
        call    LineTo
        push    bx
        push    di
        call    ReleaseDC
        .
        .
        .

[LISTING THREE]
GetDC       PROTO FAR PASCAL hWnd:WORD
MoveTo      PROTO FAR PASCAL hDC:WORD, nX:WORD, nY:WORD
LineTo      PROTO FAR PASCAL hDC:WORD, nX:WORD, nY:WORD
ReleaseDC   PROTO FAR PASCAL hWnd:WORD, hDC:WORD

option  oldstructs
point_list  struct
    x1      word    ?
    y1      word    ?
    x2      word    ?
    y2      word    ?
point_list  ends
        .
        .   (assume bx = hWnd)
        .
        invoke  GetDC, bx       ; returns hDC
        mov     di, ax
        invoke  MoveTo, di, [si].x1, [si].y1
        invoke  LineTo, di, [si].x2, [si].y2
        invoke  ReleaseDC, bx, di
        .
        .
        .

[LISTING FOUR]
factorial   MACRO   num
            LOCAL   result, factor
    IF num LE 0
        %error factorial parameter out of bounds
    ENDIF
    result = 1
    factor = num
    WHILE factor GT 0
        result = result * factor
        factor = factor - 1
    ENDM
    EXITM   %result
ENDM

i = 1
REPEAT  20                      ; repeat block macro
    DWORD   factorial(i)        ; to generate a table of
    i = i + 1                   ; the first 20 factorials
ENDM

DWORD   factorial(-33)          ; error
Copyright © 1991, Dr. Dobb's Journal