One problem with assemblers is that they can't reference passed arguments and local variables by name. Ray presents some techniques for getting around this limitation.
September 01, 1988
URL:http://www.drdobbs.com/database/arguments-and-automatic-variables-in-ass/184407997
Raymond Moon is currently writing the assembly-language procedures for a new remote bulletin-board system under development by the Annapolis Software Guild. He can be reached at 16005 Pointer Ridge Dr., Bowie, MD 20716.
Although it's often useful and even necessary to write assembly-language subroutines that are called by a high-level language, a problem with Microsoft MASM is that it cannot use variable names to reference two important data classes: passed arguments and automatic (or local) variables. This shortcoming prompted me to develop a solution. This article describes techniques that allow the addressing of passed arguments and automatic variables using variable names.
In assembly language a variable is normally placed in a data segment, and the assembler associates a fixed offset with the variable name. Because passed arguments and automatic variables are created at run time by allocating space for them on the stack, such variables do not have fixed addresses -- they can vary each time the same procedure is called. More significantly, if the function is recursive (that is, if the function can call itself), there might be multiple copies of these variables on the stack at the same time. One characteristic of these variables is that they are fixed in their relative position to each other; this is the key that allows the use of variable names to address them. The technique is to create a template using an assembly-language structure declaration to define the relative position of the variables.
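The point about recursion can be seen from the C side as well. The following is a minimal sketch, not taken from the article's listing: the names check_level, frames_are_distinct, and DEPTH are hypothetical. Each recursive call records the address of its own automatic variable and then confirms that no caller still on the stack shares that address.

```c
#include <assert.h>

#define DEPTH 4                       /* hypothetical recursion depth */

static const int *seen[DEPTH];        /* address of each call's local */

/* Every call gets its own copy of `local` in a new stack frame. */
static int check_level(int level)
{
    int local = level;                /* automatic variable */
    int ok = 1;
    int i;

    assert(level >= 0 && level < DEPTH);
    seen[level] = &local;
    if (level + 1 < DEPTH)
        ok = check_level(level + 1);  /* the callers' frames stay alive */

    for (i = 0; i < level; i++)       /* compare with the live outer frames */
        if (seen[i] == &local)
            ok = 0;

    return ok && local == level;      /* value was not overwritten */
}

/* Returns 1 if all DEPTH copies were distinct and kept their own values. */
int frames_are_distinct(void)
{
    return check_level(0);
}
```

Calling frames_are_distinct() should return 1: four copies of local exist at the same time, each at a different address, which is why a single fixed offset cannot name them.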
Unfortunately, the method of passing parameters to subroutines differs among languages, among different compilers for the same language, and even among different versions of the same compiler. The code in this article was assembled using the Microsoft Macro Assembler, Version 5.0, and was designed to link with Microsoft C, Version 5.0. Also, unless otherwise stated, the discussion and code assume the small-memory model, which means that all procedures use near calls and all pointers are 16 bits long. Later I'll mention how to use this technique with all memory models and with different compilers.
To see how the structure declaration forms a template for addressing arguments, it's necessary to understand the arrangement of the stack upon entry to a called procedure. The standard C calling convention is to push arguments onto the stack in right-to-left order as they appear in the function argument list.
Using the C library function strncpy() as an example (see Example 1, next page), the argument n is pushed first, followed by the pointer string2 and finally the pointer string1. Last, the near call pushes the value of the instruction pointer (ip) register onto the stack to serve as the return address. The entry code of the called procedure must initialize a register to point to these parameters on the stack. The register used for this purpose is the base pointer (bp) register. The calling procedure's bp register value must be saved and then restored before returning to the caller. The standard entry code to perform this task is:
    push bp
    mov  bp,sp
Now all the passed arguments can be addressed as positive offsets from the bp register. The stack at this point is shown in Table 1, on page 83.
Example 1: Definition of strncpy()
    char *strncpy(string1, string2, n)
    char *string1;      /* pointer to the first string */
    char *string2;      /* pointer to the second string */
    unsigned int n;     /* number of characters to be copied */
Table 1: The stack after the standard entry code

                 Current Value    Stack
                 n                [bp + 8]
                 string2          [bp + 6]
                 string1          [bp + 4]
                 ip register      [bp + 2]
    sp, bp =>    saved bp reg     [bp]
The passed parameters can be accessed by using their relative offset from the bp register. Thus the code required to initialize registers with the passed parameters for strncpy() could be:
    mov di,[bp + 4]    ; di points to destination string
    mov si,[bp + 6]    ; si points to source string
    mov cx,[bp + 8]    ; cx = number of char to copy
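To make the frame layout concrete, here is a small C model of the sequence described above: the caller pushes the arguments right to left, the near call pushes the return address, and the entry code saves bp and copies sp into it. This is only a sketch under stated assumptions. The Machine type, the 16-word stack, and the values 0x1234 and 0x0100 are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16              /* hypothetical stack size in words */

typedef struct {
    uint16_t mem[STACK_WORDS];      /* stack memory, one entry per word */
    uint16_t sp;                    /* byte offset of the top of stack */
    uint16_t bp;                    /* frame pointer */
} Machine;

static void push(Machine *m, uint16_t value)
{
    assert(m->sp >= 2);             /* guard against overflow of the model */
    m->sp -= 2;                     /* the stack grows downward */
    m->mem[m->sp / 2] = value;
}

/* Model a small-model C call of strncpy(string1, string2, n). */
void call_strncpy(Machine *m, uint16_t string1, uint16_t string2, uint16_t n)
{
    m->sp = STACK_WORDS * 2;
    m->bp = 0x1234;                 /* caller's bp (arbitrary value) */

    push(m, n);                     /* arguments go right to left */
    push(m, string2);
    push(m, string1);
    push(m, 0x0100);                /* near call: return ip (arbitrary) */

    push(m, m->bp);                 /* push bp */
    m->bp = m->sp;                  /* mov bp,sp */
}

/* Read the word at [bp + disp]. */
uint16_t word_at_bp(const Machine *m, int disp)
{
    return m->mem[(uint16_t)(m->bp + disp) / 2];
}
```

After call_strncpy(&m, 0x2000, 0x3000, 10), word_at_bp(&m, 4) yields 0x2000, word_at_bp(&m, 6) yields 0x3000, and word_at_bp(&m, 8) yields 10, matching Table 1.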
The trouble with using the relative address directly is that the code is hard to read and maintain. You can easily forget the offset to any passed argument, especially when reopening a program months after you originally wrote it.
A common method used to address passed arguments by name is to use the equ directive. For the strncpy() example, these equates might be:
    n       equ [bp + 8]
    string2 equ [bp + 6]
    string1 equ [bp + 4]
The code to load the pointers and count into the same registers would then be:
    mov di,string1
    mov si,string2
    mov cx,n
Although this improves code readability, it improves maintainability only marginally. If you change the calling argument list, you must recalculate the offsets and change the equates, which increases the possibility of error.
A better way is to let the macro assembler do it for you, which you can do by defining a structure for the part of the stack in which the passed parameters reside. For the previous example, the structure is shown in Example 2, this page.
Example 2: Structure for addressing the passed arguments
    PARMS   struc           ; Passed arg addressing struc
            dw 2 dup(?)     ; Saved ip and bp registers
    string1 dw ?            ; Pointer to 1st string
    string2 dw ?            ; Pointer to 2nd string
    N       dw ?            ; # char to copy
    PARMS   ends
A convenient result of the C calling convention (pushing the arguments onto the stack in reverse order) is that the order of the arguments in the structure is the same as in the function call argument list.
For C programmers, an easy way to use the offsets associated with structure names is to use the . (period) structure operator, which is similar to the . (period) operator in C. Using the PARMS structure, the earlier code fragment would be:
    mov di,[bp].string1
    mov si,[bp].string2
    mov cx,[bp].N
Although this is a little more complex because it uses structure notation, the code clearly identifies the variables as passed arguments.
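C programmers can check the offset arithmetic with the offsetof macro, which does for a C struct what the assembler does for a struc. The sketch below mirrors the PARMS template for the small model; struct parms and load_args are illustrative names and are not part of the article's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C mirror of the PARMS template (small model, 16-bit pointers). */
struct parms {
    uint16_t saved[2];      /* saved bp, then the return ip */
    uint16_t string1;       /* pointer to 1st string */
    uint16_t string2;       /* pointer to 2nd string */
    uint16_t n;             /* number of characters to copy */
};

/* Fetch the arguments by name, as mov di,[bp].string1 and friends do. */
void load_args(const struct parms *frame,
               uint16_t *di, uint16_t *si, uint16_t *cx)
{
    assert(frame != NULL);
    *di = frame->string1;
    *si = frame->string2;
    *cx = frame->n;
}
```

With this layout, offsetof(struct parms, string1), offsetof(struct parms, string2), and offsetof(struct parms, n) come to 4, 6, and 8 on a typical compiler, the same displacements that were written by hand earlier.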
Using structures in this manner increases code maintainability and versatility. If the argument list is changed, only the structure definition needs to be rewritten because the macro assembler automatically calculates the new offsets and uses them throughout the rest of the code. Another advantage is the ease with which the structure can be modified to work with the doubleword pointers required when using the compact-, large-, and huge-memory models, in which the data is not restricted to one 64K segment. You need only change the dw directive to dd, and the macro assembler will calculate the proper offsets. Table 2, page 83, shows the correlation between the C and assembly-language data types.
Table 2: C and assembly-language data types

    C           ASM     C                       ASM
    char (1)    dw      Pointers:
    short       dw        small/medium          dw
    int         dw        compact/large/huge    dd
    long        dd

    (1) A char is expanded into a word before being pushed onto the stack.
The real power of using assembly-language structures comes with the ability to create and address automatic variables. Storage for this variable class is allocated at run time from space available on the stack. Automatic variables allow assembly procedures to be recursive (that is, to call themselves) without overwriting variables previously saved.
Automatic variables reside below the saved copy of the caller's bp register; remember that the stack grows downward as values are pushed onto it. Using the standard entry logic shown earlier, Table 3, page 84, shows a stack frame containing three integer-size arguments and three integer-size automatic variables.
Table 3
The immediate problem shown in Table 3 is that the offsets to the automatic variables are negative with respect to the value in the bp register but structure notation generates only positive offsets. The solution is to make the last variable the new addressing base. All automatic variables can then be addressed as positive offsets from this variable. Example 3, page 86, illustrates a structure for automatic variables using this method. The code required to make room for the automatic variables at run time and the equate required to define their addressing base is shown in Example 4, page 86.
This entry code should be used whenever automatic variables are used, even if there are no passed parameters to the procedure.
Example 3 shows some other advantages of using a structure for automatic variables. The assembler can calculate the exact number of bytes subtracted from the sp register using the size operator. Modifications are thus as easy as adding, changing, or deleting variables in the structure definition.
In creating such structures, it's important to ensure that all word and doubleword variables begin on a word boundary. You can do this by moving all byte-length variables to the bottom of the structure. If there is an odd number of byte-length variables, add an unnamed byte variable at the end so that the length of the entire structure is an even number. These precautions will ensure efficient memory accesses with 16-bit processors. If the code is to execute on an 80386, the only additional precautions are that any doubleword pointers should begin on a doubleword boundary and that the length of the PARMS structure should be divisible by 4.
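The padding rule amounts to rounding the structure length up to the required multiple. The helper below is a hypothetical sketch of that arithmetic (round_up and pad_bytes are invented names): it sums a list of field sizes and reports how many unnamed pad bytes would be needed for a given alignment.

```c
#include <assert.h>

/* Round `total` up to the next multiple of `align`. */
unsigned round_up(unsigned total, unsigned align)
{
    assert(align > 0);
    return ((total + align - 1) / align) * align;
}

/* Sum the field sizes and return the number of pad bytes required
   to make the structure length a multiple of `align`. */
unsigned pad_bytes(const unsigned *field_sizes, unsigned count, unsigned align)
{
    unsigned total = 0;
    unsigned i;

    for (i = 0; i < count; i++)
        total += field_sizes[i];
    return round_up(total, align) - total;
}
```

For example, a structure holding a word, a doubleword, and three bytes is 9 bytes long, so pad_bytes reports 1 pad byte for word alignment (align = 2) and 3 for the doubleword case (align = 4).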
The addressing of automatic variables using the equate shown in Example 4 would be:
Again, the . (period) operator is used with all automatic variables. The assembler calculates the offset to this particular automatic variable as the sum of the negative offset from the AUTO equate and the positive offset generated by the variable's position in the structure. Although this sounds complicated, the assembler can handle it easily. Also notice that the use of AUTO clearly identifies INT_1 as an automatic variable as opposed to a passed argument or a regular variable declared in the DATA segment. This easy recognition increases a program's readability and maintainability.
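The sum the assembler forms can be written out as a sketch: the displacement from bp is the field's offset within the structure minus the size of the whole structure. The C below assumes the three-word layout of Example 3; struct auto_vars and auto_disp are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C mirror of the _AUTO template from Example 3. */
struct auto_vars {
    int16_t INT_1;          /* 1st automatic variable */
    int16_t INT_2;          /* 2nd automatic variable */
    int16_t INT_3;          /* 3rd automatic variable */
};

/* Displacement from bp of the field at `field_offset`:
   AUTO is [bp - size _AUTO], then the field offset is added. */
long auto_disp(size_t field_offset)
{
    assert(field_offset < sizeof(struct auto_vars));
    return (long)field_offset - (long)sizeof(struct auto_vars);
}
```

With the field order of Example 3, this arithmetic gives -6 for INT_1, -4 for INT_2, and -2 for INT_3, so the first field of the structure is the one that lies lowest on the stack.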
The code for returning from an assembly-language subroutine is simple:
The mov instruction clears the stack of all automatic variables and can be omitted if automatic variables are not used. The pop instruction restores the calling procedure's frame pointer.
A caution is in order here. The typing of variable names in assembly structures is not as strong as in C. This means that all variable names must be unique within a given assembly-language source file. The same variable name cannot be duplicated in two different structures or in a structure and the data segment because the assembler will not know which offset is associated with that name.
Let's look at a function that ties together both passed variables and automatic variables. This function is called file_length() and its C definition is:
Using the #defines from Example 5, page 87, you might call this function as follows:
This call will search the default directory for "filename.ext" using the normal file attribute, which means that if the read-only, hidden, or system file attribute is set, the file will not be found. If "filename.ext" is not found, the function returns -1L; otherwise, it returns the file size in bytes as a long integer.
The file_length() subroutine uses DOS function 4eh (Find First) to obtain information about the file. It extracts the length from this information and returns it to the calling procedure. Listing One, page 110, contains the assembly code for file_length().
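For readers who want to try the calling contract away from DOS, the following is a portable stand-in written with standard C file functions. It is not the article's method: it swaps DOS function 4eh for fopen/fseek/ftell, it ignores the attribute argument entirely, and the name file_length_portable is hypothetical. It only reproduces the return convention of a byte count or -1L.

```c
#include <stdio.h>

#define NORM 0x00   /* Normal file, as in Example 5 */

/* Stand-in for file_length(): returns the size in bytes, or -1L if the
   file cannot be opened. The attribute is accepted but not used, because
   standard C has no notion of DOS file attributes. */
long file_length_portable(const char *filename, unsigned int attribute)
{
    FILE *fp;
    long length;

    (void)attribute;

    fp = fopen(filename, "rb");
    if (fp == NULL)
        return -1L;                 /* file not found */

    if (fseek(fp, 0L, SEEK_END) != 0)
        length = -1L;
    else
        length = ftell(fp);         /* ftell itself returns -1L on failure */

    fclose(fp);
    return length;
}
```

A call such as file_length_portable("filename.ext", NORM) then behaves like the call shown above, apart from the attribute filtering.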
Lines 72 through 84 are required so that this assembly function can link to a Microsoft C program by defining the segments used by Microsoft C. If you use a different compiler, the documentation will probably contain all the information required to rewrite this section.
Lines 96 through 121 contain the structures required for this procedure. The first structure is for addressing the passed parameters, and it follows the guidelines discussed previously. The second structure defines the automatic variables. In designing this function, I wanted to minimize any interference with the calling program. Because DOS function 4eh uses the disk transfer area (DTA) to pass the file information back to the program, my procedure needed to create a new DTA so that the contents of the current DTA would not be destroyed. Therefore, a new DTA is created as an automatic variable. The address of the old DTA must be saved here also so that the old DTA can be restored before the procedure returns.
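The part of the automatic-variable structure that serves as the new DTA (NEW_DTA through F_NAME in the listing) can be mirrored in C to check where each field falls. This is a sketch: struct find_dta and its member names are invented here, and only byte arrays are used so that a typical compiler inserts no padding.

```c
#include <stddef.h>

/* Byte-for-byte mirror of the DTA fields declared in the listing. */
struct find_dta {
    unsigned char reserved[21];     /* NEW_DTA: reserved for Find Next */
    unsigned char f_attr;           /* F_ATTR: attribute found */
    unsigned char f_time[2];        /* F_TIME: time last written */
    unsigned char f_date[2];        /* F_DATE: date last written */
    unsigned char f_length[4];      /* F_LENGTH: doubleword file length */
    unsigned char f_name[13];       /* F_NAME: found filename.ext */
};

/* Compile-time check: the array size is negative, and the build fails,
   if the compiler did not lay the structure out in exactly 43 bytes. */
typedef char dta_must_be_43_bytes[(sizeof(struct find_dta) == 43) ? 1 : -1];

/* Offset of the file length within the DTA. */
size_t dta_length_offset(void)
{
    return offsetof(struct find_dta, f_length);
}
```

Here dta_length_offset() returns 26: the length sits after 21 reserved bytes, one attribute byte, and two words for the time and date.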
The last two structures, lines 128 to 136, are required to address double-word pointers and long integers. Because the assembler has a limited ability to address doubleword variables, these structures make the variables more naturally addressable. Although the same structure could be used for both, I used different ones so that the names more closely describe the actual actions being taken.
Now that the supporting structures have been defined and the required segments created, the actual code begins. file_length() starts with the standard entry code. The ds and es segment registers are saved by pushing them onto the stack because they are changed by file_length(). Notice that these are pushed onto the stack after the automatic variables have been created so as not to affect the automatic variable structure. Microsoft C uses si and di for register variables. If your routine uses si and/or di, you should save their values on the stack at entry and restore them before returning. (Because file_length() doesn't use si and di, I didn't need to do this.)
From this point on, the code is fairly straightforward. The address of the current DTA is determined and saved, and a new DTA is created on the stack. The registers are then set up for the DOS 4eh function call. If the carry flag is set upon return from DOS, the file was not found, so -1L is moved into F_LENGTH; otherwise, F_LENGTH contains the file length in bytes. Finally the old DTA is restored, and the length is loaded into the dx:ax registers in preparation for returning.
Table 4, below, summarizes the changes in the passed argument structure for different memory models. Although this table reflects Microsoft C memory conventions, it should work with most other compilers and languages.
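The dx:ax return convention means the caller sees dx as the most significant word and ax as the least significant word of the long. The sketch below shows that recombination in C; long_from_dx_ax is a hypothetical helper, written to avoid depending on implementation-defined conversions.

```c
#include <stdint.h>

/* Rebuild the 32-bit signed result from the two 16-bit halves
   returned in dx (MSW) and ax (LSW). */
int32_t long_from_dx_ax(uint16_t dx, uint16_t ax)
{
    uint32_t u = ((uint32_t)dx << 16) | ax;

    if (u & 0x80000000UL)                       /* negative value */
        return -(int32_t)(uint32_t)~u - 1;      /* two's complement */
    return (int32_t)u;
}
```

When the routine stores -1 in both words of F_LENGTH, dx and ax both hold 0FFFFh and the caller reads the pair as -1L. A file of 65,536 bytes comes back as dx = 1, ax = 0.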
It's possible to increase the readability and maintainability of assembly-language programs by using named variables for passed parameters and automatic variables. The feature of the macro assembler that allows this is the structure definition that creates a template for the stack on which these two types of variables reside. Using the structure also simplifies the creation of automatic variables on the stack. Finally, the structure declaration and the size operator let the assembler do the offset arithmetic, so programmer math errors become a thing of the past.
I wish to thank Vince Castelli for his guidance while writing this article.
The Microsoft Macro Assembler Version 5.0 furnishes tools for mixed-language programming. Among them is a macro include file called mixed.inc that facilitates the writing of assembly-language procedures that link with Microsoft Basic, C, Fortran, and Pascal. The basic macros defined are:
These macros were designed around the Microsoft memory segment names, and they greatly increase the ease of writing assembly-language code. The hLocal macro has a couple of limitations, however. First, it is unable to handle automatic variables that are arrays; hLocal deals only with single objects, so it could not be used to simplify file_length() because of the two arrays used. Remember, a string is an array of type char. Second, any automatic variable declared as a byte is allocated word-length storage. Presumably this ensures word-length boundaries for faster memory access with 16-bit processors. Unfortunately, if true byte alignment is needed, as in the file_length() function, you cannot use hLocal.
Also, there is one potentially inconvenient result of using the hProc and hLocal macros: the entire macro invocation must be on a single source line. If the passed parameter or automatic variable list is long, the code sticks out far to the right, which can cause problems when printing the source listing. All in all, this is a small penalty.
The heart of the macro library is the hProc and hLocal macros, and each needs explanation. The hProc for file_length() could be:
The cLang equate indicates which calling convention is to be used. If cLang is defined as shown above, the C language calling convention is used. If cLang is undefined, the Fortran/Pascal/Basic calling convention is used instead. Assuming a small-memory model, a near procedure is created and two equates are generated with the following meanings:
The Microsoft documentation warns that using these macros can slow down assembly. The reason is that hProc invokes 19 other macro calls. All these macro calls, however, make hProc quite capable. As mentioned earlier, it can handle C or Fortran calling conventions. The type ptr generates word-length pointers for small- and compact-memory models and doubleword-length pointers for medium-, large-, and huge-memory models. Timing the assembly of a small test file where the source and assembler were in a RAM disk, the increase was from two to four seconds. The source had two passed arguments and one automatic variable. Increasing the automatic variables to four added another second to assembly time. Although in relative terms this is a significant increase, in absolute terms it's scarcely noticeable and should not be a deterrent.
As already mentioned, I could not use the hLocal macro for file_length(). I looked to see if this macro could be easily modified to add the ability to handle arrays and byte-length variables. The faint of heart should not try to modify this macro, as parsing a line with macros is bewilderingly complex. As I could not see how to do it, these two limitations remain.
Finally, even if you do not use the other macros from mixed.inc, you should use pLds and pLes. These two macros greatly facilitate writing assembly-language code that will assemble as written in any memory model. One line of code will handle both word- and doubleword-length pointers as required for the memory model in use. -- R.M.
_ARGUMENTS AND AUTOMATIC VARIABLES IN ASSEMBLY LANGUAGE_
by
Raymond Moon
[LISTING ONE]
Table 3: Stack frame with three arguments and three automatic variables

    Current Value    Stack
    PARM_1           [bp + 8]
    PARM_2           [bp + 6]
    PARM_3           [bp + 4]
    saved ip reg     [bp + 2]
    saved bp reg     [bp]
    INT_1            [bp - 2]
    INT_2            [bp - 4]
    INT_3            [bp - 6]
Example 3: Automatic variable structure
_AUTO struc ; Auto var addressing struc
INT_1 dw ? ; 1st automatic variable
INT_2 dw ? ; 2nd automatic variable
INT_3 dw ? ; 3rd automatic variable
_AUTO ends
Example 4: Standard entry code
push bp                     ; Save the calling bp
mov  bp,sp                  ; Initialize new frame pointer
sub  sp,size _AUTO          ; Make room for auto vars
AUTO equ [bp - size _AUTO]  ; Auto var addressing base
mov ax, AUTO.INT_1 ;AX = INT_1
mov sp,bp
pop bp
ret
A Complete Example
long file_length(filename, attribute)
/* length of the file in bytes or
-1L if the file is not found. */
char *filename; /* pointer to filename */
unsigned int attribute; /* file attribute */
length = file_length("filename.ext", NORM);
Example 5: #defines for file attributes to be used with file_length()
#define NORM 0x00 /* Normal file */
#define RO 0x01 /* Read-only file */
#define HID 0x02 /* Hidden file */
#define SYS 0x04 /* System file */
#define VOL_ID 0x08 /* Volume ID */
#define SUBDIR 0x10 /* Subdirectory */
#define ARCH 0x20 /* Archive file */
Table 4: Memory model effects on the PARMS structure

    Memory     Code    Data    Dup Operator    Pointer
    Model      Size    Size    Parameter       Size
    Small      <64K    <64K    2               dw
    Compact    <64K    >64K    2               dd
    Medium     >64K    <64K    3               dw
    Large      >64K    >64K    3               dd
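Table 4 can be turned into a small calculation. The dup parameter counts the words occupied by the saved bp and the return address, and the pointer size decides how far the argument after a pointer is pushed out. The sketch below applies this to the two arguments of file_length(); the enum and function names are hypothetical.

```c
#include <assert.h>

enum model { SMALL, COMPACT, MEDIUM, LARGE };

/* Bytes taken by the saved bp plus the return address
   (the dup operator parameter from Table 4, times 2). */
static unsigned header_bytes(enum model m)
{
    assert(m >= SMALL && m <= LARGE);
    return (m == MEDIUM || m == LARGE) ? 6u : 4u;   /* far vs. near call */
}

/* Bytes taken by a data pointer: dd for compact and large, dw otherwise. */
static unsigned pointer_bytes(enum model m)
{
    assert(m >= SMALL && m <= LARGE);
    return (m == COMPACT || m == LARGE) ? 4u : 2u;
}

/* Offsets from bp for file_length(filename, attribute). */
unsigned filename_offset(enum model m)
{
    return header_bytes(m);
}

unsigned attr_offset(enum model m)
{
    return header_bytes(m) + pointer_bytes(m);
}
```

Under these assumptions the filename pointer sits at bp + 4 in the small and compact models and at bp + 6 in the medium and large models, while the attribute moves from bp + 6 (small) to bp + 8 (compact and medium) to bp + 10 (large). These are the numbers the assembler derives when the PARMS structure is edited as the table describes.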
Conclusion
Microsoft Mixed-Language Macros
* setModel--Sets the memory model from a text equate.
* hProc <name [NEAR]> [,<USES reglist>] [,arg[:type]]...--Starts a
procedure with the option of saving reglist and with optional passed arguments.
* hLocal var[:type] [,var[:type]]...--Defines automatic variables.
* hRet--Restores the stack and returns.
* hEndp--Ends the current procedure.
* ifFP statement--Conditional assembly based on memory model.
* FPoperand--Provides the es segment override for data in medium,
large, and huge memory models.
* pLes register, pointer--Loads a pointer into es and/or another
register depending upon the memory model.
* pLds register, pointer--Similar to pLes, but loads ds.
cLang = 1
hProc FILE_LENGTH, <USES ds,es>, FILENAME:ptr, ATTR
FILENAME equ [bp + 6]
ATTR equ [bp + 8]
1 page 60,132
2 title FILE_LENGTH - Determine File Length In Bytes
3 name FILE_LEN
4
5 comment @
6 FILE_LENGTH() V 1.00
7 -------------------------------------------------------
8 NAME
9 file_length() Determine file length in bytes
10
11 SYNOPSIS
12 length = file_length(filename, attribute);
13 long length; length of file in bytes
14 char *filename; path\filename.ext
15 unsigned int attribute; file attribute used
16 in search
17
18 DESCRIPTION
19 This function uses DOS function 4eh, Find First
20 File, to return with the filelength. So that this
21 function does not corrupt the current Data Transfer
22 Area, DTA, this function moves the DTA to its own
23 automatic variable area. If the file is not found,
24 -1L is returned.
25
26 This function allows any kind of file to be found by
27 specifying the file attribute. The following #define
28 statements can be used to define the various file
29 attributes:
30
31 #define NORM 0x00 /* Normal file */
32 #define RO 0x01 /* Read-only file */
33 #define HID 0x02 /* Hidden file */
34 #define SYS 0x04 /* System file */
35 #define VOL_ID 0x08 /* Volume ID */
36 #define SUBDIR 0x10 /* Subdirectory */
37 #define ARCH 0x20 /* Archive file */
38
39 The bit or operator, |, can be used to combine
40 several attributes. For example:
41
42 length = file_length("filename.ext", RO|HID|SYS);
43
44 All normal, read-only, hidden, and system files will
45 be searched for a match to "filename.ext."
46
47 This procedure was assembled using MICROSOFT MASM
48 V5.0 with the switches, /v /ml /z, set.
49
50 CAUTION
51 None.
52
53 RETURNS
54 File length as a long. -1L signifies an error.
55
56 AUTHOR
57 Raymond Moon - 18 Nov 87
58
59 Copyright (c) 1987, MoonWare
60 ALL RIGHTS RESERVED
61
62 HISTORY
63 Version - Date Remarks
64 1.00 - 18 Nov 87 - Original
65
66 @
67
68 ;----------------------------
69 ; Define the segments for the memory model so that
70 ; they can link with Microsoft C.
71
72 _TEXT segment byte public 'CODE'
73 _TEXT ends
74
75 _DATA segment word public 'DATA'
76 _DATA ends
77
78 CONST segment word public 'CONST'
79 CONST ends
80
81 _BSS segment word public 'BSS'
82 _BSS ends
83
84 DGROUP group _DATA, CONST, _BSS
85
86 ;----------------------------
87 ; Define the assume directive so MASM knows to what
88 ; the segment registers point
89
90 assume cs:_TEXT, ds:DGROUP, ss:DGROUP, es:DGROUP
91
92 ;----------------------------
93 ; Define the structures for passed parameters and
94 ; automatic variables.
95
96 PARMS struc
97 dw 2 dup (?)
98 FILENAME dw ?
99 ATTR dw ?
100 PARMS ends
101
102 ;----------------------------
103 ; _AUTO structure contains the following variables:
104 ; OLD_DTA Saved address of old DTA
105 ; NEW_DTA Start of NEW_DTA, this first field is
106 ; reserved for subsequent Find Next File
107 ; F_ATTR Attribute found
108 ; F_TIME Time file was last written
109 ; F_DATE Date file was last written
110 ; F_LENGTH Double word file length
111 ; F_NAME Found filename.ext
112
113 _AUTO struc
114 OLD_DTA dd ?
115 NEW_DTA db 21 dup (?)
116 F_ATTR db ?
117 F_TIME dw ?
118 F_DATE dw ?
119 F_LENGTH dd ?
120 F_NAME db 13 dup (?)
121 _AUTO ends
122
123 ;----------------------------
124 ; Define structures for addressing the offset and
125 ; segment portions of doubleword pointers and the
126 ; least and most significant words of long integers.
127
128 DOUBLEWORD_PTR struc
129 OFF dw ?
130 SEGMT dw ?
131 DOUBLEWORD_PTR ends
132
133 LONG_INT struc
134 LSW dw ?
135 MSW dw ?
136 LONG_INT ends
137
138 ;----------------------------
139 ; Start the Code Segment.
140
141 _TEXT segment
142
143 _FILE_LENGTH proc near
144
145 public _FILE_LENGTH
146
147 ;----------------------------
148 ; Standard entry logic. Save calling BP. Set up new
149 ; BP. Define automatic variable addressing base,
150 ; AUTO. Save the segment registers, as DS and
151 ; ES are used.
152
153 push bp
154 mov bp,sp
155 sub sp, size _AUTO
156 AUTO equ [bp - size _AUTO]
157 push ds
158 push es
159
160 ;----------------------------
161 ; Get and save current DTA address in OLD_DTA. Use
162 ; DOS service 2fh. The address is returned in ES:BX.
163
164 mov ah,2fh
165 int 21h
166 mov AUTO.OLD_DTA.OFF,bx
167 mov AUTO.OLD_DTA.SEGMT,es
168
169 ;----------------------------
170 ; Set up NEW_DTA as the current DTA. Use DOS service
171 ; 1ah. DS = SS assumed.
172
173 mov ah,1ah
174 lea dx,AUTO.NEW_DTA
175 int 21h
176
177 ;----------------------------
178 ; Set up for Find First File, Service 4eh.
179 ; CX = Search attribute
180 ; DS:DX => ASCIIZ string, d:path\filename.ext
181
182 mov ah,4eh
183 mov cx,[bp].ATTR
184 mov dx,[bp].FILENAME
185 int 21h
186
187 ;----------------------------
188 ; Check for Carry Flag set as that indicates that the
189 ; file was not found. If so, put -1L in F_LENGTH so
190 ; -1L is returned as file length.
191
192 jnc FL1
193 mov ax,-1
194 mov AUTO.F_LENGTH.LSW,ax
195 mov AUTO.F_LENGTH.MSW,ax
196
197 ;----------------------------
198 ; Restore old DTA, service 1ah.
199 ; DS:DX => old DTA.
200
201 FL1: mov ah,1ah
202 mov dx,AUTO.OLD_DTA.OFF
203 mov ds,AUTO.OLD_DTA.SEGMT
204 int 21h
205
206 ;----------------------------
207 ; Set up DX:AX to return file length.
208
209 mov ax,AUTO.F_LENGTH.LSW
210 mov dx,AUTO.F_LENGTH.MSW
211
212 ;----------------------------
213 ; Exit logic. Restore segment registers. Remove any
214 ; automatic variables. Restore calling BP.
215
216 pop es
217 pop ds
218 mov sp,bp
219 pop bp
220 ret
221
222 _FILE_LENGTH endp
223
224 _TEXT ends
225
226 end