
MapMan: Building Windows Symbols Files


MAY95: MapMan: Building Windows Symbols Files

Rolling your own symbols for 16-bit Windows

Joe is a systems programmer at a major hardware vendor. He is a graduate of Georgetown University and currently lives and works in the Washington, DC area. He can be contacted at [email protected].


Almost every Windows programmer has wished for more symbols than those shipped with the Windows SDK. Some of you might have even needed to debug another application because it conflicts with your app. In this article, I'll present a tool that lets you build .SYM files for any 16-bit Windows executable, including the DLLs that make up Windows itself. I call this tool "MapMan," short for "Windows map-file manager."

MapMan runs on any 16-bit application for the Windows operating environment. I've used many Windows 3.1 binaries in my test suite, including Write, ProgMan, ClipBrd, Notepad, Krnl386, and User. MapMan also runs on Win-OS/2 2.x and 3.0 applications (and can even be used on 16-bit OS/2 executables such as EPM.EXE in OS/2 Warp). Although MapMan is currently a real-mode DOS program, I intend to port the application to Windows.

In this article, I'll refer to a number of Windows features--DOS executable headers, new executable headers, names tables, resident names tables, and the like--which you may or may not be familiar with. For your convenience, a discussion of these terms is provided electronically, along with the source code, executables, and related MapMan files; see "Availability," page 3.

As most Windows programmers are aware, any procedure that will be called external to the application must be exported. One such exported procedure is a wndproc (or window procedure), which is not called directly by the application, but rather by Windows. Exporting a function simply means adding it to a few internal tables of the NE header, making it accessible to any module, either by name or ordinal. This exporting process is much like adding a chapter title to the table of contents of a book; without the chapter title in the table of contents, it might be impossible to find the chapter by simply skimming the book. Even if you were successful, you would probably find your search very time consuming. Likewise, if Windows were not able to look up your application's wndproc in your module's list of exported functions, it would be difficult or impossible for Windows to call the procedure (to send it a message, for example).

You can export functions by placing the function names in the Exports section of an executable's module-definition (.DEF) file. Often, a compiler-dependent keyword can be used in a function definition (for example, _export) to export a function without a .DEF file entry. My source code uses the compiler-independent Exports-section method to define exported entry points. The sample .DEF in Example 1 contains a Name field, a Description field, and an Exports section listing the exported functions, each of which ends up in either the resident or the nonresident names table in the NE header. The CODE and DATA keywords, of course, define the application's segments, found in the segment table in the NE header. And, yes, the EXETYPE keyword as well as other keywords such as HEAPSIZE and STACKSIZE all resolve to fields of the NE header.

Map Files

A map file is an ASCII text file containing information that maps (or identifies) pieces of a module by symbolic value to addresses in the module's segments. Consider the map file in Figure 1, generated by the Microsoft 5.1 linker using object modules built with debugging information using the -Zi option. This map file is linker specific; other linkers may generate different files. (Table 1 provides an overview of the sections of a Microsoft linker .MAP file.)

At the top of the .MAP file, you'll see the module name (TRAPMAN in Figure 1). The second section of the .MAP file contains a description of the segments in the application. I've condensed it since the original divided this small application's two segments into over 30 pieces. The number to the left of the ":" is the segment number (in hex). This application has two segments: one of type CODE, the other of type DATA. The first segment has 2bf2h bytes (decimal 11250), which is 25d0h+622h (the start of the last section plus the length of the last section). The second segment has 0b3ah bytes (decimal 2874), which is 930h+20ah.
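The segment-size arithmetic above can be sketched in a few lines of C. This is only an illustration: the MapPiece structure and the SegmentSize() function are hypothetical names, not part of MapMan, and the sketch assumes that a segment's size is the largest start-plus-length among its pieces.

```c
#include <assert.h>
#include <stddef.h>

/* One line of the map file's segment section (hypothetical structure). */
struct MapPiece {
    unsigned      segment;  /* number to the left of the ':'  */
    unsigned long start;    /* offset to the right of the ':' */
    unsigned long length;   /* value from the Length column   */
};

/* Returns the size of `segment`: the largest start+length of any of
   its pieces, or 0 if the segment has no pieces in the list. */
unsigned long SegmentSize(const struct MapPiece *pieces, size_t count,
                          unsigned segment)
{
    unsigned long size = 0;
    size_t i;

    assert(pieces != NULL || count == 0);
    for (i = 0; i < count; i++) {
        if (pieces[i].segment == segment) {
            unsigned long end = pieces[i].start + pieces[i].length;
            if (end > size)
                size = end;
        }
    }
    return size;
}
```

Fed the last piece of each segment in Figure 1 (0001:25D0 with length 622h, and 0002:0930 with length 20Ah), this gives 2BF2h and 0B3Ah, the two lengths that appear in Figure 3.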

The third section of the .MAP file gives the DGROUP of the module. Normally, all code segments of an application are shared across multiple instances. Windows will only load one copy of the code and read-only data for any and all copies of the application in memory at any one time. In an application that uses the DATA MULTIPLE keyword in its .DEF file, however, each instance of the application will have its own private DGROUP segment (as the DGROUP is both readable and writable). This third section denotes which segment number (in hex) is the DGROUP. By Microsoft convention, the last segment in a module is the DGROUP.

The fourth section of the .MAP file is the list of exported functions found in this module. This application has two: one for the About box, another for the main window. Both are wndprocs and must be exported so that Windows can call them. Again, all offsets are in hex, so the About-box routine is actually 1564 bytes into the first segment of the application.

The fifth section of the .MAP file contains the public symbols sorted by name. Public symbols are those symbols known to the linker. In other words, a public symbol can be used within any executable module, not just the source file in which it is defined. In C, for example, functions normally have such external linkage.

This application was built with debug information, so the linker had much more information than it normally would to put in the .MAP file. The About-box routine is a Pascal routine, as required by Windows (the lack of a preceding underscore hints at this). The routine is in the first segment at offset 61ch.

Nothing about exporting a function, however, requires the Pascal calling convention. A CDecl function, for example, can be exported, but if you were to export a CDecl wndproc, Windows would be unable to call the function successfully. Windows normally assumes that exported functions use the Pascal calling convention, that is, that the called function will clean its own parameters off the stack (usually with a RET N instruction). A CDecl function instead assumes that the caller will clear the parameters from the stack. When Windows calls a CDecl wndproc, neither side removes the parameters, so the stack is left unbalanced unless the function takes no parameters at all. A few Windows exported functions are CDecl by necessity (wsprintf(), for example), as the Pascal calling convention doesn't support variable argument functions.

The next line contains 0:0 for an address. The null value denotes this as a far pointer requiring "fixup." At link time, the linker has no idea at what segment:offset value the MESSAGEBOX routine will be found; it knows only that MESSAGEBOX is in the USER module with ordinal 1. The Windows loader must replace occurrences of MESSAGEBOX in the application with the appropriate selector:offset to the function in memory.

The MYFARPROC and MYODS functions are actually assembly-language functions (all uppercase with no leading underscore) marked as public symbols with the PUBLIC keyword in the assembly source file.

The function _DPMIAllocateLDTDescriptors is C code (note the leading underscore and use of case in the symbol name).

Lastly, notice the symbol __astart in the last line of this table. __astart is the actual entry point--the first piece of code executed by Windows when it launches a new instance of the application. As the double leading underscore indicates, this is part of the C run-time library for my compiler. Double underscores are used for public C-library functions to avoid name collisions with nonlibrary source code. Standard library functions in C are an exception; strlen(), for example, has only a single underscore in this table. The About-box routine also appeared in the list of exported functions. All exported functions can be thought of as public symbols, so the exports are also in this list.

The sixth section of the .MAP file contains symbols identical to those in section five, but sorted by address (although the section is called Publics by Value in the .MAP file). It's convenient that the linker gives it to us both ways. If you've broken into a debugger because your app has just trapped and you're staring at CS:IP=103f:0886, it's helpful to know that part of your .MAP file is sorted by address. If you're trying to find the segment and offset of one of your symbols to set a breakpoint by address in that same debugger, you'll appreciate the .MAP file being sorted by name. (Who said you can't have your cake and eat it, too?)
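The address-sorted lookup is easy to sketch. The PublicSym structure and FindNearestSymbol() below are hypothetical names, and the sketch assumes that the debugger's selector has already been translated to a logical segment number (the CS value in a trap is a selector, not the segment number that appears in the map file).

```c
#include <assert.h>
#include <stddef.h>

/* One line of the Publics by Value section (hypothetical structure). */
struct PublicSym {
    unsigned    segment;
    unsigned    offset;
    const char *name;
};

/* `syms` must be sorted by segment, then by offset, as in Publics by
   Value. Returns the last symbol in `segment` that starts at or before
   `offset`, or NULL if no symbol qualifies. */
const struct PublicSym *FindNearestSymbol(const struct PublicSym *syms,
                                          size_t count,
                                          unsigned segment, unsigned offset)
{
    const struct PublicSym *best = NULL;
    size_t i;

    assert(syms != NULL || count == 0);
    for (i = 0; i < count; i++) {
        if (syms[i].segment != segment)
            continue;
        if (syms[i].offset > offset)
            break;              /* sorted: nothing later can match */
        best = &syms[i];
    }
    return best;
}
```

Among the symbols shown in Figure 1, an offset of 0886h in segment 1 would resolve to About, which starts at 061Ch; the next symbol listed does not start until 1422h.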

My version of MapSym, however, considers only the Publics by Value section essential. For reasons I haven't explored, removing the Publics by Name section makes the resulting .SYM file slightly smaller, and MapSym doesn't complain. Remove the Publics by Value section and MapSym will refuse to build a .SYM file, giving a message to relink the executable. Since the two tables should differ only in order and not in content, there's probably no reason for MapSym to look at both.

The seventh section of the .MAP file contains Program entry point at 0001:25E1. This segment:offset is that of the __astart() library function for this application. Our WinMain() is not called directly by Windows. It will be called by the C library code that, in this case, is the Windows entry point for the application.

The MapMan Program

MapMan was built with the Microsoft C 6.x compiler and generates .MAP files compatible with Microsoft linker .MAP files and the Microsoft MapSym utility used to create symbol files. To create symbol files for a different linker, you may need to modify MapMan's output to match your particular linker's output. This should not be time consuming, as the largest piece of work in MapMan consists of the functions specific to the NE header, which are compiler and linker independent (at least for our purposes).

The future Windows version of this application will reuse all source code except for that which is platform specific (in mapdutil.c). For this reason, I have my own TYPEDEFs for BOOL, WORD, and other standard Windows types (this is a DOS app, of course, and won't include any Windows header files). I also have wrapper functions around all calls to the standard C library, such as printf(). This is because the Windows version will not call printf(), but some other function instead (probably a file-system related function, but it may simply append to a buffer in memory). It won't be writing to STDOUT with printf(). I intend to use this DOS version of MAPMAN.EXE as the stub for the Windows version, making an app that will run in either DOS or Windows.

Our task is to create a .MAP file similar in form and function to that generated by a Microsoft linker and acceptable to the Microsoft MapSym symbol-file generator; see Figure 2. Since executables are binary files, you'll need to open the executable for binary read, then parse the MZ and NE headers, if any are found.

The overall flow of the MapMan executable is simple, as evidenced by the main() routine. First, any user-supplied arguments are processed. Then, if a name was given, you allocate a buffer and attempt to load a file by that name into the buffer for processing. Finally, you free the buffer and return to DOS.

The LoadExe() routine is no more complicated: It opens the file (as binary) and loads it into the previously allocated buffer. At this step, pBuffer (a pointer to the start of the allocated buffer) points to the beginning of the file. If the file is a valid Windows executable, then you'll find at the start of the buffer an MZ (old-style) executable header.
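A loading step along these lines might look like the following sketch. It is not the LoadExe() source: LoadFile() is a hypothetical name, and unlike the flow described above, the sketch allocates the buffer itself and uses only standard C library calls.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads the whole file at `path` into a malloc'd buffer and stores its
   size in *size. Returns NULL (and sets *size to 0) on any failure or
   if the file is empty. The caller frees the buffer. */
unsigned char *LoadFile(const char *path, size_t *size)
{
    FILE *fp;
    long len;
    unsigned char *buf = NULL;

    assert(path != NULL && size != NULL);
    *size = 0;

    fp = fopen(path, "rb");     /* "b": no newline or Ctrl-Z translation */
    if (fp == NULL)
        return NULL;

    if (fseek(fp, 0L, SEEK_END) == 0) {
        len = ftell(fp);
        if (len > 0 && fseek(fp, 0L, SEEK_SET) == 0) {
            buf = malloc((size_t)len);
            if (buf != NULL &&
                fread(buf, 1, (size_t)len, fp) != (size_t)len) {
                free(buf);      /* short read: treat as failure */
                buf = NULL;
            }
            if (buf != NULL)
                *size = (size_t)len;
        }
    }
    fclose(fp);
    return buf;
}
```

Opening in binary mode matters on DOS, where a text-mode read would translate line endings and stop at the first Ctrl-Z byte.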

You call the SetMZ() function to set the pointer to the MZ header (pMZ) and validate the new pointer by checking its signature. If SetMZ() returns False, then the file loaded has no valid MZ header. It might be a .COM file or simply a data file. In any case, you can exit after warning the user.

If you have a valid MZ header, then you must also verify that you have loaded a valid Windows executable. You do this by verifying that an NE header exists after the MZ header. If the MZ header relocation-table address is less than 0x40 (64 decimal), then no Windows header exists. Once again, exit after warning the user.

Otherwise, call SetNE() to set and validate the pointer to the NE header (pNE) that you'll use for further processing. If this routine returns False, no valid NE header exists and you exit (after warning the user). If you do have a valid NE header, begin processing it to create the internal structures needed to build the map file.
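The two validation steps can be sketched together. FindNEHeader() is a hypothetical name, not the SetMZ()/SetNE() source. The field positions are assumptions taken from the commonly published MZ header layout rather than from this article: the relocation-table address is the word at offset 18h, and the new-header offset is the doubleword at offset 3Ch.

```c
#include <assert.h>
#include <stddef.h>

/* Little-endian reads from a byte buffer. */
static unsigned ReadWord(const unsigned char *p)
{
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

static unsigned long ReadDword(const unsigned char *p)
{
    return (unsigned long)ReadWord(p) |
           ((unsigned long)ReadWord(p + 2) << 16);
}

/* Returns the file offset of the NE header, or 0 if `buf` does not
   look like a 16-bit Windows (or OS/2) executable. */
unsigned long FindNEHeader(const unsigned char *buf, size_t size)
{
    unsigned long neOffset;

    assert(buf != NULL);
    if (size < 0x40)
        return 0;                       /* too small for an MZ header */
    if (buf[0] != 'M' || buf[1] != 'Z')
        return 0;                       /* .COM file or plain data */
    if (ReadWord(buf + 0x18) < 0x40)
        return 0;                       /* plain DOS .EXE: no new header */

    neOffset = ReadDword(buf + 0x3C);
    if (neOffset < 0x40 || neOffset > size - 2)
        return 0;                       /* offset points outside the file */
    if (buf[neOffset] != 'N' || buf[neOffset + 1] != 'E')
        return 0;                       /* some other new-style header */
    return neOffset;
}
```

Returning an offset rather than a pointer keeps the bounds check in one place; the caller adds the offset to the buffer start when it needs a pointer to the header.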

Generating a Map File

Map-file generation is a two-stage process: First, you create internal representations of the structures that you'll need from the NE; then you build a .MAP file from them. The first stage of the process is found in the calls to BuildResidentNamesTable(), BuildNonResidentNamesTable(), BuildEntryTable(), and BuildSegmentTable(). The first three routines create and modify the list of entry points pointed to by pEntryHead. The last one creates a list of application segments pointed to by pSegmentHead. With just these two pointers, you'll have most of the information needed to build a .MAP file.

BuildResidentNamesTable(). The resident names table is pointed to by the pResident pointer in the NE header structure. This pointer is based off the start of the NE header; in other words, the resident names table begins pResident bytes after pNE. Remember, however, that the rules of C pointer addition require you to assign pNE to a pointer to char before adding pResident to it, so that the addition is done in bytes rather than in units of the header structure's size.

It's convenient to parse the resident names table first, so that the module name needed to generate the .MAP file is the first element of our entry-point list. This module name has an ordinal value of 0; it does not have a corresponding entry in the entry table.

Windows applications often have no resident APIs. In such cases, the resident names table still exists; its first and only entry contains the module name of the executable. Remember that an API will show up in either the resident names table or the nonresident names table, but not both.

Unfortunately, any structure written to map a resident names entry is inherently unusable because there is a variable-length structure in the middle of it! The structure begins with a length byte, n, followed by n characters--a string (although it is not a valid C string because it is not terminated by a null character) that names the exported function. A word follows the name which gives the ordinal value (the number of the entry-table entry that corresponds to this exported function).

For each entry in the table, you create a new entry-point node by calling MakeEntryPointElement(). An entry-point node contains the following fields: a pointer to a null-terminated function name, an API ordinal value, a segment number, and an offset. Any value not yet known is set to INVALID_VALUE so that you do not use it by mistake. Currently, segment number and offset are invalid, as these values will not be known until we parse the entry table later. The resident names table ends when a length byte of 0 is found, and in this case the name and ordinal fields do not exist. The same holds for the other tables discussed in this article--a length byte of 0 represents the end of the current table, with none of the usual fields following.

There is only one nonstandard part to this function. We create a null-terminated API name by overwriting the first byte of the ordinal number in our buffer after first saving the value of the ordinal in our entry-point element list. Of course, this requires that the pointer saved also point to the first character of the string (the length byte must be skipped). Then you won't have to reallocate a second piece of memory to hold a new null-terminated character (or ASCIIZ) string. You need an ASCIIZ string so that standard-library functions in DOS can be called--those that take strings require their strings to be terminated with a null character.
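The record layout and the in-place termination trick can be sketched as follows. ParseNamesTable() and NameEntry are hypothetical names, and the sketch writes into a caller-supplied array instead of building the linked list of entry-point nodes described above.

```c
#include <assert.h>
#include <stddef.h>

/* A name found in a names table (hypothetical structure). */
struct NameEntry {
    char    *name;      /* points into the table; terminated in place */
    unsigned ordinal;
};

/* Walks a resident or nonresident names table stored in `table`.
   Each record is: length byte n, n name characters, ordinal word.
   A length byte of 0 ends the table. The ordinal is saved first, then
   its low byte is overwritten with '\0' so that the name becomes an
   ASCIIZ string without any copying. Returns the number of entries. */
size_t ParseNamesTable(unsigned char *table, size_t tableSize,
                       struct NameEntry *out, size_t maxOut)
{
    size_t pos = 0, n = 0;

    assert(table != NULL && out != NULL);
    while (pos < tableSize && table[pos] != 0 && n < maxOut) {
        size_t len = table[pos];
        unsigned char *ord;

        if (pos + 1 + len + 2 > tableSize)
            break;                          /* truncated record */
        ord = table + pos + 1 + len;
        out[n].name = (char *)(table + pos + 1);    /* skip length byte */
        out[n].ordinal = (unsigned)ord[0] | ((unsigned)ord[1] << 8);
        ord[0] = '\0';                      /* terminate the name */
        n++;
        pos += 1 + len + 2;
    }
    return n;
}
```

The order of operations is the whole trick: the ordinal must be read before its first byte is overwritten, and the next record's length byte is untouched because it lies beyond the two ordinal bytes.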

BuildNonResidentNamesTable(). The nonresident names table is pointed to by the pNRes pointer in the NE header structure. The BuildNonResidentNamesTable() function is nearly identical to BuildResidentNamesTable(). The only difference is that the nonresident-names-table offset is based on the start of the MZ header (that is, the start of the executable) to facilitate easy access. Once the operating system has saved the pNRes offset to the table, it can reload the table by opening the file again and seeking pNRes bytes into it. In this case, there is no need to find the NE header, as the offset goes directly to the table you want.

BuildNonResidentNamesTable() also adds entries to the entry-point list. Once this function completes, all exported entry points in the Windows module have been placed on the entry-point list with their ordinals. The first entry in the nonresident names table is the module description. Nonentry points, like module name and module description, have a 0 ordinal and will not have an entry in the entry table.
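The difference between the two bases is easiest to see side by side. The function names below are hypothetical, and the field positions are assumptions taken from the commonly published NE header layout, not from this article: the resident-names offset is the word at NE+26h, and the nonresident-names offset is the doubleword at NE+2Ch.

```c
#include <assert.h>
#include <stddef.h>

/* Little-endian reads from a byte buffer. */
static unsigned ReadWord(const unsigned char *p)
{
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

static unsigned long ReadDword(const unsigned char *p)
{
    return (unsigned long)ReadWord(p) |
           ((unsigned long)ReadWord(p + 2) << 16);
}

/* File offset of the resident names table. The stored value is
   relative to the NE header, so the header's own offset is added. */
unsigned long ResidentNamesFileOffset(const unsigned char *file,
                                      unsigned long neOffset)
{
    const unsigned char *ne;

    assert(file != NULL);
    ne = file + neOffset;       /* byte arithmetic: file is a char pointer */
    return neOffset + ReadWord(ne + 0x26);
}

/* File offset of the nonresident names table. The stored value is
   already relative to the start of the file. */
unsigned long NonResidentNamesFileOffset(const unsigned char *file,
                                         unsigned long neOffset)
{
    assert(file != NULL);
    return ReadDword(file + neOffset + 0x2C);
}
```

Because the buffer is addressed through an unsigned char pointer, every addition here is in bytes, which is the point of the pointer-to-char rule mentioned for the resident names table.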

BuildEntryTable(). This function builds the internal representation of the entry table, which is pointed to by the pEntry pointer in the NE header structure. The pEntry pointer is based on the start of the NE header. When an application exports a function to another module, this entry table associates the name and number of the exported function to an actual segment:offset in the exporting module. The NE header also has the cbEntry field, which gives the size in bytes of the entry table. The first entry in the entry table corresponds to application ordinal 1. All entry-table elements are ordered sequentially by ordinal.

An entry-table entry can consist of various combinations of four structures. It begins with a single byte, which is the count of records in this entry (or 0 at the end of the table), and is followed by a byte that describes the records in this entry. This second byte can have several meanings. If the second byte is 0, then the first byte is a count of entries to skip, so this count must be added to the current entry number to calculate a new entry number. This optimization reduces the size of the entry table by replacing anywhere from 1 to 255 unused entries with only two bytes. The VGA.DRV that comes with my copy of Windows 3.1 has several such skip counts in its entry-table entries.

The second reserved value is 0xFE (254 decimal), which denotes that this entry is a data value. Such data values must be extracted via GetProcAddress() and will not be available via symbolic name.

The last reserved value is 0xFF (decimal 255), which marks this group of entry records as belonging to a movable segment. A movable entry-table entry contains a byte of flags, an int 3fh instruction, the actual segment number, and the offset of the particular entry.

Any other values in the second byte give the segment number of the segment containing these elements. This segment is a fixed (not movable) segment, and the entry structure contains only flags and an offset.

Once you have extracted the information for the current entry, you call GetEntryPointElement() to find the name associated with the current entry-table index (remember, this is the same as the API ordinal). You then update the entry-element list with the entry-point segment:offset from the current entry-table record. Continue processing entry-table records until the count is 0, or until you've read more bytes than the cbEntry field in the NE header. Note that the count of records is often greater than 1 (a single grouping can have multiple records, depending on how the ordinals are laid out).
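The walk over the entry table can be sketched as shown here. ParseEntryTable() and EntryPoint are hypothetical names, not the BuildEntryTable() source. The record sizes are assumptions based on the field lists above: six bytes for a movable record (flags, two bytes of int 3fh, segment number, offset word) and three bytes for a fixed record (flags, offset word). The sketch also assumes that 0xFE records use the fixed layout and simply reports them with a segment value of 0xFE.

```c
#include <assert.h>
#include <stddef.h>

/* One resolved entry point (hypothetical structure). */
struct EntryPoint {
    unsigned ordinal;
    unsigned segment;
    unsigned offset;
};

/* Walks an NE entry table of cbEntry bytes and stores up to maxOut
   entry points in `out`. Returns the number stored. */
size_t ParseEntryTable(const unsigned char *table, size_t cbEntry,
                       struct EntryPoint *out, size_t maxOut)
{
    size_t pos = 0, n = 0;
    unsigned ordinal = 1;           /* first entry is ordinal 1 */

    assert(table != NULL && out != NULL);
    while (pos + 2 <= cbEntry) {
        unsigned count = table[pos];
        unsigned type = table[pos + 1];
        size_t recSize;
        unsigned i;

        if (count == 0)
            break;                  /* end of table */
        pos += 2;
        if (type == 0x00) {         /* skip `count` unused ordinals */
            ordinal += count;
            continue;
        }
        recSize = (type == 0xFF) ? 6 : 3;
        for (i = 0; i < count; i++) {
            const unsigned char *rec;

            if (pos + recSize > cbEntry)
                return n;           /* truncated table */
            rec = table + pos;
            if (n < maxOut) {
                out[n].ordinal = ordinal;
                if (type == 0xFF) { /* movable: flags, CD 3F, seg, offset */
                    out[n].segment = rec[3];
                    out[n].offset = (unsigned)rec[4] |
                                    ((unsigned)rec[5] << 8);
                } else {            /* fixed: flags, offset */
                    out[n].segment = type;
                    out[n].offset = (unsigned)rec[1] |
                                    ((unsigned)rec[2] << 8);
                }
                n++;
            }
            ordinal++;
            pos += recSize;
        }
    }
    return n;
}
```

Matching each result to a name is then a lookup by ordinal in the entry-point list built from the two names tables.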

BuildSegmentTable() builds a segment table for the .MAP file. To do this, you need to walk the segment table in the NE. This table is pointed to by pSegment (based on the start of the NE header) and has a size of cbSegment*8. Each segment record has four fields: offset, length, flags, and minimum size, and there are cbSegment segment records in the segment table.

You allocate a block of memory large enough for the segment table, and the pSegmentHead variable points to the start of that block. You then walk the block by considering the pointer to point to an array of cbSegment segment records.
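Reading the eight-byte segment records might look like this sketch. ParseSegmentTable() and SegmentRecord are hypothetical names. The SEG_DATA flag bit is an assumption taken from the commonly published NE format, in which bit 0 of the flags word distinguishes data segments from code segments; the article does not spell it out.

```c
#include <assert.h>
#include <stddef.h>

#define SEG_DATA 0x0001u    /* assumed: bit 0 set means a data segment */

/* One NE segment-table record: four little-endian words. */
struct SegmentRecord {
    unsigned offset;        /* location of the segment in the file */
    unsigned length;        /* length of the segment in the file */
    unsigned flags;
    unsigned minSize;       /* minimum allocation size */
};

static unsigned ReadWord(const unsigned char *p)
{
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

/* Decodes cbSegment records (8 bytes each) from `table` into `out`.
   Returns the number of records stored, at most maxOut. */
size_t ParseSegmentTable(const unsigned char *table, unsigned cbSegment,
                         struct SegmentRecord *out, size_t maxOut)
{
    size_t i, n = 0;

    assert(table != NULL && out != NULL);
    for (i = 0; i < cbSegment && n < maxOut; i++) {
        const unsigned char *rec = table + i * 8;

        out[n].offset  = ReadWord(rec);
        out[n].length  = ReadWord(rec + 2);
        out[n].flags   = ReadWord(rec + 4);
        out[n].minSize = ReadWord(rec + 6);
        n++;
    }
    return n;
}
```

Decoding field by field, instead of casting the buffer to a structure pointer, avoids depending on the compiler's structure packing and on the width of int.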

Building a .MAP File

The process of constructing a .MAP file is found in BuildMapFile(). There's no real conceptual complexity, but the demands of Microsoft's MapSym symbol-file generator complicate the process. For example, MapSym is unable to find the entry point of a program if the .MAP file used does not contain the text "Program entry point at" at the beginning of a line. Likewise, a section with the heading Publics by Value must be in the .MAP file or MapSym will not accept the .MAP file as valid for symbol-file generation. In the interest of simplicity, MapMan does not sort the Publics by Name or Publics by Value sections of the .MAP file.

As Table 1 shows, the map file requires the following sections in a specific format: module name, segment table, DGROUP, exported entry points, public symbols by name, public symbols by value, and application entry point. Figure 3 is an example .MAP file generated by MapMan for the application whose .MAP file appears in Figure 1.
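The two MapSym requirements described above lend themselves to a small sketch: one function that formats the Publics by Value section and the entry-point line, and one that checks a finished map text for the two markers. Both function names are hypothetical, the column spacing is copied from Figure 3 and has not been verified against MapSym, and the check covers only the two conditions named in this article, not MapSym's actual parsing.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One public symbol to be written (hypothetical structure). */
struct MapSymbol {
    unsigned    segment;
    unsigned    offset;
    const char *name;
};

/* Formats the Publics by Value section and the entry-point line into
   `dst`. Returns 0 on success, -1 if `dst` is too small. */
int FormatMapTail(char *dst, size_t dstSize,
                  const struct MapSymbol *syms, size_t count,
                  unsigned entrySeg, unsigned entryOff)
{
    size_t used = 0, i;
    int n;

    assert(dst != NULL && dstSize > 0 && (syms != NULL || count == 0));
    n = snprintf(dst, dstSize, " Address         Publics by Value\n");
    if (n < 0 || (size_t)n >= dstSize)
        return -1;
    used = (size_t)n;
    for (i = 0; i < count; i++) {
        n = snprintf(dst + used, dstSize - used, " %04X:%04X       %s\n",
                     syms[i].segment, syms[i].offset, syms[i].name);
        if (n < 0 || (size_t)n >= dstSize - used)
            return -1;
        used += (size_t)n;
    }
    n = snprintf(dst + used, dstSize - used,
                 "Program entry point at %04X:%04X\n", entrySeg, entryOff);
    if (n < 0 || (size_t)n >= dstSize - used)
        return -1;
    return 0;
}

/* Returns 1 if `text` contains "Publics by Value" somewhere and
   "Program entry point at" at the beginning of a line, else 0. */
int HasRequiredMarkers(const char *text)
{
    static const char entryText[] = "Program entry point at";
    const char *p;

    assert(text != NULL);
    if (strstr(text, "Publics by Value") == NULL)
        return 0;
    for (p = strstr(text, entryText); p != NULL;
         p = strstr(p + 1, entryText)) {
        if (p == text || p[-1] == '\n')
            return 1;
    }
    return 0;
}
```

A complete file would still need the other sections listed in Table 1 ahead of this tail.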

To generate .MAP files for other compilers, you'll need to update five functions: four in mapllist.c (DumpEntryPointList(), DumpSegmentList(), DumpPublicsByName(), and DumpPublicsByValue()), which refer to private data structures found in that module; and the main function BuildMapFile() in mapmkmap.c.

Conclusion

I've found MapMan to be a useful Windows debugging tool by itself. But MapMan also lets you modify the generated .MAP file to add new symbols for use in your .SYM file. Just be sure to put your new symbols and addresses in the Publics by Value section if you want MapSym to see them.

For example, I took hExeHead and CurTDB from the "THHook" section of Undocumented Windows, by Andrew Schulman et al. (Addison-Wesley, 1992), and added them to a .MAP file I generated for KRNL386.EXE with MapMan. You know from the exports table that "THHook" is at 4:0218. From Undocumented Windows, you find that hExeHead=THHook+0x04 or 0x21c and CurTDB is at THHook+0x10 or 0x228. Consequently, I added these lines (again, to the Publics by Value section) to my map file, recompiled with MapSym, and ran my debugger.

0004:021C myhExeHead
0004:0228 myCurTDB

I could then dump hExeHead and CurTDB with complete symbols, using the names that I had given in my generated .MAP file.
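The address arithmetic behind those two lines is simple enough to script. FormatDerivedSymbol() is a hypothetical helper, not part of MapMan; it adds a byte delta to a known export's offset and formats the result in the same segment:offset style as the lines above.

```c
#include <assert.h>
#include <stdio.h>

/* Formats "SSSS:OOOO name" for a symbol located `delta` bytes past a
   known symbol at segment:baseOffset. Returns the number of characters
   written (excluding the terminating '\0'), or -1 if `dst` is too small. */
int FormatDerivedSymbol(char *dst, size_t dstSize,
                        unsigned segment, unsigned baseOffset,
                        unsigned delta, const char *name)
{
    int n;

    assert(dst != NULL && name != NULL);
    n = snprintf(dst, dstSize, "%04X:%04X %s", segment,
                 (baseOffset + delta) & 0xFFFFu,    /* 16-bit offsets wrap */
                 name);
    if (n < 0 || (size_t)n >= dstSize)
        return -1;
    return n;
}
```

With THHook at 4:0218, a delta of 04h yields offset 021Ch for hExeHead and a delta of 10h yields 0228h for CurTDB.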

Example 1: A sample .DEF file.

NAME             TRAPMAN
DESCRIPTION      'J. Hlavaty:  Windows GP handler for debugging'
EXETYPE          WINDOWS

PROTMODE

CODE             LOADONCALL NONDISCARDABLE
DATA             PRELOAD    MULTIPLE

HEAPSIZE         1024
STACKSIZE        8096
EXPORTS
                 MainWndProc   @1
                 About         @2

Table 1: Microsoft linker .MAP file sections. Spacing, case, and terminology in many of the section headers are mandatory. For example, MapSym rejects the .MAP file with an obscure "No public symbols" error message if it cannot find the string "Publics by Value".

Section   Contents
1         Module name of executable
2         Segment table
3         DGROUP
4         Exported entry points
5         Public symbols by name
6         Public symbols by value
7         Application entry point

Figure 1: A sample .MAP file.

 TRAPMAN
 Start     Length     Name                  Class
 0001:0000 013AEH     TRAPMAN_TEXT           CODE
 0001:13AE 001A4H     DPMI_TEXT              CODE
 0001:1560 01069H     HANDLER                CODE
 0001:25CA 00000H     TRAPDATA_TEXT          CODE
 0001:25D0 00622H     _TEXT                  CODE
 0002:0000 00000H     DATA                   DATA
(portions removed to conserve space)
 0002:0930 0020AH     c_common               BSS

 Origin   Group
 0002:0   DGROUP

 Address   Export                  Alias

 0001:061C About                   About
 0001:0322 MainWndProc             MainWndProc

  Address         Publics by Name

 0001:061C       About
 0000:0000  Imp  MESSAGEBOX           (USER.1)
 0001:16C2       MYFARPROC
 0001:2595       MYODS
 0001:1422       _DPMIAllocateLDTDescriptors
 0001:25E1       __astart

  Address         Publics by Value

(removed to conserve space)

Program entry point at 0001:25E1

Figure 2: The .MAP-file creation process.

1. Load header information of executable into memory or exit (on failure).
2. Verify valid MZ header or exit.
3. Verify valid NE header or exit.
4. Build internal representation of resident names entries.
5. Build internal representation of nonresident names entries.
6. Add entry-table information to internal representation from Steps #4 and #5.
7. Build internal representation of segment table.
8. Write .MAP file format to STDOUT (can be redirected to a file).

Figure 3: MapMan-generated .MAP file for the same application as Figure 1.

 TRAPMAN
 Start     Length     Name                 Class
 0001:0000 02BF2H     Seg1_TEXT              CODE
 0002:0000 00B3AH     Seg2_DATA              DATA
 Origin   Group
 0002:0   DGROUP
 Address   Export                  Alias
 0001:0322 MAINWNDPROC             MAINWNDPROC
 0001:061C ABOUT                   ABOUT
 Address         Publics by Name
 0001:0322       MAINWNDPROC
 0001:061C       ABOUT
 Address         Publics by Value
 0001:0322       MAINWNDPROC
 0001:061C       ABOUT
Program entry point at 0001:25E1


Copyright © 1995, Dr. Dobb's Journal

