Design

Object File Formats

By Rand Gray and Deepak Mulchandani, May 01, 1997

Object files provide a concise and efficient representation for a compiled application, furnishing all the information needed to represent the state of the entire application at a point in time. Our authors examine object files in general, and explore the structure of the COFF and IEEE695 object file formats in particular.

Dr. Dobb's Journal May 1997: Embedded Systems

Deepak, an engineer for Sun Microsystems, can be contacted at [email protected]. Rand is a design engineer for Zilog and can be reached at [email protected].

Have you ever had a debugger return a "No source line debugging information available" message? Have you wondered why applications created with one vendor's toolset don't work with another's? Or have you ever thought about why the size of your compiled application shrinks when you don't compile with debugging information on?

These questions involve object file formats that are typically generated as a result of a compilation or assembler translation. Object files provide a concise and efficient representation of a compiled application, and they encompass all the information needed to represent the state of the application at a point in time. Object files normally encapsulate information such as source-line-address translations, variable type and scoping information (for high-level languages), and raw instruction code.

In this article, we'll examine object files in general, and explore the structure of the COFF and IEEE695 object file formats in particular. The information we present here will provide you with the basic information required to construct your own object file manipulation tools.

Characteristics of Object Files

An object file is usually created by running a compiler or assembler on a source file. It is a concise representation of all attributes of an application and provides efficient access to that information. Optionally, the object file also contains debugging information for loading into a debugger.

Figure 1 shows a high-level description of the steps involved in compiling a source program. Phase A is where the compiler outputs the generated assembly code; Phase B is where the corresponding object file is generated, based on the input assembly code. Phase A is the stage where information is passed between the compiler and assembler, and all information is passed using the output assembly-code file. Therefore, the compiler must include all the information from the original source file, along with extra information that it deduces from scanning the source file. Now, what kind of information must be passed between stages? That usually depends upon the "attributes" of the data contained within a source file.

Attributes are basic properties that can be derived by scanning a line of source code. For instance, from the line int x; in Figure 2(a), you can deduce that the programmer wants to declare a variable of type integer and wants to call it x. Similarly, other attributes that might be found by scanning a source file are expressions, loops, and flow-control changes.

Figure 2(b) shows the attributes of the source lines. Some lines in the source program don't actually translate to assembly code (in fact, only lines 5, 10, and 12 relate to code generation). The rest can be translated to assembler directives, such as memory allocation for the variable x (line 8), which usually will be ds or rmb, depending upon the assembler and conventions of the toolset.

Notice in line 8 of Figure 2(a) that the address attribute is undefined. Usually, the step following the assembly process is the linking process, whereby multiple object files are merged into one. That's where references like these are resolved and an address is assigned to the variable. As a result of the "link," one object file is created, and all the attributes from all object files are written into it. In cases where only one source file is being compiled, the link step can be bypassed, since the absolute addresses can be determined by the assembler.

Common Object File Format

The Common Object File Format (COFF) is a simple format used by UNIX System V. Listing Five ontains a tool for dumping the contents of a COFF file.

As Figure 3 illustrates, the layout of the information within a COFF file is fairly straightforward. The object file is partitioned into five different regions.

The File Header contains useful information that can be used by a reader to verify the correctness of the object file. Listing One defines the filehdr structure for COFF file headers. The magic number (the f_magic field) is used by the reader to determine if the file has the correct format and architecture. The f_timdat field contains the time and date of creation of this object file. The debugger can compare this value with the time and date of modification of your source file to see if you've made any recent changes. You may have received the message "Object file is out of date. Please recompile." when debugging an application.

The Optional File Header follows the same structure as the File Header. It contains more detailed information. Since COFF is the binary standard for the UNIX System V environment, the operating-system loader must be able to load the object file into memory and execute it. Therefore, the optional header is used to tell the operating-system loader where the text, data, and bss segments reside in the object file so the loader can quickly seek there and load the data into memory (see Listing Two). The entry field specifies where the compiled application begins execution; that is, after executing the startup code, flow-control is transferred to this address.

The String Table contains strings that are longer than some predetermined length (usually 8). The Symbol Table contains type information for variables, filenames, and pointers to the strings in the String Table. The Section region contains machine code that must be loaded at the appropriate place in memory, and line-to-address translation information (if the application was compiled with debugging information).

COFF is fairly extensible. If you are writing an application that generates COFF files, but would like to include more information of your own, then all you have to do is define a new Section type. Note, however, that if you embed your own information into a COFF file, you must also be able to extract the information correctly and make the appropriate changes to all the offsets within the object file.

The Section region also contains relocation information. Relocation information is generated for a section if it contains incomplete information. When the linker merges all the different object files into one, it tries to fill in all the missing information; anything it can't resolve gets printed as a "link error."

Sections are used to encapsulate information such as machine code, initialized data, relocation information, and line-number information. You can also define your own custom sections to include your own data in the object file. However, care must be taken to ensure that, if you embed custom sections within the object file, the corresponding reader of the object file knows about the new section type. Listing Three defines the section structure.

COFF defines similar structures for other information stored in the object files. The other structure definitions are for the optional file header, symbol-table entries, relocation entries, and line-number information. Using the structure definitions and file seek operations, it is fairly easy to scan interesting portions of a COFF file.

To illustrate how you can process COFF files, we generated an object file using the Motorola MCUasm Assembly Language toolset (which uses COFF as its object file format). Using Listing Four, you can disassemble certain parts of the MCUasm-generated COFF file. Because COFF is a binary format, when reading it (if you're using the C programming language), you should use the rb mode to open the file and process all the information using fread and fwrite.

Listing Four is admittedly elementary, but it shows how different regions of the object file relate. Listing Four is an assembly-language program written for the Motorola 68HC08 microcontroller. RAM begins at memory address $50 of the memory map, and ROM begins at memory address $6E00 of the memory map. The object file created is test.o. In the compilation options, we chose to generate line-number information. Figure 4 is the output from Listing Four. References to all the labels and variables are defined in the original application file t.asm (source file), var1 (variable declaration), and start, loop, and equal (labels). Similarly, we can process the information located in the different sections of the object file. All that is needed is the structure definitions and the correct order of statements required to retrieve and correlate the information.

Correlation is important since, when you read the line-number information from the object file (for example), the filename is not stored. Instead, the filename of the source file has to be read from the symbol table and associated with the line-number information by the developer of the reader utility.

Note that in the COFF file we have just disassembled, although the format of the data in the object file is the same, the data may be represented differently. For example, when writing the machine code to be loaded in the program memory, care must be taken since the toolset might generate Big endian or Little endian data and the representation on your target architecture might be different. The Motorola MCUasm toolset generates byte streams for the machine code, thereby eliminating the need for "glue logic" to convert between the two different representations of data.

IEEE695

While COFF is basically defined as a portable format for binary applications on UNIX System V, IEEE695 primarily exists on a variety of native and cross-development platforms, including the Motorola 68000 (Microtec Research), Motorola 68HC08 (BSO/Tasking), Hitachi, and Zilog processors.

Although COFF and IEEE695 maintain similar overall structure in terms of file format, IEEE695 has a more powerful internal representation of data. COFF and IEEE695 are both binary format (although IEEE695 also has an ASCII definition).

From a high-level perspective, the IEEE695 object format has nine separate "parts": a header, seven data regions, and a trailer.

However, according to the IEEE695 specification, the different parts of the object file will not always appear in the order presented in Figure 5. They could be generated in any order depending upon the "writer" of the object file. Thus, when scanning the object file, always retrieve the pointer to the correct section and then seek it. The header will always contain the pointers in the order shown in Figure 5. Here are some common examples of how debuggers use the information contained in the object file to support application debugging:

Header. The header part is just like the header for a COFF file. It contains seven pointers to the different sections (to be listed shortly). Like a COFF object file, the header part also contains information such as module name and the processor for which it has been compiled (magic number).
AD Extension. This part contains information regarding the creation of the file itself. For example, it contains information such as object format, version number, format type, and the memory model for which the application is compiled.
Environment. This section contains information regarding the development platform on which this object file is created--information such as creation date and time, environment in which the object file was created (UNIX, MS-DOS, VMS, and so on), and version number of the toolset used to create this object file.
Debug Information Definition. This section is where all the debugging information pertinent to the compiled application is maintained. Compared to COFF, IEEE695 has a good structured representation for the debugging information. The debugging information is distributed across "blocks," which are nested in a specified way. The nesting of the blocks in some ways represents the nesting structure of high-level language constructs (functions, local variable definitions, flow-control statements, and the like). Some of the blocks' definitions are: BB1, type definitions local to a module; BB2, type definitions global to all modules; BB3, module definition; BB4, a global subprogram; and BB5, source file line-number block definition.
Data. The data part contains information on relocatable and fixed entries for the object module. Constant data is represented in this part using an LD record type (Load Data), which specifies the number of MAUs to be loaded as constant data.

We didn't document the External, Section, and Module End parts of the object file format. Writing a quick and dirty processor for a IEEE695 object file is a little bit trickier than writing one for COFF. That's because, unlike COFF, IEEE695 files have a more complex representation for data -- they use record types and relationships between record types to represent data. COFF only defines structures such as section, linenum, filehdr, and opthdr. However, when you decide to write an object file tool for IEEE695, you'll have to look closely at the relationships of all the sections of the file so you are able to extract all the correct information and make the correct translations.

What information is required in an object file to support debugging? Normally, if you compile your application for debugging and then compile it for execution, you will notice the size of the executable created decreases. This decrease is more noticeable in larger applications than in smaller ones (since the attribute list is larger for the larger application).

The reason is that the underlying machine is not interested in what type a variable is. It is only interested in loading, manipulating, and storing data at memory locations. But, when debugging operations are to be supported, the following information must be written into the resulting object file:

Line-number information (source lines to underlying assembly code).
Variable type information (address, name, type, and size in bytes).
Symbolic information (the names of variables assigned to addresses).
Function information (name of function, where it begins, ...).
File information (to help locate which information came from which file).
Scoping information (information to help understand nesting of functions).

As you can see, a lot of extra information is written to the object file when debugging information is desired. Using this information, a debugger can provide fairly good support to the developer regarding the state of an application.

In developing a manipulation tool that needs to keep track of this information, a centralized symbol table with a query function usually works quite well. Since line-number information tends to be quite large, correlating each line number to an address and being able to look up an address quickly, based on where the processor stopped, is important. Using the symbol table also provides a feasible mechanism to extend the amount and format of the data you're storing in there:

Highlighting source lines. This information is not usually written, and debugging operations can be performed using just a subset of the aforementioned items. For example, using just line-number information and file information, you can support correlation of line-number information to addresses in memory. So, while users are executing and stopping their application under control of the debugger, depending on which address they stop at, you can determine the source line location and highlight it.
Out of date object file. If your debugger has ever made you recompile your application after you had changed the source, it's because the debugger can no longer make a correct translation between the source lines and the old line-number information in the source file. This can be performed using a simple stat of the source file to get its modification date.
Variable type information. The processor only understands a variable declaration in terms of a memory location allocated for it. The object file and debugger keep track of the variable address, size (in bytes or words), scope, and type. Once the processor is halted, a variable is being watched (in the variable watch window); first, the scoping is checked to see if the variable value is "live" at this point. If it is, the new value is read in terms of bytes (or words), then printed, based upon the type of the variable, into the variable watch window.

Trends in Object File Formats

Over the years, many different object file formats have become popular. Often, one format comes into existence due to weaknesses in another. In recent years, the ELF/DWARF format for describing object files has become quite popular. ELF (Executable and Linking Format) resembles the architecture of a COFF file (in fact, it defines relocation entries, sections, and a symbol table). In combination with the ELF structures, DWARF defines the debugging information contained in the object file. The ELF/DWARF standard make a very powerful combination, and support for this object file format is rapidly increasing across architectures.

The industry trend is to move toward one common object file format for description of compiled applications. COFF and IEEE695 are just two fish in a giant sea of defined file formats, but they are very popular. With definitions available for everything from the Motorola embedded PowerPC microcontroller to the Sun SPARC architecture, ELF is becoming very popular as a description language for object file formats.

Getting to know the structure of your object file does provide useful information regarding the toolset you use and the machine for which your application is being compiled. It also enables you to extract whatever useful information you require and to write your own custom manipulation tools.

DDJ

Listing One

struct filehdr {unsigned short f_magic; /* the magic number of the object file */
unsigned short f_nscns; /* number of sections contained in this object file */
long f_timdat; /* information regarding when this object file was created */
long f_symptr; /* pointer to the symbol table */
long f_nsyms; /* number of symbol table entries */
unsigned short f_opthdr; /* size of the optional header */
unsigned short f_flags; /* flags used for compilation */
};

Back to Article

Listing Two

struct opthdr{ 
short magic; /* Magic number */
short vstamp; /* Version stamp */
long tsize; /* Size of text in bytes */
long dsize; /* Size of data in bytes */
long bsize; /* Size of .bss in bytes */
unsigned long entry; /* Entry point */
long text_start; /* Base address of text */
long data_start; /* Base address of data */
} ;

Back to Article

Listing Three

struct scnhdr {
char s_name[8]; /* name of the section */
long s_paddr; /* physical address where this section' data is located */
long s_vaddr; /* virtual address where this section' data is located */
long s_size; /* size of this section' data (in bytes) */
long s_scnptr; /* pointer to this sections data */
long s_relptr; /* pointer to this sections relocation data */
long s_lnnoptr; /* pointer to the line number information for this section */
unsigned short s_nreloc; /* number of relocation entries */
unsigned short s_nlnno; /* number of line number entries */
long s_flags; /* flags indicating status of information written */
};

Back to Article

Listing Four

;; Example program used to generate example COFF object file using;; the Motorola MCUasm Assembly Language Toolset.
;; The example program defines 1 variable
;;
org     $50
var1            rmb     1         ; define dummy variable #1 (size 1 byte)
                org     $6e00
start:          lda     #$ff    ; load hex value 0xff into accumulator
                clr     var1    ; clear the loop variable 
loop:           cmpa    var1    ; compare if the two values are equal 
                beq     equal   ; the two values are equal - exit loop
                deca            ; not equal - decrement accumulator
                bra     loop    ; not equal - continue looping
equal:          bra     start   ; start all over again

Back to Article

Listing Five

/************************************************************************//* Program Module : COFF.C                                              */
/*   This program contains the "simple dump" utility for COFF object    */
/*   files. It was compiled using GCC on UNIX (SunOS 4.1.3)             */
/*  Written By : Rand Gray and Deepak Mulchandani                       */
/************************************************************************/


</p>
#include        <stdio.h>
#include        <string.h>
#include        <stdlib.h>
#include        <malloc.h>


</p>
#include        "linenum.h" 
#include        "syms.h"
#include        "coffio.h"
#include        "coff.h"
                       
/* --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  -- -*/
/* Global Variable declarations                                            */
/* --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  -- -*/
FILHDR          coffHdr;       /* structure for the COFF file header */
SCNHDR          sctHdr;        /* structure to contain a section header */
FILE            *coffFilePtr;  /* file pointer to the COFF object file */
                      
/* --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  -- -*/
/* Function : main                                                         */
/* --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  -- -*/
void main(int argc, char *argv[])
{
    if ( argc == 2 )
    {
        coffLineToAddr ( argv[1] );
        exit ( 0 );
    }
    else
    {
        /* if they didn't enter the object file name, then print */
        /* the usage and exit.                                   */
        fprintf ( stderr, "\nError : Object File Not Entered\n" );
        fprintf ( stderr, "Usage : ddjcoff object-file-name\n" );
        exit ( 1 );
    }
}                                          
/* --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  -- */
/* Function Name : coffcmd                                              */
/*              This function reads in a COFF format object file        */
/*              It reads it all into a structure called "mapTrans"      */
/*              which is shared by the MAP object file reader           */
/* --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  -- */
short coffLineToAddr(char *fileName)
{
    coffFilePtr = coffOpenFile(fileName);
    if (coffFilePtr != NULL)
    {                                 
        printf ( "\n\nExample Print Utility for COFF Object Files\n" );
        printf ( " --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  -- -\n\n" );
        printf ( "Contents of File Header\n\n" ); 
        
        readCoffHdr(coffFilePtr);   
        coffReadSymbols();


</p>
        fclose ( coffFilePtr );


</p>
        printf ( "\n\n" );
        printf ( " --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  -- -\n\n" );


</p>
        return ( 0 );
    }
    return ( -1 );
}
/* --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  -- -*/ 
/*   Open the COFF file to be read and return a pointer to it.           */
/* --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  -- -*/
FILE *coffOpenFile(char *coffFileName)
{
    FILE *tempPtr;
    /* As mentioned, COFF files are in binary format. Therefore we need */
    /* to open the file as a binary stream.                             */
                                                                      
    tempPtr = fopen(coffFileName, "rb");
    return (tempPtr);
}
/* Read in the COFF file header. Tells us: 
 *        - The Size (in bytes) of the Optional Header
 *        - The Number of Sections in the Object File
 */
void readCoffHdr(FILE *coffFilePtr)
{
  extern char   *ctime();   
  AOUTHDR   coffAoutHdr; /* structure for the auxiliary file header */
  fread( &coffHdr, sizeof(FILHDR), 1, coffFilePtr );
  /* Show a little test of how to determine if the object file is of    */
  /* the correct architecture. Just compare the magic number received   */
  /* against the correct magic number. The Motorola MCUasm toolset uses */
  /* the magic number (decimal) 808 when generating object files for    */
  /* the Motorola 68HC08 microcontroller.                               */


</p>
  if ( coffHdr.f_magic == 808 )
  {
   printf ( "\tMagic Number             : %d\n", coffHdr.f_magic );
   printf ( "\tNumber of Sections       : %d\n", coffHdr.f_nscns );
   printf ( "\tTime of File Creation    : %s\n", ctime ( &coffHdr.f_timdat ) );
   printf ( "\tNumber of Symbol Table Entries : %d\n", coffHdr.f_nsyms );
   printf ( "\n" );


</p>
    /* read in the auxiliary file header. For our example program */
    /* we really don't need this information. However, if needed  */
    /* you can easily print it using the scheme for the header above */
    fread ( &coffAoutHdr, sizeof(AOUTHDR), 1, coffFilePtr );
  } 
  else
  {
    fprintf ( stderr, "\nError : Invalid Magic Number\n" );
    exit ( 1 );
  }
}
/*-FH --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  -- */
/*  Function : coffReadSymbols                                             */
/*-EH --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  --  -- */
void coffReadSymbols()
{   
   SYMENT       se;
   int          index;
   fseek(coffFilePtr, coffHdr.f_symptr, 0);


</p>
   index = 0;
   while ( index != coffHdr.f_nsyms )
   {
     fread ( &se, sizeof(SYMENT), 1, coffFilePtr );
     if ( se._n._n_name )
       printf ( "\tSymbol Table Entry #%d = %s\n", index+1, se._n._n_name );
     index++;
   }
 }

Back to Article

1 2 3 4 5 6 Next

More Insights

INFO-LINK


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

Design

Object File Formats

Characteristics of Object Files

Common Object File Format

IEEE695

Trends in Object File Formats

Listing One

Listing Two

Listing Three

Listing Four

Listing Five

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Design Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content

Design

Object File Formats

Characteristics of Object Files

Common Object File Format

IEEE695

Trends in Object File Formats

Related Reading

More Insights

White Papers

Reports

Webcasts

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Design Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content