Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Tools

A Source Code Profiler


OCT92: A SOURCE CODE PROFILER

Keith is the general manager of Computer Maintenance Organization in Tallahassee, Florida and is a contributing author to Tricks of the Windows 3.1 Masters (Howard Sams, 1992). He can be reached on CompuServe at 75230,2070.


The 90/10 rule is a well-known rule in programming that says the users will spend 90 percent of their time using 10 percent of your code. Knowing this, you should spend most of your time optimizing that code, but how do you avoid spending 90 percent of your time just identifying it? This problem is especially apparent in highly complex or recursive algorithms -- you cannot easily determine how often functions are used. Without special hardware or use of assembly language, how can you identify the code that really needs to work faster?

Fortunately, this is a well-studied problem, and tools called "profilers" exist to help you locate which functions are being used. A number of profilers are available (Microsoft's Source Profiler, Borland's Turbo Profiler, and Watcom's Profiler and Sampler all spring to mind); some are bundled with compilers, others are not. However, effective profilers are not always efficiently used. In this article, I'll discuss ways you can make the most out of a profiler, using as an example a working profiler you can build yourself. While most of my discussion focuses on the design and implementation of my profiler, many of the techniques governing the program's use can be extended to other profilers.

I've used my profiler on 286/386/486 systems under DOS 3.3, 4.01, and 5.0. It makes use of the hardware clock interrupt (INT 8) which executes a service routine 18.2 times per second. The profiler uses this interrupt to periodically determine where your program is currently executing.

The program was originally written using Microsoft's C 5.1 compiler and has been tested with C 5.1, C 6.0, and QuickC 2.0. It should work with little or no modification in other environments, provided that the extended keywords far and interrupt are supported. It may cause problems in systems using diskcaching programs, and it will not run under Windows. In order to generate useful reports, your linker must be able to create MAP files that supply the segment and offset address of the functions in your program in a format similar to that created by Microsoft's linker.

How the Profiler Works

The profiler is divided into three parts: PROFILER.C (Listing One, page 128); SPEEDUP.C (Listing Two, page 128); and PROF.C (Listing Three, page 128). Listings One and Two contain the runtime functions executed by your program. Listing One contains an interrupt service routine (ISR) which traps interrupt 8, the system timer interrupt. When a timer interrupt occurs, this routine compares the return address on the stack to the region of code being profiled. If the address is not in the region being profiled, no further action is taken. Otherwise, the address is scaled to the size of the buffer, and the word in the buffer corresponding to that address is incremented. The output of the profiler is essentially an array of unsigned integers. Each word indicates the number of times the program was found executing code at an address corresponding to the position of the word in the array. (For more details on the IBM timer, see the accompanying textbox, "Hardware Notes.")

Listing Two contains a function called speedup(), which is used to increase the number of interrupts received per second. This code was originally developed on an 80286/12-MHz processor. At the time, 18.2 samples per second was sufficient. Today I use an 80486/50-MHz system. Needless to say, my programs run a lot faster; therefore, I need many more samples per second. This function would be better coded in assembly to reduce the overhead of the timer interrupt. However, my intention is to provide the entire profiler using the C language.

The last part of the program reports on the data output by the writemon() function. It uses the MAP file output by your linker and a few additional bytes of information to determine what functions were executed. It accumulates the time used by each function and outputs them in sorted order.

Several problems came up while I was writing this profiler. A number of these occurred simply because the profiler intercepts the timer-interrupt. The timer-interrupt function can be running at any time, and any attempt to change variables used by the interrupt handler can cause a "race" condition -- a condition where executing two processes produces different (and possibly incorrect) results, depending on which process finished first. Consider the case in which a buffer address needs to be changed. The 8086 CPU does not have an uninterruptible instruction that could transfer four bytes to memory. The assignment of the buffer address takes at least two, and possibly several more instructions. If, during this process, the timer-interrupt handler is activated, the pointer to the buffer would be half right. That is to say, it would contain the segment address of the new buffer and the offset address of the old buffer, or some other equally bad combination. The pointer could address memory that is not a part of either buffer, or even part of your program. The solution is to disable any profiling when these variables are being changed. This is the purpose of the global variable status in Listing One.

Further problems arise when speeding up the system timer, Fortunately, I happened across an assembly routine by Alan Holub from his "C Chest" column (DDJ, December 1987). I translated his code into C and added this speedup() function to the profiler for this article. I separated the speedup() function from the rest of the program for two reasons. You can disable the speedup() feature of the profiler by including an empty speedup() function in your program. Furthermore, I think I will find speedup() useful in other programming projects (and hopefully you will as well). Passing speedup() too large a value for your system can cause your system to hang because of lost interrupts. Values up to 1024 seem to be acceptable for most uses; larger values produce too much overhead and really slow down your system; even larger values will hang it.

The next problem was resolving function addresses. The 8086 CPU uses a segmented architecture in which several different values can actually be used to refer to the same address. My profiler is not intended to support protected mode programs although it could with major modification, Fortunately, converting a segmented address into a linear (or physical) address is not difficult. The linear() function performs the same math that the 8086 CPU uses to convert a SEGMENT:OFFSET address into a physical address.

The last major design problem I had to solve was relocation. In order to identify the function being executed, the reporting program must know where your program was loaded in memory. The profiler assumes that the program will be loaded in a contiguous block of memory, and that all functions in your program will have the same position in relationship to the start of that block, no matter where the code is loaded. If these conditions are met, you only need to know the location of one function in the program. Since nearly all C programs have a main function, its address is used as the reference. Microsoft's linker outputs virtual function addresses in the MAP file. These addresses would only be correct if your program were loaded at physical address 0:0. However, the relationship between the function addresses found in the MAP file and the actual addresses in memory are the same.

Using the Profiler in Your Application

To initialize the profiler, simply call profile(buf, len, first, last, speed). buf is an array of len unsigned integers that should be initialized to 0. The profile() function does not initialize buf for you. You could take advantage of the fact that buf is not reset to 0 to accumulate profiling data over several program sessions. If you do this, look out for overflow errors caused by accumulating more than 65,535 hits at one code address. Overflow could have been checked using the _profmon() function, but I've decided to let it occur silently. You can, however, modify _profmon() to check for overflow. If overflow occurs, you have several choices: You could simply increment the next or previous word in the buffer (but what if that overflows?); you could generate an error message; or you could generate a table of overflowed addresses. Remember that anything done in _profmon() must be done fast and cannot use most DOS or runtime routines because it is executing at interrupt time. You can reset the profiler by passing it a NULL buffer pointer.

The first and last addresses given to the profile() function are used to determine the region of code being profiled. Any functions not between first and last will not be profiled. Note that the last function will not be profiled since its starting address is the end of the profiled region. Some C compilers automatically create a function etext(), which is simply the last address used by any program code. You can also determine the last function address by examining the MAP file generated by the linker. If you want to profile your entire program, simply create an empty function after the last function listed in the linker MAP and use it as the argument last. It is perfectly legal to pass NULL and 0xFFFF:0x000F as the values for first and last. The profiler will then profile everything being executed in memory.

More Details.

The value provided in speed is passed to the speedup() function whenever the profiler is enabled. The profiler disables speedup() when it is disabled. In my initial implementation, speedup() was not called by any of the profiler functions. You simply called it before profiling to speed up the system timer, and afterwards to restore the system timer to normal speed. Since these two functions intercept interrupt vector 8, it was possible to hang the system by calling them in incorrect order. By including a call to speedup() in the monitor() function, I was able to control the order and further bullet-proof the code. It also allowed me to include the speedup factor in the profiler output file, so you will see correct times.

Having initialized the profiler, you must enable it by calling monitor(flag). If flag is True, profiling will be enabled. Conversely, profiling will be disabled if flag is False. This feature allows you to profile just a few sections of code without suffering from profiler overhead where it is not needed. For example, I initially used the profiler to speed up the evaluation function of a chess program. I simply placed a monitor(1) statement before the call to the evaluation function, and a monitor(0) after it. Thus I profiled only those functions used to evaluate the chessboard, and ignored the functions that were part of the rather weighty user interface.

Always disable the profiler before exiting your program. Failing to disable it or reset it (by calling profile with a NULL buffer) will cause errors, because the timer interrupt will still be pointing to an ISR in your program. The code and memory for the routine may still work through several simple DOS programs, but eventually your system will crash. If your program crashes while running the profiler, you should reboot it to remove the profiler, otherwise dangerous things could happen! Using Control-Break to exit a program while it is profiling could hang your system. This inadequacy could be fixed by adding code to remove the profiler on any abnormal exit, but this requires intercepting more vectors. I have found little need for this feature as I use the profiler on already debugged code (which should be the normal order of things).

After profiling your program, the data collected must be output to a file for processing by PROF.EXE. This is handled by calling writemon(filename), where filename is the name of the file in which you want to store the data. It saves the position of your program in memory, the total number of clock ticks that occurred while the profiler was active, the addresses of the program region being profiled, the system-timer speedup factor used, and the length and contents of the profile buffer. The writemon() function should be called before resetting the profiler, otherwise the profiler will not know what data should be output.

Notes on Using the Profiler

TESTPROF.C (see Listing Four, page 131) is a simple "Sieve of Eratosthenes" program. It shows the proper use of the profiler runtime functions. Figure 1 shows sample output from PROF.EXE and the effect of different speedup values on profiler output from the same program. Many things can affect the output of the profiler, including disk caching, RAM caching, and various TSR programs. When profiling, disable any disk-caching programs you are using, especially any cache that performs delayed writes. This will give you more accurate profiler output. The profiler itself increases program-execution time because it steals cycles during a timer interrupt to perform its work. Using larger speedup values will increase this overhead, and you will begin to see altered results in the output of the profiler.

Figure 1: Sample profiler output for the TESTPROF.C program.

   Start:  24624
   End:  24952
  Length:   4096
   Scale:      1
   Speed:      1
   Time: 00:10.384
  Time        Routine    % of Total  % of Accounted
  00:02.197   _markbit   21.16%      40.82%
  00:02.087   _markall   20.11%      38.78%
  00:00.659   _findnext   6.35%      12.24%
  00:00.439   _testbit    4.23%       8.16%
  00:05.384   Total   51.85%
   Start:  24624
   End:  24952
  Length:   4096
   Scale:      1
   Speed:      2
   Time: 00:15.741
  Time        Routine    % of Total  % of Accounted
  00:02.060   _markbit   13.09%      41.90%
  00:01.483   _markall    9.42%      30.17%
  00:00.741   _testbit    4.71%      15.08%
  00:00.467   _findnext   2.97%       9.50%
  00:00.164   DOS         1.05%       3.35%
  00:04.917   Total  31.24%
        .         .         .
   Start:  24624
     End:  24952
  Length:   4096
   Scale:      1
   Speed:    512
   Time: 00:39.369
  Time        Routine    % of Total  % of Accounted
  00:01.964   _markbit   4.99%       57.89%
  00:01.161   _testbit   2.95%       34.21%
  00:00.182   _findnext  0.46%       5.38%
  00:00.082   _markall   0.21%       2.42%
  00:00.002   DOS        0.01%       0.09%
  00:03.393   Total  8.62%

For best results, start the profiler with no speedup value (0 or 1) and a buffer at least as long as all of the functions you wish to profile. Continue doubling the speedup value until you get acceptable results. Buffer size also affects the output. If your buffer is significantly smaller than the region of code you are profiling, the profiler may incorrectly assign clock ticks to the function before the one actually being executed. Your buffer should have at least one element for every 16 to 32 bytes of code being profiled. Using larger scales (or, one element per 64 bytes) will give erroneous results in some cases because results from smaller functions will be combined with functions immediately before them in memory. Smaller functions are often heavily used, which could further distort your results.

Reporting

PROF.C in Listing Three(page 128) provides the reporting features of the profiler. The readmap() function imports the MAP file produced by the linker. It searches for the strings by value to locate the section of the MAP file containing function addresses sorted by address. It ignores all absolute symbols found in the MAP. You could replace this function with one that imported symbols from other file formats. The readmap() function should set nmap to the number of map entries used, and set emaina to the address of the main function. emaina is used as a reference address to determine the location of other functions in memory.

My readmap() function adds two entries used to indicate the amount of time spent in other code. The DOS entry simply collects all time spent in any code whose address is before the first address of your program. The BIOS entry collects all of the time spent in code that occurs after your program in memory. If you profile all memory in a plain vanilla MS-DOS system (no HIMEM.SYS driver or other use of upper-memory blocks), these entries will correctly report the amount of time your program spends executing MS-DOS or BIOS routines.

Possible Improvements

A good profiler not only tells you how much time is spent in each function, but also indicates how many times each function is called. This is useful if you want to identify functions that would benefit from fast calling conventions, conversion to inline code (if your compiler supports it), or macros (if not). Counting function calls could be done by modifying the stack check (__CHKSTK) function provided with your compiler. The __CHKSTK function could call a routine similar to _profmon(), passing its return address. This would require a second buffer, or you could split the buffer passed to profile() in two.

You could also add support for different symbol-table formats or additional features. For example, some compilers will output the line numbers and addresses of each line of code. Adding support for line numbers would involve adding a little more code to the reporting program, and the code would be very similar to that already used for counting time spent in a function. Note that reporting on line numbers would provide limited data unless the system timer was sped up by an appreciable factor. At that point, it might be necessary to code _profmon() directly in assembly and to limit scaling to powers of two in order to limit the amount of time spent in the ISR.

Hardware Notes

The IBM PC and compatible systems use three programmable timers for system timing, memory refresh, and speaker control. Originally, the timers were all part of one chip on the system board--the Intel 8254 programmable interval timer. Today, a VLSI chip usually handles these functions and much more, but the timing functions work the same as with the 8254. The input of Timer 0 is connected to a 1.19318-MHz signal which decrements the timer by 1 each time the signal goes high. The output of Timer 0 is connected to the interrupt 0 input of the first 8259 interrupt controller, which is used to control and prioritize interrupts occurring in the system. The 8259 essentially forces the processor to execute an INT 8 instruction whenever the counter element of Timer 0 reaches 0.

Each timer has a control register, status register, counter register, counter element, and output latch. The counter register contains the starting value loaded into the counter element after its reaches 0. The output latch is used to save the contents of the counter element to be read by the CPU. Your system BIOS loads Timer 0's counter register with a value of 65,536 shortly after your system is booted. This generates approximately 18.2 interrupts per second (1.19318 MHz/65536=18.2065 Hz). You can speed up the system timer by decreasing the value contained in its counter register.

Listing Two contains a function that speeds up the system timer in this manner. Note that it calls the old timer interrupt only when its internal counter is 0. Otherwise, it sends an end-of-interrupt command to the 8259 controller. The end-of-interrupt command is necessary to allow the 8259 to continue processing interrupt requests.

--KB



_YOUR OWN SOURCE CODE PROFILER_
by Keith W. Boone


[LISTING ONE]
<a name="0236_000d">

/* PROFILER.C -- Copyright (c) 1992 by Keith W. Boone, Tallahassee, FL
 * This program contains the runtime functions required to profile C source
 * code. Permission is granted to use or distribute this source code without
 * any restrictions as long as this entire notice is included intact. You may
 * use or include this software in any product that you sell for profit.
 * This software is distributed as is, and is no liability is granted.
 */

#include    <stdio.h>
#include    <dos.h>

static long     start,      /*  Linear code start address      */
                end,        /*  Linear code end address        */
                reference,  /*  Main function address          */
                ticker;     /*  Global total time in profiler  */
static int      *buffer,    /*  Pointer to CS:IP tick buffer   */
                length,     /*  Length of the buffer           */
                scale,      /*  Scaling factor for the buffer  */
                status,     /*  Status = TRUE if profiling     */
                (interrupt far *old8)(); /* Old clock interrupt */
unsigned        speedup_factor = 1; /*  Clock speedup factor    */

/***** Convert a far pointer to a linear address *****/
long linear(void far *ptr)
{
    long    addr = FP_SEG(ptr);
    if ((long)ptr == -1l)
        return 0x100000;
    addr <<= 4;
    addr += FP_OFF(ptr);
    return addr;
}
/*****  _profmon is a clock interrupt handler which is used to  mark what
function your program is in during the interrupt. *****/
interrupt far _profmon(int es, int ds, int di, int si,
    int bp, int sp, int bx, int dx, int cx, int ax,
    unsigned ip, unsigned cs
)
{
    if (status)         /* If profiling */
    {   long    addr = cs;
        addr <<= 4;
        addr += ip;     /* Compute Phys address */
        ticker++;       /* Increase total time */
        if ((addr >= start) && (addr <= end)) /* If in range of code */
        {   addr -= start;
            buffer[addr / scale]++; /* Update buffer */
        }
    }
    _chain_intr(old8);
}
/***** profile(buf, len, first, last, speed) initializes the profiler. Buf is
 Buf is a pointer to an array of integers that keeps track of CS:IP locations
 while profiling.  First and last are pointers to the functions where profiling
 will occur. Note that the function last is never profiled! Speed is the clock
 frequency multiplier passed to speedup() when turning profiling on. *****/
profile(int *buf, int len, int (*first)(), int (*last)(), unsigned speed)
{
    unsigned    i, j;
    long        size;
    int     main();
    buffer = buf;
    length = len;
    start = linear((void far *)first);    /* Conver to linear address */
    end = linear((void far *)last);
    reference = linear((void far *)main); /* main() is relocation ref */
    status = 0;                           /* Profiling off initially */
    speedup_factor = speed;               /* Used with speedup() */

    if (buf == NULL)                      /* buf==NULL is reset */
    {   if (old8 != NULL)
            _dos_setvect(0x08, old8);
        return 0;
    }
    size = (end - start);                /* Compute size and scale */
    scale = ((size - 1) / length) + 1;
    return 0;
}
/***** monitor(flag) turns profiling on or off.  If turning profiling on then
 the program needs to trap the clock interrupt, then set status = TRUE.
 Otherwise the program resets the clock interrupt, then sets status = FALSE.
 This ensures that status does not change to FALSE while in _profmon(),
 not really a big deal, but cleaner. ******/
monitor(int flag)
{
    if (buffer == NULL)          /* Turn on w/out initialization is error */
        return -1;
    if (flag)                    /* If turning profiling on */
    {   if (speedup_factor > 1)  /* Speed up clock by saved */
            speedup(speedup_factor);  /* speedup factor */
        old8 = _dos_getvect(0x08);    /* get old clock handler */
        _dos_setvect(0x08, _profmon); /* set new vector */
    }
    else    if (old8 != NULL)         /* Else turning off */
    {   _dos_setvect(0x08, old8);     /* reset old vector */
        old8 = NULL;
        if (speedup_factor > 1)       /* Restore clock speed */
            speedup(1);               /* Note reverse order! */
    }
    status = flag;                    /* Set on/off flag */
    return 0;
}
/***** writemon(filename) copies the profiling data to the file specified
 for later use by PROF.EXE. *****/
writemon(char *filename)
{
    FILE    *fp = fopen(filename, "wb");
    if (fp == NULL)
        return -1;
    monitor(0);
    fwrite(&reference, 1, sizeof(reference), fp);
    fwrite(&ticker, 1, sizeof(ticker), fp);
    fwrite(&start, 1, sizeof(start), fp);
    fwrite(&end, 1, sizeof(start), fp);
    fwrite(&length, 1, sizeof(length), fp);
    fwrite(&scale, 1, sizeof(scale), fp);
    fwrite(&speedup_factor, 1, sizeof(speedup_factor), fp);
    fwrite(buffer, length, sizeof(*buffer), fp);
    fclose(fp);
}





<a name="0236_000e">
<a name="0236_000f">
[LISTING TWO]
<a name="0236_000f">

#include    <dos.h>

static int  (interrupt far *oldclk)();  /* Old clock interrupt  */
static unsigned counter,                /* Current counter value */
                factor;
#define PICCMD  0x20                    /* 8259 PIC command port */
#define EOI     0x20                    /* EOI command */

#define TCTRL   0x43                    /* Timer control port */
#define T0DATA  0x40                    /* Timer 0 data port */
#define T0LOAD  0x36                    /* Timer 0 load command */

/****** _spdup is an interrupt handler which processes clock interrupts which
 have been sped up. *****/
interrupt far _spdup()
{
    if (counter--)                      /* If counter non-zero */
        outp(PICCMD, EOI);              /* Send EOI and return */
    else
    {   counter = factor;               /* Otherwise reset counter */
        _chain_intr(oldclk);            /* and exec old handler */
    }
}
speedup(unsigned newfact)
{
    unsigned divider;
    if (newfact <= 1)                   /* Reset timer handler */
    {   factor = 1;
        if (oldclk)                     /* reset only if set */
        {   _dos_setvect(0x08, oldclk); /* in the first place */
            outp(TCTRL, T0LOAD);
            outp(T0DATA, 0);
            outp(T0DATA, 0);            /* standard divisor */
            oldclk = 0;
        }
    }
    else
    {   if (!oldclk)
            oldclk = _dos_getvect(0x08); /* Save old handler */
        counter = factor = newfact;
        divider = 65536L / newfact;
        _disable();                     /* Disable interrupts */
        outp(TCTRL, T0LOAD);
        outp(T0DATA, divider);
        outp(T0DATA, (divider >> 8));   /* load new divisor */
        _enable();                      /* Enable interrupts */
        _dos_setvect(0x08, _spdup);     /* use our handler */
    }
}





<a name="0236_0010">
<a name="0236_0011">
[LISTING THREE]
<a name="0236_0011">

/*  PROF.C -- Copyright (c) 1992 by Keith W. Boone, Tallahassee, FL
 * This program contains the reporting facilities used to profile C code.
 * Permission is granted to use or distribute this source code without any
 * restrictions as long as this entire notice is included intact. You may use
 * or include this software in any product that you sell for profit.
 * This software is distributed as is, and is no liability is granted.
 */

#include    <stdio.h>
long start,     /* linear address of Start of profiled code    */
    end,        /* End of profiled code                        */
    reference,  /* Address of main function to resolve reloc   */
    ticker;     /* Value of ticker at end of profiling         */
int *buffer,    /* Pointer to profile buffer                   */
    length,     /* Length of profile buffer                    */
    scale,      /* Scale used (code address to buffer address) */
    factor;     /* Clock speedup factor                        */

typedef struct mapentry  /* Used to store MAP file information */
{   long    address, /* Linear address of function             */
        count;       /* computed # of ticks in function        */
    char    *name;   /* Function Name                          */
}   MAP;

MAP map[4096];          /*  Limited to 4096 functions           */
int nmap = 0;           /*  Number of Map table entries used    */
long    emaina;         /*  Linear address of main function     */

/***** Sortmap(m1, m2) is a callback function used by qsort to sort
 map table entried by linear address. *****/
sortmap(MAP *m1, MAP *m2)
{   if (m1->address < m2->address)
        return -1;
    if (m1->address > m2->address)
        return 1;
    return 0;
}
/***** Readmap(filename) reads in the MAP stored in 'filename' and loads the
data into map[].  *****/
readmap(char *filename)
{
    char    line[80], *ptr, *strrchr(), *strdup();
    long    off, seg;
    FILE    *fp = fopen(filename, "rt");

    if (fp == NULL) /* Return error code if cannot open MAP file */
        return -1;
    map[nmap++].name = "DOS"; /* Address below first are in DOS */

    /* Read each line of the map file until you find "Publics by Value".
     * This is VERY linker dependent! */
    while (fgets(line, 80, fp))
    {   if (ptr = strrchr(line, 'b'))
        {   if (strcmp(ptr, "by Value\n") == 0)
                break;
            else ptr = NULL;
        }
    }
    if (ptr == NULL)    /* If we didn't find "Publics by Value" */
    {   fclose(fp);     /* the return with an failure code. */
        return -1;
    }
    while (fgets(line, 80, fp))
    {   /* remove trailing linefeed from line */
        if (ptr = strrchr(line, '\n'))
            *ptr = 0;
        if (!*line)
            continue;                         /* Ignore empty lines  */
        else if (strncmp(line + 12, "Abs", 3) == 0)
            continue;                         /* And Absolue values */
        sscanf(line, " %lX:%lX", &seg, &off); /* Get the seg:off */
        map[nmap].address = (seg << 4) + off; /* convert to linear */
        ptr = strrchr(line, ' ');             /* find function name */
        map[nmap].name = strdup(++ptr);       /* make a copy of it */

        if (strcmp(ptr, "_main") == 0)        /* Found main() */
            emaina = map[nmap].address;       /* save it's address */
        nmap++;
    }
    qsort(map + 1, nmap - 1,  sizeof(MAP), sortmap); /* Sort by address */
    map[nmap++].name = "BIOS";     /* Addresses beyond last are in BOIS */
    return 0;
}
/***** Readmon(filename) reads in the profile data stored in 'filename' and
 and outputs vitals on the profiled data. *****/
readmon(char *filename)
{
    FILE    *fp = fopen(filename, "rb");
    void    *calloc();
    int m, s, k;
    float   T;

    if (fp == NULL)
        return -1;
    fread(&reference, 1, sizeof(reference), fp);
    fread(&ticker, 1, sizeof(ticker), fp);
    fread(&start, 1, sizeof(start), fp);
    fread(&end, 1, sizeof(start), fp);
    fread(&length, 1, sizeof(length), fp);
    fread(&scale, 1, sizeof(scale), fp);
    fread(&factor, 1, sizeof(factor), fp);
    buffer = (int *)calloc(sizeof(int), length);
    fread(buffer, length, sizeof(int), fp);
    fclose(fp);

    T = ticker / (18.2 * factor);
    m = T / 60;
    T -= m * 60;
    s = T;
    T -= s;
    k = T * 1000;
    printf(" Start: %6ld\n", start);
    printf("   End: %6ld\n", end);
    printf("Length: %6d\n", length);
    printf(" Scale: %6d\n", scale);
    printf(" Speed: %6d\n", factor);
    printf("  Time: %02d:%02d.%03d\n", m, s, k);
    return 0;
}
/***** mapcmp(m1, m2) is a callback function used by qsort to sort map table
 entried by amount of time used by the function. *****/
mapcmp(MAP *m1, MAP *m2)
{   if (m1->count > m2->count)
        return -1;
    else if (m1->count == m2->count)
        return 0;
    else return 1;
}
main(int argc, char **argv)
{
    int i,                  /* profile buffer index */
        j;                  /* MAP entry index */
    long    addr;           /* Current linear address */
    int m,                  /* Minutes */
        s,                  /* Seconds */
        t;                  /* thousandths */
    char    buf[32],        /* File name buffer */
        *strchr();
    float   seconds,        /* time in a function in seconds */
        accounted,          /* accounted for in all profiled functions */
        percent;            /* Percent of total accounted */
    if (argc < 2)
    {  fprintf(stderr, "Usage: %s program[.map] [program[.mon]]\n", *argv);
        exit(1);
    }
    /* Append .map to files that have no extension specified for arg 1 */
    if (strchr(argv[1], '.'))
        strcpy(buf, argv[1]);
    else sprintf(buf, "%s.map", argv[1]);
    if (readmap(buf))
    {   fprintf(stderr, "Error reading map '%s'.\n", buf);
        exit(1);
    }
    /* If second arg specified, then use it, otherwise use first arg */
    if (argc > 2)
        i = 2;
    else
    {   i = 1;
        if (strchr(argv[1], '.')) /* remove ext from first arg  */
            *strchr(argv[1], '.') = 0;
    }
    /* Append .mon to files that have no extension specified for arg 2 */
    if (strchr(argv[i], '.'))
        strcpy(buf, argv[i]);
    else    sprintf(buf, "%s.mon", argv[i]);
    if (readmon(buf))
    {   fprintf(stderr, "Error reading '%s'.\n", buf);
        exit(1);
    }
    start -= reference; /* Offset start by the &main */
    end -= reference;
    for (i = 1; i < (nmap - 1); i++) /* adjust map by &main */
        map[i].address -= emaina;
    map[0].address = 0x8000000;  /* DOS start = max neg */
    map[nmap].address = 0x7FFFFFF;   /* BIOS end = max pos */

    /***** For each entry in the buffer, find the first map address (j)
     whose function encompasses addr, and update the time spent in
     that function. *****/
    accounted = 0.0;
    addr = start;
    j = 0;
    for (i = 0; (i < length) && (j < nmap); i++)
    {   if (buffer[i])
        {   while ((j < nmap) && (addr > map[j+1].address))
                j++;
            map[j].count += buffer[i];
            accounted += buffer[i];
        }
        addr += scale;
    }
    qsort(map, nmap, sizeof(MAP), mapcmp);  /* Sort by time used */
    /* For each function, output time used, and accumulate the total */
    printf("Time\t\tRoutine\t\t    %% of Total\t    %% of Accounted\n");
    for (i = 0; i < nmap; i++)
    {   if (map[i].count)
        {   seconds = map[i].count;
            seconds /= (18.2 * factor);
            m = seconds / 60;
            seconds -= m * 60;
            s = seconds;
            seconds -= s;
            t = seconds * 1000;
            printf("%02d:%02d.%03d\t%-20s\t%5.2f%%\t%5.2f%%\n",
                m, s, t, map[i].name,
                map[i].count * 100.0 / ticker,
                map[i].count * 100.0 / accounted);
        }
    }
    /* Output accounted/total times */
    percent = (100.0 * accounted) / ticker;
    accounted /= 18.2 * factor;        /* Convert accounted to seconds */
    m = accounted / 60;
    accounted -= m * 60;
    s = accounted;
    accounted -= s;
    t = accounted * 1000;
    printf("%02d:%02d.%03d\t%-20s\t%5.2f%%\n", m, s, t, "Total", percent);
}





<a name="0236_0012">
<a name="0236_0013">
[LISTING FOUR]
<a name="0236_0013">

#include    <stdio.h>
#define PLEN    4096
unsigned    pbuf[PLEN];

#define ALEN    8000
#define MAX (ALEN*8L)
char    array[ALEN];

markbit(char *array, long bit)
{
    array[bit/8] |= 1 << (bit % 8l);
}
testbit(char *array, long bit)
{
    return array[bit/8] & (1 << (bit % 8l));
}
markall(char *array, long bit)
{
    long    temp;
    for (temp = bit + bit; temp < MAX; temp += bit)
        markbit(array, temp);
}
long findnext(char *array, long bit)
{
    long    temp;
    for (temp = bit + 1; temp < MAX; temp++)
        if (!testbit(array, temp))
            return temp;
    return 0;
}
extern last();
main()
{
    long    value,
        start,
        clock();
    int speed;
    char    buff[16];
    printf("Speedup\tTime\n");
    for (speed = 1; speed <= 2048; speed += speed)
    {   /* reset program workspace on each iteration */
        memset(pbuf, 0, sizeof(pbuf));
        memset(array, 0, sizeof(array));
        /* Set profile buffer, program addresses and speedup */
        /* note that main() is not profiled */
        profile(pbuf, PLEN, markbit, main, speed);
        start = clock();
        printf("%5d\t", speed);
        /* Enable profiling */
        monitor(1);
        /* Execute sieve of Eratosthenese */
        for (value = 2; value; value = findnext(array, value))
            markall(array, value);
        /* Disable profiling */
        monitor(0);
        /* Output profile data for this pass */
        printf("%ld\n", clock() - start);
        sprintf(buff, "tst%05d.mon", speed);
        writemon(buff);
    }
}



Figure 1: Sample profiler output for the TESTPROF.C program

 Start:  24624
 End:  24952
Length:   4096
 Scale:      1
 Speed:      1
 Time: 00:10.384
Time       Routine    % of Total  % of Accounted
00:02.197   _markbit   21.16%      40.82%
00:02.087   _markall   20.11%      38.78%
00:00.659   _findnext   6.35%      12.24%
00:00.439   _testbit    4.23%       8.16%
00:05.384   Total   51.85%
 Start:  24624
 End:  24952
Length:   4096
 Scale:      1
 Speed:      2
 Time: 00:15.741
Time       Routine    % of Total  % of Accounted
00:02.060   _markbit   13.09%      41.90%
00:01.483   _markall    9.42%      30.17%
00:00.741   _testbit    4.71%      15.08%
00:00.467   _findnext   2.97%       9.50%
00:00.164   DOS         1.05%       3.35%
00:04.917   Total  31.24%
 Start:  24624
   End:  24952
Length:   4096
 Scale:      1
 Speed:      4
 Time: 00:18.434
Time        Routine    % of Total  % of Accounted
00:01.895   _markbit   10.28%      45.10%
00:01.208   _markall    6.56%      28.76%
00:00.659   _testbit    3.58%      15.69%
00:00.398   _findnext   2.16%       9.48%
00:00.041   DOS         0.22%       0.98%
00:04.203   Total   22.80%
 Start:  24624
   End:  24952
Length:   4096
 Scale:      1
 Speed:      8
 Time: 00:19.848
Time       Routine    % of Total  % of Accounted
00:01.737   _markall   8.75%       36.19%
00:01.675   _markbit   8.44%       34.91%
00:00.748   _testbit   3.77%       15.59%
00:00.521   _findnext  2.63%       10.87%
00:00.116   DOS        0.59%        2.43%
00:04.800   Total   24.19%
 Start:  24624
   End:  24952
Length:   4096
 Scale:      1
 Speed:     16
 Time: 00:20.669
Time        Routine    % of Total   % of Accounted
00:02.026   _markbit   9.80%        44.66%
00:01.146   _markall   5.55%        25.28%
00:00.831   _testbit   4.02%        18.32%
00:00.484   _findnext  2.34%        10.67%
00:00.048   DOS        0.23%         1.06%
00:04.536   Total   21.95%
 Start:  24624
   End:  24952
Length:   4096
 Scale:      1
 Speed:     32
 Time: 00:21.299
Time        Routine    % of Total  % of Accounted
00:02.558   _markbit   12.01%      49.03%
00:01.237   _markall    5.81%      23.72%
00:00.781   _testbit    3.67%      14.97%
00:00.506   _findnext   2.38%       9.71%
00:00.133   DOS         0.63%       2.57%
00:05.218   Total    24.50%
 Start:  24624
   End:  24952
Length:   4096
 Scale:      1
 Speed:     64
 Time: 00:22.092
Time        Routine    % of Total  % of Accounted
00:02.423   _markbit   10.97%      45.22%
00:01.655   _markall    7.49%      30.88%
00:00.728   _testbit    3.30%      13.58%
00:00.510   _findnext   2.31%       9.53%
00:00.042   DOS         0.19%       0.78%
00:05.359   Total   24.26%
 Start:  24624
   End:  24952
Length:   4096
 Scale:      1
 Speed:    128
 Time: 00:23.507
Time        Routine    % of Total  % of Accounted
00:01.892   _markall   8.05%       37.09%
00:01.677   _markbit   7.14%       32.89%
00:01.008   _testbit   4.29%       19.77%
00:00.518   _findnext  2.21%       10.16%
00:00.004   DOS        0.02%        0.08%
00:05.101   Total   21.70%
 Start:  24624
   End:  24952
Length:   4096
 Scale:      1
 Speed:    256
 Time: 00:27.389
Time        Routine    % of Total  % of Accounted
00:04.228   _markall   15.44%      46.14%
00:02.544   _markbit    9.29%      27.77%
00:01.134   _findnext   4.14%      12.38%
00:00.973   _testbit    3.55%      10.62%
00:00.283   DOS         1.04%       3.09%
00:09.164   Total    33.46%
 Start:  24624
   End:  24952
Length:   4096
 Scale:      1
 Speed:    512
 Time: 00:39.369
Time        Routine    % of Total  % of Accounted
00:01.964   _markbit   4.99%       57.89%
00:01.161   _testbit   2.95%       34.21%
00:00.182   _findnext  0.46%       5.38%
00:00.082   _markall   0.21%       2.42%
00:00.002   DOS        0.01%       0.09%
00:03.393   Total   8.62%












Copyright © 1992, Dr. Dobb's Journal


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.