Building Library Interposers for Fun and Profit

Library interposition is a useful technique for tuning performance, collecting runtime statistics, or debugging applications. This article offers helpful tips and tools for working with the technique and gets you started on your own interposer.


November 01, 2001
URL:http://www.drdobbs.com/building-library-interposers-for-fun-and/184404926

Most of today's applications use shared libraries and dynamic linking, especially for system libraries such as the standard C library (libc) and the X Window System and OpenGL libraries. Operating system vendors encourage this approach because it has many advantages: executables are smaller, processes share a single in-memory copy of each library, and library fixes take effect without relinking applications. Dynamic linking also lets you intercept the function calls that an application makes to any shared library. Once you intercept a call, you can do whatever you want in your version of the function, and then call the real function that the application originally intended to call. Performance tuning is one use of this technique. Even if you have access to profilers and other development tools, or to the application's source code itself, having your own library interposer puts you completely in control: you can see exactly what you're doing and make adjustments at any time.

Building and Running Your First Interposer

To use library interposition, you create a special shared library and set the LD_PRELOAD environment variable to point to it. When LD_PRELOAD is set, the dynamic linker loads the specified library ahead of the application's other dependencies, so the functions it defines are found first when symbols are resolved.

Let's create a simple interposer for malloc(), which normally lives in /usr/lib/libc.so.1, the standard C library. Each time the application calls malloc(), the interposer prints a message showing the argument passed to it.

Listing One (malloc_interposer.c) is the source for this interposer. In Listing One, func is a function pointer to the real malloc() routine in /usr/lib/libc.so.1. The RTLD_NEXT argument passed to dlsym(3X) tells the dynamic linker to find the next definition of the specified symbol, searching the objects that follow the interposer in the normal dynamic linker search order.

Now let's build and run this interposer, using ls(1) as our sample application. We'll use C-shell syntax for this and the other examples.

  % cc -o malloc_interposer.so -G -Kpic malloc_interposer.c
  % setenv LD_PRELOAD $cwd/malloc_interposer.so
  % ls -l malloc_interposer.so
  malloc(64) is called
  malloc(52) is called
  malloc(1072) is called
  -rwxr-xr-x 1 gregn 5224 Aug 3 15:21 malloc_interposer.so*
  % unsetenv LD_PRELOAD

Without access to the source code of ls(1), and without rebuilding it in any way, we've just discovered which arguments it passed to malloc() in this run. Note that LD_PRELOAD should give the full path to the interposer library, because a bare file name is looked up in the standard library directories rather than in the current directory. Also note that, to prevent security problems, the dynamic linker ignores LD_PRELOAD for setuid and setgid programs, apart from libraries in trusted system directories.
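
Listing One calls through func without checking whether dlsym() found anything; if the lookup fails, the first call to malloc() crashes with little explanation. The following is a minimal sketch of a lookup helper that reports the failure instead. The name resolve_next() is hypothetical, not part of any library, and the _GNU_SOURCE definition is an assumption needed only on glibc-based systems, where RTLD_NEXT is otherwise hidden.

```c
#define _GNU_SOURCE  /* assumption: exposes RTLD_NEXT on glibc; harmless elsewhere */
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: find the next definition of `name` after the calling
 * object in the dynamic linker's search order. If there is none, print the
 * dlerror() message and abort, rather than calling through a NULL pointer. */
void *resolve_next(const char *name)
{
    void *sym;

    dlerror();                      /* clear any stale error state */
    sym = dlsym(RTLD_NEXT, name);
    if (sym == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "interposer: cannot find next \"%s\": %s\n",
                name, err ? err : "symbol value is NULL");
        abort();
    }
    return sym;
}
```

Inside an interposer, the lookup would then read func = (void *(*)(size_t)) resolve_next("malloc");. Keep in mind that fprintf() may itself allocate memory, so in a malloc() interposer the error path is best-effort.
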

Collecting Runtime Statistics

Here's a more practical example of library interposition on malloc() and a few related routines. It collects statistics on the sizes of the memory blocks requested through malloc(), calloc(), and realloc(), and when the application exits, it writes out a histogram of those sizes.

Listing Two (malloc_hist.c) is the source code. Note that all request sizes are rounded up to the next power of two, so each histogram bucket covers the sizes above the previous power of two, up to and including its label. To obtain the name of the current executable when the program calls exit(), we use the proc(4) interface; the version of proc(4) used here works with Solaris 2.6 and above. We can run this interposer with the CDE editor dtpad as the test application.

% setenv LD_PRELOAD $cwd/malloc_hist.so
% dtpad malloc_hist.c
% unsetenv LD_PRELOAD

Here are the results.

% cat /tmp/malloc_histogram.dtpad.15203
prog_name=dtpad.15203
******** malloc **********
         1              76
         2             105
         4             450
         8             667
        16            2047
        32             620
        64             158
       128              39
       256              33
       512              22
      1024              32
      2048              10
      4096              14
      8192              46
     32768               2
    131072               3
******** calloc **********
         1               0
         2               0
         4            1676
         8              40
        16              21
        32              12
        64              34
       128               4
       256               2
       512               0
      1024               3
      8192               7
******** realloc **********
         1               0
         2               0
         4               2
         8               2
        16              11
        32              11
        64              14
       128               1
       256               0
       512               0
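
The first column of each histogram is a bucket label: a request is counted under the smallest power of two that is at least as large as the requested size. The rounding step of bump_counter() from Listing Two can be sketched on its own as follows. The function name size_class_index() is hypothetical, and the cap at the last bucket is an added assumption that keeps very large requests inside the 32-entry array.

```c
#include <stddef.h>

#define NBUCKETS 32  /* same number of buckets as Listing Two */

/* Hypothetical helper: return the histogram bucket for an n-byte request,
 * i.e. the smallest i with 2^i >= n, capped at NBUCKETS - 1.
 * Requests of 0 or 1 byte both land in bucket 0 (labeled "1"). */
unsigned size_class_index(size_t n)
{
    unsigned i = 0;

    while (i < NBUCKETS - 1 && ((size_t)1 << i) < n)
        i++;
    return i;
}
```

For example, a 1000-byte request falls in bucket 10 and is reported under the label 1024, while requests of 0 or 1 byte are both reported under 1.
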

If the application invokes more than one executable, you'll get a histogram in the /tmp directory for each. Such histograms can be quite useful in application performance tuning. For instance, we now know that dtpad (as used in this session) calls malloc() for blocks of 9 to 16 bytes more often than for any other size. This tool has been used with many real applications. In one of them, it revealed that most malloc() requests were for blocks of four bytes or less. This pattern causes two performance problems.

First, most malloc() implementations, including the default Solaris malloc(3C), waste a lot of memory when used this way. malloc(3C) adds eight bytes of overhead to each memory block it returns to the caller, so when the application calls malloc() for only four bytes, the overhead is twice as large as the useful memory consumed. This waste can easily lead to increased paging to disk, ruining the application's performance. Second, a general-purpose malloc() is slower than necessary for this pattern. You can write your own memory allocator geared specifically toward small blocks, and it can be made a lot faster than the system's default malloc(), which is designed to handle a wide variety of block sizes.
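
To make the second point concrete, here is a minimal sketch of the kind of small-block allocator described above: a fixed-size pool that hands out blocks from a free list and takes memory from malloc() in large chunks, so a single malloc() header is amortized over many blocks. The names small_alloc(), small_free(), SMALL_MAX, and BLOCKS_PER_CHUNK are hypothetical, the 8-byte block size is an assumption, and the sketch is not thread-safe.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fixed-size pool for tiny requests (not thread-safe).
 * While a block is free, its first bytes hold the link to the next free
 * block, so each block is at least as large and as aligned as a pointer. */
typedef union small_block {
    union small_block *next;  /* link, used only while the block is free */
    char payload[8];          /* room for a request of up to 8 bytes */
} small_block;

#define SMALL_MAX sizeof(small_block)  /* largest request the pool serves */
#define BLOCKS_PER_CHUNK 1024

static small_block *free_list;

/* Return a block of SMALL_MAX bytes, or NULL if memory is exhausted. */
void *small_alloc(void)
{
    small_block *b;

    if (free_list == NULL) {
        /* One malloc() call, and so one malloc() header, serves 1024 blocks. */
        small_block *chunk = malloc(BLOCKS_PER_CHUNK * sizeof *chunk);
        size_t i;

        if (chunk == NULL)
            return NULL;
        for (i = 0; i < BLOCKS_PER_CHUNK - 1; i++)
            chunk[i].next = &chunk[i + 1];
        chunk[BLOCKS_PER_CHUNK - 1].next = NULL;
        free_list = chunk;
    }
    b = free_list;
    free_list = b->next;
    return b;
}

/* Return a block obtained from small_alloc() to the pool.
 * Chunks are never handed back to malloc(). */
void small_free(void *p)
{
    small_block *b = p;

    b->next = free_list;
    free_list = b;
}
```

An application (or an interposed malloc()) could send requests of up to SMALL_MAX bytes to small_alloc(). The harder part in practice is free(): it has to tell pool blocks from ordinary malloc() blocks, for example by checking address ranges, which this sketch does not attempt.
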

Fixing a Bug

This is a true story. A major mechanical CAD application that worked under Solaris 2.5.1 stopped working under Solaris 2.6. Debugging showed that the failure came from a call to XOpenDisplay(3X11) that returned NULL. Interposing on that routine revealed that the application was calling XOpenDisplay() with the argument unix:0.0 instead of the usual NULL.

The call failed because of a bug in X. It could also be considered a bug in the application, because using unix:0.0 for DISPLAY is an old Unix convention that no longer makes sense. In any case, we needed a quick workaround until the bug was fixed. The application in question was old and complex, and it called XOpenDisplay() many times from different modules, so even tracking down the troublesome calls was a challenge. A library interposer was our solution; see Listing Three (XOpenDisplay_interpose.c). Not only did it print the argument passed to XOpenDisplay(), it also changed that argument to NULL, which fixed the problem.
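
Listing Three replaces every display name with NULL. A more selective variant would rewrite only the names that trigger the problem. The sketch below assumes, hypothetically, that any name beginning with unix: should be replaced; fix_display_name() is an illustrative helper, not an Xlib function. Passing NULL to XOpenDisplay() makes Xlib use the DISPLAY environment variable.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for a more selective XOpenDisplay() interposer:
 * map display names of the form "unix:..." to NULL (so Xlib falls back
 * to DISPLAY) and pass every other name, including NULL, through. */
const char *fix_display_name(const char *name)
{
    if (name != NULL && strncmp(name, "unix:", 5) == 0)
        return NULL;
    return name;
}
```

Inside the interposer, the call would become ret = func(fix_display_name(display_name));, leaving other display names untouched.
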

More Ideas

Now that you know how to build and use library interposers, here are some other things you can have them do:

Using the library interposers described in this article, you can monitor your applications' patterns of system-resource consumption and provide useful feedback to application developers.

Acknowledgments

I'd like to thank two of my colleagues at Sun Microsystems: Bart Smaalders, who wrote the original version of the interposer to collect malloc statistics, and Morgan Herrington, who generously helped in many ways.

Listing One
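/* Example of a library interposer: interpose on malloc().
 * Build and use this interposer as follows:
 * cc -o malloc_interposer.so -G -Kpic malloc_interposer.c
 * setenv LD_PRELOAD $cwd/malloc_interposer.so
 * run the app
 * unsetenv LD_PRELOAD
 */
#define _GNU_SOURCE  /* exposes RTLD_NEXT on glibc systems; harmless elsewhere */
#include <stdio.h>
#include <dlfcn.h>

void *malloc(size_t size)
{
    /* Pointer to the real malloc(), looked up on the first call */
    static void *(*func)(size_t);
    /* Set while printing, in case printf() itself calls malloc()
       (a simple guard that is not thread-safe) */
    static int in_printf;

    if (!func)
        func = (void *(*)(size_t)) dlsym(RTLD_NEXT, "malloc");
    if (!in_printf) {
        in_printf = 1;
        printf("malloc(%lu) is called\n", (unsigned long)size);
        in_printf = 0;
    }
    return func(size);
}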

Listing Two
/* Library interposer to collect malloc/calloc/realloc statistics
 * and produce a histogram of their use.
 * cc -o malloc_hist.so -G -Kpic malloc_hist.c
 * setenv LD_PRELOAD $cwd/malloc_hist.so
 * run the application
 * unsetenv LD_PRELOAD
 *
 * The results will be in /tmp/malloc_histogram.<prog_name>.<pid>
 * for each process invoked by the current application.
 * Solaris-specific: uses <thread.h> mutexes and the proc(4) psinfo file.
 */
#include <dlfcn.h>
#include <thread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <procfs.h>
#include <fcntl.h>

#define NBUCKETS 32

typedef struct data {
    int histogram[NBUCKETS];  /* histogram[i]: requests rounded up to 2^i bytes */
    const char *caller;
} data_t;

static data_t mdata = { {0}, "malloc" };
static data_t cdata = { {0}, "calloc" };
static data_t rdata = { {0}, "realloc" };

static FILE *output;
static int pid;
static char prog_name[32];
static char path[64];

static void print_data(data_t *ptr)
{
    int i;

    fprintf(output, "******** %s **********\n", ptr->caller);
    /* Always print the first 10 buckets, then only the nonzero ones. */
    for (i = 0; i < NBUCKETS; i++)
        if (i < 10 || ptr->histogram[i])
            fprintf(output, "%10lu\t%10d\n", 1UL << i, ptr->histogram[i]);
}

static void done(void)
{
    fprintf(output, "prog_name=%s\n", prog_name);
    print_data(&mdata);
    print_data(&cdata);
    print_data(&rdata);
}

/* Interpose on exit() so that the histogram is written when the program
 * ends. Processes that end through _exit() or a signal bypass this. */
void exit(int status)
{
    char procbuf[32];
    psinfo_t psbuf;
    int fd;
    void (*real_exit)(int);

    /* Get current executable's name using proc(4) interface */
    pid = (int)getpid();
    (void)sprintf(procbuf, "/proc/%ld/psinfo", (long)pid);
    if ((fd = open(procbuf, O_RDONLY)) != -1) {
        if (read(fd, &psbuf, sizeof(psbuf)) == (ssize_t)sizeof(psbuf))
            sprintf(prog_name, "%s.%d", psbuf.pr_fname, pid);
        else
            sprintf(prog_name, "%s.%d", "unknown", pid);
        close(fd);
    } else {
        sprintf(prog_name, "%s.%d", "unknown", pid);
    }
    sprintf(path, "%s%s", "/tmp/malloc_histogram.", prog_name);

    /* Open the file here rather than at startup, since the shell closes
       all file descriptors before calling exit(). */
    output = fopen(path, "w");
    if (output) {
        done();
        fclose(output);
    }

    real_exit = (void (*)(int))dlsym(RTLD_NEXT, "exit");
    if (real_exit)
        real_exit(status);
    _exit(status);  /* reached only if the real exit() could not be found */
}

/* Round size up to the next power of two and count it in that bucket. */
static void bump_counter(data_t *ptr, size_t size)
{
    static mutex_t lock;  /* an all-zero mutex_t is a valid default mutex */
    int i = 0;

    /* Smallest i with 2^i >= size; very large sizes go in the last bucket. */
    while (i < NBUCKETS - 1 && ((size_t)1 << i) < size)
        i++;

    /* protect histogram data if application is multithreaded */
    mutex_lock(&lock);
    ptr->histogram[i]++;
    mutex_unlock(&lock);
}

void *malloc(size_t size)
{
    static void *(*func)(size_t);
    void *ret;

    if (!func)
        func = (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");
    ret = func(size);
    bump_counter(&mdata, size);
    return ret;
}

void *calloc(size_t nelem, size_t elsize)
{
    static void *(*func)(size_t, size_t);
    void *ret;
    size_t i;

    if (!func)
        func = (void *(*)(size_t, size_t))dlsym(RTLD_NEXT, "calloc");
    ret = func(nelem, elsize);
    /* Each element is counted as a separate elsize-byte request. */
    for (i = 0; i < nelem; i++)
        bump_counter(&cdata, elsize);
    return ret;
}

void *realloc(void *ptr, size_t size)
{
    static void *(*func)(void *, size_t);
    void *ret;

    if (!func)
        func = (void *(*)(void *, size_t))dlsym(RTLD_NEXT, "realloc");
    ret = func(ptr, size);
    bump_counter(&rdata, size);
    return ret;
}

Listing Three
/*
 * Library interposer for XOpenDisplay() and XCloseDisplay().
 * cc -g -o XOpenDisplay_interpose.so -G -Kpic XOpenDisplay_interpose.c
 * setenv LD_PRELOAD $cwd/XOpenDisplay_interpose.so
 * run the app
 * unsetenv LD_PRELOAD
 */
#include <stdio.h>
#include <X11/Xlib.h>
#include <dlfcn.h>

/* _Xconst matches the declaration of XOpenDisplay() in <X11/Xlib.h>. */
Display *XOpenDisplay(_Xconst char *display_name)
{
    static Display *(*func)(_Xconst char *);
    Display *ret;

    if (!func)
        func = (Display *(*)(_Xconst char *))dlsym(RTLD_NEXT, "XOpenDisplay");
    if (display_name)
        printf("XOpenDisplay() is called with display_name=%s\n", display_name);
    else
        printf("XOpenDisplay() is called with display_name=NULL\n");

    /* Workaround: ignore the caller's display name and always pass NULL,
       so that Xlib uses the DISPLAY environment variable. The original
       call was:
       ret = func(display_name);
     */
    printf(" calling XOpenDisplay(NULL)\n");
    ret = func(NULL);
    printf("XOpenDisplay() returned %p\n", (void *)ret);
    return ret;
}

int XCloseDisplay(Display *display)
{
    static int (*func)(Display *);
    int ret;

    if (!func)
        func = (int (*)(Display *))dlsym(RTLD_NEXT, "XCloseDisplay");
    ret = func(display);
    printf("called XCloseDisplay(%p)\n", (void *)display);
    return ret;
}

by Greg Nakhimovsky

Greg is an employee of Sun Microsystems and can be contacted at [email protected].

Most of today's applications use shared libraries and dynamic linking, especially for such system libraries as the standard C library (libc), or the X Window or OpenGL libraries. Operating system vendors encourage this method because it provides many advantages. With dynamic linking, you can intercept any function call that an application makes to any shared library. Once you intercept it, you can do whatever you want in that function, as well as call the real function that the application originally intended to call. Performance tuning is one use of this technology. Even if you have access to profilers and other development tools, or the application's source code itself, having your own library interposer puts you completely in control. You can see exactly what you're doing and make adjustments at any time.

Building and Running Your First Interposer

To use library interposition, you need to create a special shared library and

set the LD_PRELOAD environment variable. When LD_PRELOAD is set, the dynamic

linker will use the specified library before any other when it searches for

shared libraries.

Let's create a simple interposer for malloc(), which is normally

a part of /usr/lib/libc.so.1, the standard C library. A message, displaying

the argument passed to each malloc() call, will be printed out each time

the application calls malloc().

Listing One (malloc_interposer.c) is

the source for this interposer. In Listing One, func is a function pointer to

the real malloc() routine, which is in /usr/lib/libc.so.1. The RTLD_NEXT

argument passed to dlsym(3X) tells the dynamic linker to find the next

reference to the specified function, using the normal dynamic linker search

sequence.

Now let's build and run this interposer, using ls(1) as our sample

application. We'll use the C-shell syntax for this and other examples.


  % cc -o malloc_interposer.so -G -Kpic malloc_interposer.c

  % setenv LD_PRELOAD $cwd/malloc_interposer.so

  % ls -l malloc_interposer.so

  malloc(64) is called

  malloc(52) is called

  malloc(1072) is called

  -rwxr-xr-x 1 gregn 5224 	Aug 3 15:21 

  malloc_interposer.so*

  % unsetenv LD_PRELOAD

Without access to the source code of ls(1), and without rebuilding it

in any way, we've just discovered which arguments the application used to call

malloc() in the test run. Note that LD_PRELOAD must specify the full

path to the interposer library, and that library interposition is disabled for

setuid programs in order to prevent security problems.

Collecting Runtime Statistics

Here's a more practical example of library interposition on malloc(), as

well as on a few other routines. It collects statistics about the size of the

memory blocks requested with the calls to malloc(), calloc(), and

realloc(), and prints out a histogram detailing their use upon exiting

the application.

Listing Two (malloc_hist.c) is the source code. Note that we

round up all memory-request sizes to the next power of two. To obtain the name

of the current executable, we use the proc(4) interface. The version of

the proc(4) interface used here works with Solaris 2.6 and above. We can

run this interposer with the CDE editor dtpad as the test application.




% setenv LD_PRELOAD $cwd/malloc_hist.so

% dtpad malloc_hist.c

% unsetenv LD_PRELOAD



	Here are the results.



% cat /tmp/malloc_histogram.dtpad.15203

prog_name=dtpad.15203

******** malloc **********

         1              76

         2             105

         4             450

         8             667

        16            2047

        32             620

        64             158

       128              39

       256              33

       512              22

      1024              32

      2048              10

      4096              14

      8192              46

     32768               2

    131072               3

******** calloc **********

         1               0

         2               0

         4            1676

         8              40

        16              21

        32              12

        64              34

       128               4

       256               2

       512               0

      1024               3

      8192               7

******** realloc **********

         1               0

         2               0

         4               2

         8               2

        16              11

        32              11

        64              14

       128               1

       256               0

       512               0

If the application invokes more than one executable, you'll get a histogram in

the /tmp directory for each. Such histograms can be quite useful in application

performance tuning. We now know that dtpad (as used in this session) calls malloc()

to request 16-byte memory blocks more often than it requests other sizes. This

tool has been used with many real applications. It revealed that most of one application's

malloc() requests were for blocks of four bytes or less. There are two

performance problems with this pattern.

First of all, most malloc() implementations,

including the default Solaris malloc(3C), will waste a lot of memory when

used this way. malloc(3C) uses eight bytes of overhead for each memory

block it returns to the caller. When the application calls malloc, requesting

only four bytes of memory, the malloc() overhead is twice as large as the

useful memory consumed. This memory waste can easily lead to increased paging

to disk, ruining the application's performance. Second, it's possible to create

your own memory allocator specially geared towards small blocks. It can be made

a lot faster than the system's default malloc(), which is designed to deal

with a wide variety of block sizes.

Fixing a Bug

This is a true story. A major mechanical CAD application that worked with Solaris 2.5.1 stopped working with Solaris 2.6. Debugging showed that the failure was caused by a call to XOpenDisplay(3X11) that returned NULL. Interposing on that routine revealed that the application was calling XOpenDisplay() with the argument unix:0.0 instead of the usual NULL.

The call failed because of a bug in X. It could also be considered a bug in the application, because using unix:0.0 for DISPLAY is an old Unix convention that no longer makes sense. In any case, we needed a quick workaround until the bug was fixed. The application in question was old and complex. It called XOpenDisplay() many times from different modules, so even tracking down the troublesome calls was a challenge.

Our solution was a library interposer that not only printed out the argument passed to XOpenDisplay() but also replaced that argument with NULL, fixing the problem; see Listing Three (XOpenDisplay_interpose.c).
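Listing Three replaces every display name with NULL, which tells Xlib to use the DISPLAY environment variable. That would also override a display the application chose deliberately. A more selective variant could rewrite only the problematic value and pass everything else through unchanged. The sketch below factors that decision into a helper. The name fix_display_name() is hypothetical, and treating the exact string unix:0.0 as the only bad value is an assumption based on the case described above. In the interposer, the unconditional func(0) call would become func(fix_display_name(display_name)).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for an XOpenDisplay() interposer: map the
   problematic display name to NULL (so Xlib falls back to $DISPLAY)
   and return every other value, including NULL, unchanged.
   Assumption: "unix:0.0" is the only value that needs rewriting. */
const char *fix_display_name(const char *display_name)
{
    if (display_name != NULL && strcmp(display_name, "unix:0.0") == 0)
        return NULL;
    return display_name;
}
```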

More Ideas

Now that you know how to build and use library interposers, here are some other things you can have them do:

Using the library interposers described in this article, you can monitor your applications' patterns of system-resource consumption and provide useful feedback to application developers.

Acknowledgments

I'd like to thank two of my colleagues at Sun Microsystems: Bart Smaalders, who wrote the original version of the interposer to collect malloc statistics, and Morgan Herrington, who generously helped in many ways.

Resources

"Profiling and Tracing Dynamic Library Usage via Interposition," Timothy W. Curry (USENIX Conference Proceedings, Summer 1994): http://www.usenix.org/publications/library/proceedings/bos94/curry.html.

Listing One

/* Example of a library interposer: interpose on malloc().
 * Build and use this interposer as follows:
 *   cc -o malloc_interposer.so -G -Kpic malloc_interposer.c
 *   setenv LD_PRELOAD $cwd/malloc_interposer.so
 *   run the app
 *   unsetenv LD_PRELOAD
 */
#include <stdio.h>
#include <dlfcn.h>

void *malloc(size_t size)
{
    /* Pointer to the real malloc(), looked up once on first use */
    static void *(*func)(size_t);

    if (!func)
        func = (void *(*)(size_t)) dlsym(RTLD_NEXT, "malloc");

    /* Print size_t portably (C89 has no %zu). On some systems
       printf() may itself call malloc(). */
    printf("malloc(%lu) is called\n", (unsigned long)size);
    return func(size);
}



Listing Two

/* Library interposer to collect malloc/calloc/realloc statistics
 * and produce a histogram of their use.
 *   cc -o malloc_hist.so -G -Kpic malloc_hist.c
 *   setenv LD_PRELOAD $cwd/malloc_hist.so
 *   run the application
 *   unsetenv LD_PRELOAD
 *
 * The results will be in /tmp/malloc_histogram.<prog_name>.<pid>
 * for each process invoked by the current application.
 */
#include <dlfcn.h>
#include <fcntl.h>
#include <procfs.h>
#include <stdio.h>
#include <stdlib.h>
#include <synch.h>
#include <sys/types.h>
#include <thread.h>
#include <unistd.h>

#define NBUCKETS 32

typedef struct data {
    int histogram[NBUCKETS];  /* histogram[i]: requests rounded up to 2^i bytes */
    const char *caller;
} data_t;

/* static, so these names cannot clash with symbols in the application */
static data_t mdata = { {0}, "malloc" };
static data_t cdata = { {0}, "calloc" };
static data_t rdata = { {0}, "realloc" };

static FILE *output;
static int pid;
static char prog_name[32];
static char path[64];

static void print_data(data_t *ptr)
{
    int i;

    fprintf(output, "******** %s **********\n", ptr->caller);
    for (i = 0; i < NBUCKETS; i++)
        if (i < 10 || ptr->histogram[i])  /* always print the first 10 rows */
            fprintf(output, "%10lu\t%10d\n", 1UL << i, ptr->histogram[i]);
}

static void done(void)
{
    fprintf(output, "prog_name=%s\n", prog_name);
    print_data(&mdata);
    print_data(&cdata);
    print_data(&rdata);
    fclose(output);
}

/* Interpose on exit() so that the histogram is written when the program ends */
void exit(int status)
{
    char procbuf[32];
    psinfo_t psbuf;
    int fd;

    /* Get the current executable's name using the proc(4) interface */
    pid = (int)getpid();
    (void)sprintf(procbuf, "/proc/%d/psinfo", pid);
    if ((fd = open(procbuf, O_RDONLY)) != -1) {
        if (read(fd, &psbuf, sizeof(psbuf)) == (ssize_t)sizeof(psbuf))
            sprintf(prog_name, "%s.%d", psbuf.pr_fname, pid);
        else
            sprintf(prog_name, "%s.%d", "unknown", pid);
        close(fd);
    } else
        sprintf(prog_name, "%s.%d", "unknown", pid);
    sprintf(path, "%s%s", "/tmp/malloc_histogram.", prog_name);

    /* Open the file here, since the shell closes all file
       descriptors before calling exit() */
    output = fopen(path, "w");
    if (output)
        done();

    /* Call the real exit(); it does not return */
    ((void (*)(int)) dlsym(RTLD_NEXT, "exit"))(status);
    _exit(status);  /* not reached */
}

/* Count one request, rounding its size up to the next power of two:
   bucket i counts sizes in (2^(i-1), 2^i]; bucket 0 counts sizes 0 and 1 */
static void bump_counter(data_t *ptr, size_t size)
{
    static mutex_t lock;  /* zero-initialized: a default process-private mutex */
    size_t size_orig = size;
    int i = 0;

    while (size /= 2)
        i++;
    if (((size_t)1 << i) < size_orig)
        i++;
    if (i >= NBUCKETS)    /* very large requests go into the last bucket */
        i = NBUCKETS - 1;

    /* Protect histogram data if the application is multithreaded */
    mutex_lock(&lock);
    ptr->histogram[i]++;
    mutex_unlock(&lock);
}

void *malloc(size_t size)
{
    static void *(*func)(size_t);
    void *ret;

    if (!func)
        func = (void *(*)(size_t)) dlsym(RTLD_NEXT, "malloc");

    ret = func(size);
    bump_counter(&mdata, size);
    return ret;
}

void *calloc(size_t nelem, size_t elsize)
{
    static void *(*func)(size_t, size_t);
    void *ret;
    size_t i;

    if (!func)
        func = (void *(*)(size_t, size_t)) dlsym(RTLD_NEXT, "calloc");

    ret = func(nelem, elsize);
    /* Each element is counted as a separate request of elsize bytes */
    for (i = 0; i < nelem; i++)
        bump_counter(&cdata, elsize);
    return ret;
}

void *realloc(void *ptr, size_t size)
{
    static void *(*func)(void *, size_t);
    void *ret;

    if (!func)
        func = (void *(*)(void *, size_t)) dlsym(RTLD_NEXT, "realloc");

    ret = func(ptr, size);
    bump_counter(&rdata, size);
    return ret;
}



Listing Three

/*
 * Library interposer for XOpenDisplay() and XCloseDisplay().
 *   cc -g -o XOpenDisplay_interpose.so -G -Kpic XOpenDisplay_interpose.c
 *   setenv LD_PRELOAD $cwd/XOpenDisplay_interpose.so
 *   run the app
 *   unsetenv LD_PRELOAD
 */
#include <stdio.h>
#include <dlfcn.h>
#include <X11/Xlib.h>

/* The parameter type matches the Xlib.h prototype (_Xconst char *) */
Display *XOpenDisplay(_Xconst char *display_name)
{
    static Display *(*func)(_Xconst char *);
    Display *ret;

    if (!func)
        func = (Display *(*)(_Xconst char *)) dlsym(RTLD_NEXT, "XOpenDisplay");

    if (display_name)
        printf("XOpenDisplay() is called with display_name=%s\n", display_name);
    else
        printf("XOpenDisplay() is called with display_name=NULL\n");

    /* The application's original call would be:
     *     ret = func(display_name);
     * Instead, ignore the argument; NULL makes Xlib use $DISPLAY. */
    printf(" calling XOpenDisplay(NULL)\n");
    ret = func(NULL);
    printf("XOpenDisplay() returned %p\n", (void *)ret);
    return ret;
}

int XCloseDisplay(Display *display)
{
    static int (*func)(Display *);
    int ret;

    if (!func)
        func = (int (*)(Display *)) dlsym(RTLD_NEXT, "XCloseDisplay");

    ret = func(display);
    printf("called XCloseDisplay(%p)\n", (void *)display);
    return ret;
}


This article first appeared in Unix Insider by ITworld.


by Greg Nakhimovsky

Greg Nakhimovsky is a member of technical staff at Sun Microsystems. He works with independent software vendors making sure their applications run well on Sun systems. He has 20 years of industry experience developing, performance tuning, and troubleshooting technical computer applications on various computer systems.
