Packing DLLs in your EXE

By Thiadmer Riemersma, June 01, 2002

DLLpack is a tool that allows DLLs to be embedded inside an executable and extracted the first time a function from the DLL is called.

DLLpack is a function library that allows DLLs to be embedded inside an executable. The embedded DLLs get extracted out of the application the first time a function from the DLL is called. DLLpack exploits the DLL “delay-load” feature of Microsoft Visual C/C++ 6.0 and Borland C++ Builder 5.0. DLLpack solves two of the most severe problems that the industry has with DLLs: installation problems (missing DLLs) and version problems.

Why DLLs?

DLLs have a few advantages over statically linked libraries. Apart from the minor advantages of reduced disk usage when installing multiple applications that share a single DLL and a reduced overall memory load when such applications run concurrently, an asset of a DLL is that it forms a self-contained package with a standardized interface. It is self-contained in the sense that including a DLL in a project involves no more than adding the appropriate import library to the project files; the DLL is immune to the compiler brand and the compiler settings that are used to build the project. In contrast, you cannot expect a static library that you built with one compiler to be usable with another compiler; worse, it might not work with another version of the same compiler. A static library may also be sensitive to compiler and linker settings, for example a mismatch between the multi-threaded and the single-threaded versions of the standard library.

As opposed to that of static libraries, the call-level interface of a DLL is pretty well defined. This allows me to create a DLL that is usable not only with multiple C++ compilers, but also with other languages such as Delphi or Visual Basic. Of course, it is quite possible to write a DLL that will only work with one particular system, but that is not at issue here. Creating a DLL that works with most languages and compilers for Microsoft Windows is fairly easy; doing the same with a static library is impossible.

Another (minor) advantage of DLLs over static libraries is that you can load a DLL on an as-needed basis and unload it once you are done with it. This functionality can be useful in situations where a particular routine has significant memory and resource requirements, and only rarely executes.

Why DLLpack?

DLLs have severe problems too: every separate file in a product is another potential installation/configuration problem. Foremost amongst the complaints about DLLs are “missing DLLs” and “version mismatches”. All kinds of files can disappear from a user’s hard disk, but DLLs are apparently more prone to go missing than other files, possibly because they are often stored in a shared location; i.e., the Windows “system” directory. Version problems, where the application needs version X of the DLL, but a later installation of a completely unrelated program has upgraded (or downgraded!) the DLL to version Y, are, in casual parlance, categorized under the label “DLL Hell”.

DLLpack resolves both of these problems. Since the correct DLL for the application is embedded inside the application’s .EXE file, the “missing DLL” problem no longer occurs. Secondly, when the DLL gets extracted from the application, the application has control over exactly _which_ DLL it loads. For implicitly loaded DLLs, Microsoft Windows searches a few directories for a DLL with a matching name and loads the first one that it finds, hence version problems and possibly even name clash problems. DLLpack finds a unique name for the DLL at run-time and asks Windows to load that DLL with a fully qualified path. In effect, your DLL is no longer shared with other applications.

DLLpack Internals

Issues that we must solve to implement embedded DLLs are as follows:

attaching the complete DLL file to the application’s executable file;
extracting the DLL from the application’s executable on start-up, or at least before it is needed;
cleaning up the extracted files on termination of the application.

To attach a DLL to an executable, I chose to simply add the DLL to the application as a resource. That is, I mention the DLL in the .RC file of the application. The advantage is twofold: a tool to attach DLLs to a .EXE is already present in the form of the Resource Compiler, and the resource-handling functions provided by Windows are a great help in extracting the DLL.

Cleaning up the temporary files into which the DLLs were extracted at run-time is aided by the operating system: DLLpack opens the file into which it extracted the DLL with the DELETE_ON_CLOSE flag set (FILE_FLAG_DELETE_ON_CLOSE in the Windows API), and then just never closes the file. When the application terminates, Windows closes all files and, in the process, deletes those flagged with DELETE_ON_CLOSE. If you look at the source code (Listing 1), you may find the sequence of operations a bit odd: the function dpExtractDLL() first opens the file for writing, then closes it, re-opens it immediately with the DELETE_ON_CLOSE flag, and finally calls LoadLibrary(). This is the only sequence that worked across the various versions of Microsoft Windows: Windows 95, Windows 98, Windows NT 4.0, Windows 2000, Windows ME, and Windows XP. Flagging the file DELETE_ON_CLOSE in the first call to CreateFile() causes LoadLibrary() to fail with error code 2 (file not found); moving the second call to CreateFile() below LoadLibrary() causes the CreateFile() function to fail in Windows NT/2000 with error code 5 (access denied).

The Windows API documentation implies that one can close the file after the LoadLibrary(): the file will only be deleted after the last handle to the file is closed. Under Windows 95/98, however, Windows appears to “forget” the DELETE_ON_CLOSE flag if you close the file while other handles to the file are still “open”. The FILE_SHARE_DELETE flag is no help, as it is only available on Windows NT/2000. It is a pity that the hFile parameter of LoadLibraryEx(), which would probably offer a cleaner solution for the task at hand, is still “reserved for future use.”

The remaining issue is to set up delay-loading of the embedded DLLs. Delay-loading of DLLs was introduced with Microsoft Visual C/C++ 6.0. Borland C++ Builder has the same capability starting with version 5. Fortunately, the implementation of delay-loading in C++ Builder is also very compatible with that of Visual C/C++, even though it lacks header files or documentation that allow you to really exploit delay-loading. To use delay-loading, you need to add an option like /DELAYLOAD:_xxx_ for Visual C/C++, and -d_xxx_ for C++ Builder, where _xxx_ stands for the name of the DLL, e.g. “CPUINF32.DLL”. You need to specify this switch for every DLL that is delay-loaded.

The articles listed in “Further Reading” plunge into the details of delay-loading (with surprisingly little overlap). In summary, when delay-loading a DLL, the linker generates a stub function for every function from that DLL that the application calls, plus a redirection table that maps calls to the delay-imported functions to the stub functions. The stub function calls an internal function, __delayLoadHelper(), with information on the function to resolve. By default, __delayLoadHelper() calls LoadLibrary() on the DLL and GetProcAddress() to obtain the function address, but you can customize this behavior by setting “hook functions.” During its operation, __delayLoadHelper() patches the appropriate entry in the redirection table with the address that GetProcAddress() returned; every subsequent call to the function bypasses both the stub function and __delayLoadHelper(). __delayLoadHelper() also maintains a per-DLL table to avoid loading a delay-loaded DLL that was already loaded earlier. For details on the patching operation, refer to the article by Matt Pietrek (see “Further Reading”).
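The patching behavior described above can be modeled in portable C++ without any Windows API. The following is only a sketch under stated assumptions: importSlot, stubDouble(), realDouble(), and delayLoadHelperModel() are hypothetical stand-ins for the redirection-table entry, the linker-generated stub, the exported DLL function, and __delayLoadHelper(), respectively.

```cpp
#include <cassert>

// Portable model of delay-load binding; no Windows API is involved.
// Every name below is a hypothetical stand-in, not a linker-generated symbol.

using Fn = int (*)(int);

int realDouble(int x) { return 2 * x; }   // plays the function exported by the DLL
int stubDouble(int x);                    // plays the linker-generated stub

Fn importSlot = stubDouble;               // one entry of the redirection table
int helperCalls = 0;                      // counts how often the helper model ran

// Plays __delayLoadHelper(): resolve the target and patch the table entry.
Fn delayLoadHelperModel(Fn* slot) {
    assert(slot != nullptr);
    ++helperCalls;
    *slot = realDouble;                   // real helper: LoadLibrary() + GetProcAddress()
    return *slot;
}

// The stub asks the helper for the real address, then forwards the call.
int stubDouble(int x) {
    Fn target = delayLoadHelperModel(&importSlot);
    return target(x);
}

// The application always calls through the table entry.
int callDouble(int x) { return importSlot(x); }
```

In this model, the first call to callDouble() runs the helper once; later calls go straight to realDouble() because the table entry has been overwritten.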

In the case of DLLpack, the DLL must first be extracted from the application’s resources before __delayLoadHelper() can load it. As mentioned above, an application can hook into the operation of __delayLoadHelper() and partially overrule it. To set a notification or a failure hook, documentation on MSDN and the articles previously mentioned recommend that you declare a global variable in your application with a predefined name and initialize it, as in:

PfnDliHook __pfnDliNotifyHook = dpDelayLoadHook;

With C++ Builder, this compiles, but does not work: the __pfnDliNotifyHook variable that you declare is also declared in the run-time library. The linker must now choose which of the two declarations to link in. Apparently, the Borland linker includes both variables in this case and __delayLoadHelper() uses the __pfnDliNotifyHook variable from the run-time library. A quick and easy fix is to _use_ the __pfnDliNotifyHook variable from the run-time library, rather than creating a second variable and relying on the linker to choose the correct one. That is, at the start of main() or WinMain(), just set:

__pfnDliNotifyHook = dpDelayLoadHook;

This works in Visual C/C++ too, by the way. It is important, though, to set the hook function before any delay-imported function is called. This is easy to achieve in C, but it requires attention in C++ in the presence of static class instantiations, whose constructors execute before main() is called.
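The C++ pitfall with static instances can be shown with a small portable model. It is a sketch, not code from the listings: notifyHookModel stands in for __pfnDliNotifyHook, EarlyUser plays a class whose constructor calls a delay-imported function, and HookInstaller is one possible remedy that I am assuming here, namely installing the hook from a static object that is defined before any other static object that needs it.

```cpp
#include <cassert>

using HookFn = void (*)();

HookFn notifyHookModel = nullptr;     // stand-in for __pfnDliNotifyHook
void packHookModel() {}               // stand-in for dpDelayLoadHook

// Installs the hook during static initialization instead of in main().
struct HookInstaller {
    HookInstaller() { notifyHookModel = packHookModel; }
};

// Plays a class whose constructor calls a delay-imported function;
// it records whether the hook was already in place at that moment.
struct EarlyUser {
    bool hookSeen;
    EarlyUser() : hookSeen(notifyHookModel != nullptr) {}
};

// Within one translation unit, these are constructed in definition order.
static EarlyUser tooEarly;            // runs before the installer: no hook yet
static HookInstaller installer;       // sets the hook before main() starts
static EarlyUser safe;                // runs after the installer: hook is set
```

The ordering guarantee holds only inside a single translation unit; across source files the construction order is unspecified, so the installer object would have to live in the same file as the static instances that depend on it.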

A final remark on delay-load hook set-up is that Borland C++ Builder implements the __delayLoadHelper() function both in its static run-time library and in its “dynamically loaded” run-time library (CC3250.DLL). When using the dynamic run-time library, the __pfnDliNotifyHook variable must be declared as “__declspec(dllimport)”, as well as “extern.”

There are two hooks, one for notifications and one for failures. DLLpack needs at least to set the notification hook and it must watch the dliNotePreLoadLibrary notification. Upon reception, the hook function analyses the DelayLoadInfo structure that it receives as a parameter and extracts the required DLL into a file with a unique name in the “temporary” directory. Then, it does a LoadLibrary() on the DLL just extracted and indicates to __delayLoadHelper() to skip that step. All of this is implemented in Listing 1.
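The hand-off between the helper and the notification hook can be sketched as a portable model. This is not Listing 1: LoadInfoModel, ModuleModel, packHookModel(), and helperLoadStep() are hypothetical simplifications, and the real hook receives a DelayLoadInfo structure and returns a module handle. Only the names of the notification codes follow Microsoft's DELAYIMP.H.

```cpp
#include <cassert>
#include <string>

// Notification codes, named and ordered as in Microsoft's DELAYIMP.H.
enum : unsigned {
    dliStartProcessing, dliNotePreLoadLibrary, dliNotePreGetProcAddress,
    dliFailLoadLib, dliFailGetProc, dliNoteEndProcessing
};

struct LoadInfoModel { std::string dllName; };  // hypothetical; far smaller than DelayLoadInfo
using ModuleModel = int;                        // stand-in for HMODULE; 0 means "none"
using HookModel = ModuleModel (*)(unsigned notify, LoadInfoModel* info);

HookModel notifyHookModel = nullptr;  // plays __pfnDliNotifyHook
int defaultSearches = 0;              // times the helper fell back to its own load
std::string lastExtractedPath;        // where the hook model "extracted" the DLL

// Plays the DLLpack hook: it reacts only to the pre-load notification.
ModuleModel packHookModel(unsigned notify, LoadInfoModel* info) {
    if (notify != dliNotePreLoadLibrary) return 0;  // 0: let the helper carry on
    // Real code would extract the resource to a unique temporary file and
    // call LoadLibrary() on its full path; here we only record a placeholder.
    lastExtractedPath = "<temp>/<unique>-" + info->dllName;
    return 1;                                       // non-zero: module is already loaded
}

// Plays the load step of __delayLoadHelper().
ModuleModel helperLoadStep(LoadInfoModel* info) {
    assert(info != nullptr);
    ModuleModel module = 0;
    if (notifyHookModel) module = notifyHookModel(dliNotePreLoadLibrary, info);
    if (module == 0) { ++defaultSearches; module = 2; }  // default: search by bare name
    return module;
}
```

The point of the model is the return value: when the hook answers the pre-load notification with a module, the helper skips its own search by name.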

Extracting into the temporary directory guarantees that we extract into a directory with write permission. Making up a unique filename ensures that we do not overwrite an existing file. What can still go wrong is running out of disk space while extracting. There are two ways to handle such a situation: use the “failure notification” hook of __delayLoadHelper() to try to correct it, or use structured exception handling to break out of the function call and proceed in some other way.

The failure hook option has the advantage that it only needs to be set up once, and it will catch failures for all DLLs. To set the failure hook, set the predefined variable __pfnDliFailureHook to the failure hook function. The failure hook function has the same prototype as the notification hook function, so you can set both __pfnDliNotifyHook and __pfnDliFailureHook to the same function. On receiving the dliFailLoadLib event, the failure hook function may try to fix the situation, for example by freeing up disk space and trying to extract the DLL again. If unable to repair the problem, the hook function can do little else than either abort the program with an apologetic message, or fall through and cause an exception to be raised.

SEH (structured exception handling) can also try to reload the DLL, and in addition, it can recover from a persistent failure and skip the faulting function call. The drawback of exception handling is that in the application, every call to a function in a delay-loaded DLL must be wrapped inside an SEH frame. The __delayLoadHelper() functions of both Visual C/C++ and C++ Builder generate basically the same exception code, but with different “facility codes”. (A facility code is a bit field in the HRESULT type; the exception codes for delay loading follow the format of HRESULT codes.) Visual C/C++ uses the facility code 109 and C++ Builder uses code 251. This minor difference is easily hidden in a macro; see Listing 3. When trapping the exception, GetExceptionInformation() gives you the same DelayLoadInfo structure that a hook function receives; see Listing 2 for an example.
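To make the facility-code difference concrete, here is a portable sketch of how such an exception code is composed from a severity, a facility, and a Win32 error number. It is not the macro from Listing 3; the constant and function names are my own, and the numeric values of the severity and error codes are copied from the Windows SDK so that the example compiles anywhere.

```cpp
#include <cassert>
#include <cstdint>

// Values mirror Windows SDK definitions; redefined here for portability.
constexpr std::uint32_t kSeverityError   = 0xC0000000u;  // ERROR_SEVERITY_ERROR
constexpr std::uint32_t kModNotFound     = 126;          // ERROR_MOD_NOT_FOUND
constexpr std::uint32_t kProcNotFound    = 127;          // ERROR_PROC_NOT_FOUND
constexpr std::uint32_t kFacilityVisualC = 109;          // facility code, per the article
constexpr std::uint32_t kFacilityBorland = 251;          // facility code, per the article

// Compose an HRESULT-style exception code: severity | facility << 16 | error.
constexpr std::uint32_t delayLoadException(std::uint32_t facility, std::uint32_t error) {
    return kSeverityError | (facility << 16) | error;
}

// Extract the 11-bit facility field from an HRESULT-style code.
constexpr std::uint32_t facilityOf(std::uint32_t code) {
    return (code >> 16) & 0x7FFu;
}

// True for a delay-load failure raised by either compiler's helper.
inline bool isDelayLoadFailure(std::uint32_t code) {
    const std::uint32_t facility = facilityOf(code);
    const std::uint32_t error = code & 0xFFFFu;
    return (code & kSeverityError) == kSeverityError &&
           (facility == kFacilityVisualC || facility == kFacilityBorland) &&
           (error == kModNotFound || error == kProcNotFound);
}
```

An exception filter could compare the value from GetExceptionCode() against such a predicate, so that the same source compiles for both compilers.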

There is a caveat, though: if the exception handler loads the DLL, after freeing up disk space, and continues execution, __delayLoadHelper() does not record the fact that the DLL is now loaded. The notification function will receive a second dliNotePreLoadLibrary notification for the next function that you call from the same DLL, causing that DLL to be extracted twice. As such, I suggest that you use exception handling only to jump out of a function call, as in Listing 2. Personally, I simply terminate the program inside the failure hook if a DLL fails to extract; this is what is implemented in the dpDelayLoadHook() function in Listing 1. On the other hand, I try to extract DLLs at the very beginning of the program by calling some initialization or “get version” function in main() or WinMain(). Now, at least my program will abort almost immediately, without losing the user’s unsaved data.

As an aside, if the application calls functions in a delay-loaded DLL through a function pointer, you may want to initialize the function pointer a second time after having called the function once. When setting the function pointer, the pointer gets the value stored in the redirection table that I mentioned before. However, __delayLoadHelper() patches this table after the first call, making the value in the function pointer obsolete. Unless you re-initialize the function pointer, it keeps pointing to the stub function and every call via the function pointer will be routed through __delayLoadHelper(); it will still work, but it is inefficient. The “global optimizations” of Visual C/C++ can cause the same phenomenon: if you call a function several times in a row, the optimizer assumes that the call address does not change and may store it in a register. The example program in Matt Pietrek’s article demonstrates this flaw when compiled in retail mode. (Matt Pietrek probably only compiled his example program in debug mode, as he does not mention this effect.)
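The stale function pointer effect is easy to reproduce in a portable model. The sketch below is an illustration only: slot, stubNegate(), and realNegate() are hypothetical stand-ins for a redirection-table entry, its stub, and the real DLL function, and helperRuns counts how often the stub path (and with it the helper) is taken.

```cpp
#include <cassert>

using Fn = int (*)(int);

int helperRuns = 0;                   // counts trips through the stub and helper
int realNegate(int x) { return -x; }  // plays the function inside the DLL
int stubNegate(int x);                // plays the linker-generated stub

Fn slot = stubNegate;                 // redirection-table entry, initially the stub

// The stub resolves the target, patches the table, and forwards the call.
int stubNegate(int x) {
    ++helperRuns;
    slot = realNegate;
    return slot(x);
}

// Calls through a pointer that the application cached earlier.
int callThroughCached(Fn cached, int x) {
    assert(cached != nullptr);
    return cached(x);
}
```

If the application copies slot into a local pointer before the first call, every call through that copy increments helperRuns; copying slot again after the first call yields realNegate and the counter stops rising.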

Delay-loading causes the DLL to be extracted and loaded only when used. In some situations, it may also be desirable to _unload_ the DLL after completing a particular process. To explicitly unload a delay-loaded DLL, both Visual C/C++ and C++ Builder provide the function __FUnloadDelayLoadedDLL(). Microsoft Visual C/C++ requires that you add the option /Delay:unload to the linker switches; lacking that option, __FUnloadDelayLoadedDLL() simply returns FALSE. Borland C++ Builder does not require a linker switch for unloading; its linker always creates the correct tables for unload support. The DLL name that you pass to __FUnloadDelayLoadedDLL() must be the original filename of the DLL, not the temporary filename to which the DLL was extracted. It must include the extension, but exclude the path. With Visual C/C++, the name comparison is case sensitive, and the case that you must use is not always the same as how you specified it in the /DELAYLOAD: linker option. To avoid having to guess the case of the DLL name with Visual C/C++, I make my hook function convert the DLL name to upper case when it loads it. The structure that __delayLoadHelper() passes to the hook function is guaranteed to reside in memory with “write access” because __delayLoadHelper() patches this structure itself too.
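The upper-case normalization can be sketched as follows. Both function names are hypothetical, and the comparison merely models the case-sensitive match that the article reports for Visual C/C++; the real hook would apply the conversion to the DLL name inside the structure it receives.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>

// Convert a DLL file name to upper case in place, so that the name passed
// to the unload function later never depends on guessing the stored case.
void upperCaseDllName(char* name) {
    assert(name != nullptr);
    for (; *name != '\0'; ++name) {
        // Cast through unsigned char: toupper() is undefined for negative values.
        *name = static_cast<char>(std::toupper(static_cast<unsigned char>(*name)));
    }
}

// Models a case-sensitive name match.
bool sameDllName(const char* a, const char* b) {
    return std::strcmp(a, b) == 0;
}
```

After the conversion, the application can always pass the upper-case name, such as "CPUINF32.DLL", regardless of how the name was spelled on the linker command line.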

After explicitly unloading the DLL, you will also want to get rid of the temporary file. This requires no more than closing the handle to the file; the DELETE_ON_CLOSE flag will cause its erasure. You have to save the file handle, though. In Listing 1, I used the “map” data structure from the STL to store the file handle associated with each loaded DLL. The function dpUnloadDLL() is a wrapper around __FUnloadDelayLoadedDLL() which then closes the file handle and removes the entry from the map. I copied the behavior of __FUnloadDelayLoadedDLL() in that dpUnloadDLL() unloads (and deletes) all delay-loaded DLLs if you pass NULL for its filename parameter.
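The bookkeeping behind such a wrapper can be modeled portably. The sketch below is not dpUnloadDLL() from Listing 1: unloadPackedDll(), rememberTempFile(), and the two stand-in functions are hypothetical, handles are plain integers, and the code uses C++11 features that Visual C/C++ 6.0 does not support.

```cpp
#include <cassert>
#include <map>
#include <string>

using FileHandleModel = int;          // stand-in for a Windows file HANDLE

int closedHandles = 0;                // counts simulated CloseHandle() calls
void closeHandleModel(FileHandleModel) { ++closedHandles; }  // plays CloseHandle()
bool unloadModel(const std::string&) { return true; }        // plays __FUnloadDelayLoadedDLL()

// Original DLL name (upper case, no path) -> handle of the temporary file.
std::map<std::string, FileHandleModel> openTempFiles;

void rememberTempFile(const std::string& dllName, FileHandleModel handle) {
    openTempFiles[dllName] = handle;
}

// Unload one DLL by its original name, or every DLL when name is null.
// Returns the number of DLLs that were unloaded.
int unloadPackedDll(const char* name) {
    int unloaded = 0;
    for (auto it = openTempFiles.begin(); it != openTempFiles.end(); ) {
        const bool selected = (name == nullptr) || (it->first == name);
        if (selected && unloadModel(it->first)) {
            closeHandleModel(it->second);   // closing the handle deletes the temp file
            it = openTempFiles.erase(it);   // erase() returns the next iterator
            ++unloaded;
        } else {
            ++it;
        }
    }
    return unloaded;
}
```

Keeping the map keyed by the original DLL name matches the name that the unload function expects, so the same string serves both purposes.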

End Notes

The DLLpack source file needs the DELAYIMP.H file; Microsoft Visual C/C++ 6.0 comes with this file, but C++ Builder 5.0 does not provide it. For that reason, Listing 3 contains the essential type definitions and constants for delay-loading.

Unload support is a feature that is infrequently needed. Therefore, Listing 1 has all code related to unloading in conditionally compiled sections. To remove unload support, compile the source code with DLLPACK_NO_UNLOAD. Other than the “map” data structure from the STL, Listing 1 uses no C++ features. If you remove the unload support, or replace the map data structure by a simple linked list, the code compiles in C.

Not all DLLs are suitable for delay-loading. DLLs with global variables that are declared with “__declspec(thread)” _must_ be loaded before the application starts, so delay-loading does not work. An alternative is to obtain thread-local data with TlsAlloc() at run-time; this approach allows the DLL to be delay-loaded.

The demo program in Listing 2 uses CPUINF32.DLL as an example. This DLL allows you to query the type and capabilities of the processor. At the time of writing this article, the DLL and development files can be downloaded from developer.intel.com/support/processors/procid/cpuid/cpuinfo.htm.
