Channels ▼
RSS

.NET

Reducing the Size of .NET Applications

Source Code Accompanies This Article. Download It Now.


March, 2005: Reducing the Size of .NET Applications

Smaller EXEs without native code

Vasian is a Ph.D. candidate in the Darmstadt University of Technology's Software Technology Group. He can be contacted at cepa@informatik.tu-darmstadt.de.


Executable files for the .NET Framework currently cannot be packed by binary file compressors such as UPX (http://upx.sourceforge.net/) because .NET uses customized sections in the Portable Executable (PE) file (which is used by all Windows executable files). The .NET Execution Engine expects Common Language Infrastructure (CLI) data to be in the proper sections of the PE file. However, CLI data is placed in the PE sections uncompressed by default.

In this article, I present a technique for reducing the size of .NET executables without using native code or otherwise modifying the PE format. Instead, I use reflection, which is supported by the .NET Framework, and pack the applications at a higher level.

Reducing the size of applications has several benefits:

  • The disk space required is smaller. While disk space is usually not a problem in desktop computers, it can be in portable devices that run .NET Framework.
  • Smaller executables load faster because of fewer disk accesses. Even if you uncompress the data in memory, RAM access is very fast and compressed executables still load faster than the original uncompressed ones.
  • Using compression combined with, say, in-memory encryption/decryption makes it harder to disassemble .NET applications. This helps protect intellectual property.

The technique I present to compress .NET executables can be used with the main executable (EXE) file of .NET applications and with .NET DLL files that follow the .NET XCOPY paradigm. The technique will not work for DLL files placed in the Global Assembly Cache (GAC), which can be shared system wide or for DLL files shared by more than one application that is not aware of this technique.

The technique does not affect the usual development of .NET applications. The application EXE file and DLL files are built and compiled as usual. The technique can be applied as an additional step after you build the release version. Because of the generality of the solution, it is possible to generalize the technique to work with generic EXE and DLL files written in any .NET front-end language. I have created a tool called .NETZ, which is based on this technique (source code and binaries are available at http://www.st.informatik .tu-darmstadt.de/static/staff/Cepa/tools/netz/index.html and from DDJ; see "Resource Center," page 5). Here, I explain how .NETZ works, giving examples of the most interesting points. This makes it easier to apply customized versions of this technique in .NET applications.

Selecting a Compression Library

To compress the .NET executable data, you need a compression library. I use the open-source #ZipLib (http://www.icsharpcode.net/), which implements various compression algorithms. I use only the usual ZIP format (http://www.pkware.com/products/enterprise/white_papers/appnote.txt) from this library. To compress the data, any third-party ZIP tool will do. For example, pkzip25—add app.zip app.exe can be used in a batch file and pack app.exe as app.zip. The resulting app.zip is about 60 percent smaller on average than the original. .NETZ automates this step and does zipping programmatically using #ZipLib. To unzip the application at runtime, you need access to the unzip code. This means that you have to distribute the zip library with the compressed executable file. The size of #ZipLib is about 115KB. But given that it is open source, you can remove from it support for all compression formats other than ZIP. Moreover, you only need to leave the unzip code. If you do this, the size of the compiled zip.dll you need to distribute with the application becomes about 60KB. This is the only size overhead of this method. For applications larger than 200KB, however, you still gain size when compressing. You can do better by using compression libraries written especially for this technique.

The Starter Application

The heart of the technique is a small starter application (stater.exe), which unpacks the data in memory and starts a packed application. (The source code file starter.cs is also available electronically.) Figure 1 illustrates how the starter application handles .NET EXE files. Keep in mind that the goal here is to create a packed application that, apart from size, is undistinguishable to users from the original application. For this reason, I pack the app.zip data as part of the starter application. The easiest way to do this in .NET is to pack the data as a resource. The resources of .NET applications are packed along with the application in the same physical executable file. Listing One produces a valid resource file. While any name will do, I've named the resource "app.exe" so I can access it later in the starter application. In the starter application at runtime, you first access the packed resource like this:

ResourceManager rm =
new ResourceManager ("app",
Assembly).GetExecutingAssembly()0
(byte[])rm.GetObject("app.exe");

Then you unzip the data in memory:

string zipPath = "app.exe";
MemoryStream zipFile =
new MemoryStream(data);
ZipFile zf = new ZipFile(zipFile);
ZipEntry ze = zf.GetEntry(zipPath);
Stream zs = zf.GetInputStream(ze);
byte[] uzdata = new byte[ze.Size];
sz.Read(uzdata, 0, uzdata.Length);

This code is specific to #ZipLib, and the zipPath value is unique to the ZIP file format. The zip entry path inside the zip file in this example is simply the name of the zipped application. You create a System.IO.MemoryStream object zipFile to pass it to the ZipFile constructor as required by #ZipLib, so to it, the memory data looks like a usual filestream. Finally, the variable uzdata contains the unzipped data.

Once you've uncompressed the application data in memory, you need to activate it. The starter application creates an Assembly from the uzdata byte array, which is invoked by activating its entry point:

Assembly assembly =
Assembly.Load(uzdata);
assembly.EntryPoint.Invoke
(null, new object[]{args});

I used null as Invoke's first argument because the entry point, which corresponds to the Main() method of a C# application, is a static method. As the second argument, I pass the arguments passed to the void Main(string[] args) method of the starter application. This trick lets you transparently pass any command-line argument passed to the starter to the packed application. Consequently, when app.exe is started, it appears that the arguments come directly from the command line. You have to pack the argument as an object array to pass them to the Invoke method.

Alternatively, you can rely on reflection code to find the types in the assembly and invoke methods on them. This is useful when app.exe doesn't have an entry point, or when you want to invoke other methods. The startup time is usually smaller than starting app.exe directly because of lower disk overhead.

To make the code work, you must compile the starter application like this:

csc /t:winexe /out:starter.exe starter.cs
AssemblyInfo.cs
/r:zip.dll /res:app.resources
/win32icon:App.ico

Here, you create a Windows executable. Because I wanted to preserve version information of the original file, I used two additional files—AssemblyInfo.cs and App.ico—which come from the original app.exe. If you have the app.exe source code, you can reuse them; otherwise, you have to write some .NET reflection code to extract the assembly information from app.exe and generate AssemblyInfo.cs in Visual Studio format from it. You can also extract the icon file from the app.exe. .NETZ already contains code to do this automatically. It also compiles the starter application programmatically using the System.CodeDom.Compiler.ICodeCompiler interface with Microsoft.CSharp.CSharpCodeProvider.

You can rename starter.exe back to app.exe later if you like. This way, you distribute starter.exe and zip.dll, which are both smaller in size than app.exe alone.

Handling .NET DLL Files

If the sample app.exe depends on other DLLs, you normally don't need to do anything. At times, however, you may need to also zip the DLL files. Again, the technique I describe here does not work with DLL files placed in GAC, or is shared by more than one application not aware of the technique.

Suppose lib.dll is a DLL required by the app.exe you want to zip. First you would link app.exe with the normal unzipped version of lib.dll as you normally do. .NET has a built-in mechanism for resolving types and assemblies. When it fails, however, you can provide .NET with an assembly. This functionality is exposed by a hook in the System.AppDomain class. In .NET, every application executes in an application domain.

You need to handle this event:

AppDomain currentDomain =
AppDomain.CurrentDomain;
currentDomain.AssemblyResolve += new
ResolveEventHandler
(MyResolveEventHandler);

This code needs to be placed into the Main method of the starter application. The trick for this event to be activated is to place the app.exe assembly activation code in another separate method that is called by the starter's Main method.

Once you zip the lib.dll into lib.zip, then you can also pack it as a resource file with the starter application, as you did with the app.zip. This is preferable if you want to have a single EXE file; otherwise, you can leave it as a separate file. However, you need to rename the file to something different from lib.dll, given that .NET looks for this name, and it appears like a corrupted file to .NET. You can leave the name lib.zip or you can be creative and rename "lib.zip" to "libz.dll." Alternatively, you can save the lib.zip data in a SQL database table and retrieve it from there.

Listing Two is code that activates the DLL in MyResolveEventHandler. For the sake of example, suppose that the zipped DLL is a file in the same directory as the starter application. The types in the DLL are resolved to the AppDomain. The .NETZ tool supports both DLLs packed as resources and as separate files.

Conclusion

I presented here a pure C# technique that uses reflection to compress the size of .NET executables that requires no native code. The method is straightforward to implement and offers lots of possibilities. I have combined all these steps in the .NETZ open-source tool that can be used like this: netz [-b] [-c] [-s] [exe file] [dll files] [-i win32icon], where -b (batch mode) generates a batch file and source; -c is the console exe, the default is winexe; -s (single exe) packs DLLs as resources; and -i win32icon is an optional icon file.

DDJ



Listing One

FileStream fs = new FileStream("app.zip", FileMode.Open, FileAccess.Read);
byte[] data = new byte[fs.Length];
fs.Read(data, 0, data.Length);
fs.Close();

ResourceWriter rm = new ResourceWriter("app.resources");
rm.AddResource("app.exe", data);
rm.Close();
Back to article


Listing Two
public static Assembly MyResolveEventHandler(object sender, 
                                              ResolveEventArgs args)
{
    int i = args.Name.IndexOf(',');
    string dllName = args.Name.Substring(0, i);

    // dllName equals "lib" in the example; mapped to the zipped filename
    dllName += "z.dll";

    // read the file and unzip the data as above code omitted ...
    byte[] uzdata = ...

    return Assembly.Load(uzdata);
}
Back to article


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video