
Making .NET Assemblies Tamper Resistant


July, 2004: Making .NET Assemblies Tamper Resistant

A strong name for a .NET assembly wards off intruders

Richard is a consultant on .NET technologies. He can be contacted at dotnet.devrichardgrimes.com.


Trojans causing havoc usually spread through e-mail, relying on users inadvertently executing attachments. Such malware uses the e-mail to hide itself, although often not very well. Some malware is more devious and comes to your machine by infecting files you trust. Users often "share" applications through peer-to-peer file-sharing systems, creating an open opportunity for attackers to post a well-known application with viruses attached. Executing such applications runs viruses that can replicate by searching for similar files on your hard disk and attaching to new files to be shared with other people. Such infections work because it is possible to change application files. In this article, I go into the .NET file structure and show you how .NET prevents such alterations from being performed on your .NET assemblies.

Hashes

As a reaction to virus infections, some software publishers provide a message digest for each file. A digest is a one-way hash of the contents of the file. This generates a large number (128 bits for MD5 or 160 bits for SHA-1) that is essentially unique to the file. It is possible (although unlikely) that two files could produce the same hash. However, finding two files that differ only in the rogue code provided by virus writers is extremely unlikely. In any case, testing every possible combination of bits to see if it generates the same hash value would not be feasible. Since the hash function is one way, it is not possible to deduce the possible combinations of bits that could generate the hash. Thus, you can regard it as being impossible to use brute force to determine how to add rogue code to a file in such a way that the same hash is generated.

Using a hash is straightforward. The publisher generates a hash for the software and makes this available with the code, perhaps by publishing it on a web site. A customer downloads the software and generates a hash to compare with the value provided by the publisher. If the two are the same, the file is safe to use. The .NET framework provides managed access to hash algorithms implemented either via the Windows CryptoAPI or fully managed algorithms. The MD5 class from the System.Security.Cryptography namespace gives access to an MD5 hash routine, whereas SHA1 gives access to the 160-bit SHA-1 algorithm. Both are currently implemented by the CryptoAPI and accessed through Platform Invoke. The namespace also provides the SHA256, SHA384, and SHA512 classes that give access to 256-, 384-, and 512-bit SHA hashes, all of which are completely implemented in managed code.

Example 1 shows how to hash an array of bytes using the framework cryptography namespace. Here, I use the MD5 class that returns an instance of the MD5CryptoServiceProvider class, but this is an implementation detail because, in the future, Microsoft may provide a managed version of the MD5 algorithm. The ComputeHash method is overloaded to take a byte array or stream; if the stream overload is used, the method hashes the stream in chunks of 1024 bytes until the entire stream has been read.
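The listing for Example 1 is not reproduced here. As a rough stand-in, the following sketch shows the same two ideas, hashing a byte array and hashing a stream in 1024-byte chunks, using Python's hashlib rather than the .NET classes. The function names hash_bytes and hash_stream are hypothetical and chosen only for this illustration.

```python
import hashlib
import io

def hash_bytes(data: bytes, algorithm: str = "md5") -> str:
    """Hash a whole byte array in one call and return the hex digest."""
    return hashlib.new(algorithm, data).hexdigest()

def hash_stream(stream, algorithm: str = "md5", chunk_size: int = 1024) -> str:
    """Hash a stream chunk by chunk until it is exhausted."""
    hasher = hashlib.new(algorithm)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        hasher.update(chunk)
    return hasher.hexdigest()

data = b"contents of the file to protect" * 100

# Both routes produce the same digest for the same bytes.
print(hash_bytes(data) == hash_stream(io.BytesIO(data)))   # True

# Each hex character carries 4 bits: 128 bits for MD5, 160 for SHA-1.
print(len(hash_bytes(data)) * 4, len(hash_bytes(data, "sha1")) * 4)   # 128 160
```

Feeding the hasher in chunks means the file never has to fit in memory, which is presumably why the stream overload in the framework works the same way.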

Of course, the weak point in this scheme is publishing the hash because it may be that attackers could generate a hash for an infected version and publish that hash. If attackers have sufficient skills, they could even crack the publisher's web site and exchange the real hash with the attacker's hash. Hashes are typically published as strings of hex, so few people would recognize it if a published hash had changed. One solution could be to encrypt the hash so that only those people who are entrusted with the key can publish it. The framework provides two classes that derive from KeyedHashAlgorithm that generate a hash, then encrypt it with the key passed to the constructor. However, these use symmetric algorithms to encrypt the hash, which means that the same key is used to encrypt and decrypt the value; hence, the publishers have to publish the "secret" key they used to make the hash "secure." Indeed, the KeyedHashAlgorithm classes are only intended to be used to authenticate data passed between two users that share a secret key.
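The limitation of a keyed hash can be seen in a few lines. The sketch below uses HMAC-SHA1 from Python's standard library as a stand-in for the framework's KeyedHashAlgorithm classes (an assumption; the .NET classes are not used here), and the key and payload values are hypothetical.

```python
import hashlib
import hmac

def keyed_hash(key: bytes, data: bytes) -> bytes:
    """Compute a keyed hash (HMAC-SHA1) of the data."""
    return hmac.new(key, data, hashlib.sha1).digest()

def verify_keyed_hash(key: bytes, data: bytes, tag: bytes) -> bool:
    """Recompute the keyed hash with the same key and compare."""
    return hmac.compare_digest(keyed_hash(key, data), tag)

shared_key = b"hypothetical shared secret"
payload = b"installer bytes"
tag = keyed_hash(shared_key, payload)

print(verify_keyed_hash(shared_key, payload, tag))          # True
print(verify_keyed_hash(shared_key, payload + b"!", tag))   # False

# Anyone who can verify also holds the key, so they can forge a valid tag.
forged = keyed_hash(shared_key, payload + b"!")
print(verify_keyed_hash(shared_key, payload + b"!", forged))   # True
```

The last line is the problem for a software publisher: every customer who can check the hash can also produce one.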

The solution is to use an asymmetric algorithm. Here, publishers generate the hash and encrypt this with their private key. This is often called a "signed hash." This value and the publisher's public key are published and only the public key is able to decrypt the signed hash. This means that, assuming the publisher's private key is kept private and that the publisher's public key is well known, attackers cannot generate a signed hash for their adapted file.
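To make the idea of a signed hash concrete, here is a toy sketch using textbook RSA with Python integers. Everything about the key is an assumption made for illustration: the primes are two small, well-known Mersenne primes, there is no padding, and the digest is simply reduced modulo the key. It is not how the framework signs assemblies and must not be used for real signing.

```python
import hashlib

# Toy key pair (assumption for illustration only): far too small for real use.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)

def digest_as_int(data: bytes) -> int:
    """SHA-256 digest reduced modulo N so that it fits the toy key."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N

def sign(data: bytes) -> int:
    """Publisher side: transform the hash with the private key."""
    return pow(digest_as_int(data), D, N)

def verify(data: bytes, signature: int) -> bool:
    """Customer side: recover the hash with the public key and compare."""
    return pow(signature, E, N) == digest_as_int(data)

original = b"the publisher's file"
signature = sign(original)

print(verify(original, signature))                    # True
print(verify(original + b" plus rogue code", signature))
```

Only N and E need to be published. Producing a matching signature for an altered file requires D, which never leaves the publisher.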

Strong Names

The key to securing a .NET assembly is a strong name. There are several aspects to a strong name. For example, .NET will only respect a library's versioning if it has a strong name and only strong-named assemblies can be put in the Global Assembly Cache and be shared by different applications. In this article, I only address one aspect—the code signing that occurs when you give an assembly a strong name.

The way to sign an assembly is to get the compiler to do the work by supplying the [AssemblyKeyFile] pseudocustom attribute to provide the name of a file that contains the public-private key pair for your company. This key pair is privileged information because it contains the private key that should only be used by trusted personnel, so Microsoft provides a mechanism called "delayed signing" (using the [AssemblyDelaySign] pseudocustom attribute), which informs the compiler that the assembly will be signed at a later date. The compiler creates the required space in the assembly for the signing information but does not initialize this with relevant information. At a later date, the assembly can be signed with the strong-name utility, sn.exe, using the -R switch. This signs an assembly that originally had the [AssemblyDelaySign] attribute.

The sn.exe utility does not perform the signing. It merely provides command-line parsing and output code for the strong-name functions exported from the mscorsn.dll library. The prototypes of these functions can be found in the strongname.h header file, which is supplied along with the import library mscorsn.lib in the Tool Developers Guide in the .NET SDK. The Shared Source CLI (also known as "Rotor") gives the source code for a version of this DLL. This code shows that signing an assembly is straightforward. To understand how the process works, you need to understand the format of a Portable Executable (PE) file.

Portable Executable Files

Every EXE and DLL on Windows is a PE file. Figure 1 illustrates the PE format. At the beginning of the file is the MS-DOS stub, which contains an IMAGE_DOS_HEADER and a message that indicates that the file cannot be executed under DOS. One member of this header contains the file offset of the PE file header, which is a structure called IMAGE_NT_HEADERS. This contains a signature (the four-byte value 0x00004550, the characters PE followed by two null bytes) and the COFF header (IMAGE_FILE_HEADER) followed by the PE header (IMAGE_OPTIONAL_HEADER). These headers contain important information about the file. The actual contents of the file are contained in sections that can contain code or data (readable and/or writeable). When the file is mapped into memory, the operating system uses this information to determine what part of memory is data and can be modified, and what is code and can be executed. The COFF header identifies how many sections the file contains. The PE header contains much more information, which includes the address of the unmanaged entry point, the location and size of the code, and location and size of data. Clearly, there is enough here for attackers to inject code by altering the values in these locations.
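The walk from the DOS header to the COFF header can be sketched as follows. The sample image is a tiny synthetic buffer built only for this illustration (it is not a runnable executable), and the helper names and the stub size of 128 bytes are assumptions. The field offsets follow the published PE layout: e_lfanew sits at offset 0x3C of the 64-byte DOS header, and the 20-byte COFF header follows the 4-byte signature.

```python
import struct

# IMAGE_FILE_HEADER: Machine, NumberOfSections, TimeDateStamp,
# PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader, Characteristics
COFF_HEADER = struct.Struct("<HHIIIHH")   # 20 bytes, little-endian

def build_sample_image(num_sections: int = 2) -> bytes:
    """Build a minimal synthetic buffer holding only the fields read below."""
    dos_header = bytearray(64)                 # IMAGE_DOS_HEADER is 64 bytes
    dos_header[0:2] = b"MZ"                    # e_magic
    pe_offset = 128                            # hypothetical size of header + stub
    struct.pack_into("<I", dos_header, 0x3C, pe_offset)   # e_lfanew
    stub = bytes(pe_offset - len(dos_header))  # stands in for the DOS stub program
    coff = COFF_HEADER.pack(0x014C, num_sections, 0, 0, 0, 224, 0x0102)
    return bytes(dos_header) + stub + b"PE\0\0" + coff

def read_pe_headers(image: bytes) -> dict:
    """Follow e_lfanew to the PE signature and decode the COFF header."""
    if image[:2] != b"MZ":
        raise ValueError("not an MS-DOS executable header")
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    fields = COFF_HEADER.unpack_from(image, pe_offset + 4)
    return {
        "pe_offset": pe_offset,
        "machine": fields[0],
        "sections": fields[1],
        "optional_header_size": fields[5],
    }

print(read_pe_headers(build_sample_image()))
```

Every value read here is plain data in the file, which is why an unsigned file gives attackers so many places to redirect the loader.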

On operating systems before XP, the operating system doesn't treat an assembly any differently than other PE files. Once the OS loads the PE file, it runs the unmanaged entry point function. In an assembly, this unmanaged entry point runs the appropriate entry point (_CorExeMain or _CorDllMain) in the .NET execution engine library, mscoree.dll. This entry point starts up the .NET runtime (if it has not already started), then locates and executes the managed entry point in the assembly. The unmanaged entry point is not executed by XP or later versions of Windows because they know how to identify .NET assemblies (and so start the runtime automatically) and know how to locate the managed entry point without using mscoree.dll.

The PE header contains a member called the "data directory" that contains at least 16 instances of IMAGE_DATA_DIRECTORY (currently, compilers only emit 16 instances). These entries give information about the location and size of various data tables in the PE file. Again, this information is important and, if tampered with, could let attackers change how the assembly works. Such information includes the import table (the unmanaged functions the assembly uses and the DLLs that contain them), unmanaged resources, and a location in the file where Authenticode certificates are stored. Entry 14 in this table (counting from zero) is the Common Language Runtime Header (IMAGE_COR20_HEADER) that contains information about the location of the assembly metadata and managed resources. Again, this information is important. The metadata tables indicate the assemblies and the types that are imported, so if this data could be changed, a different assembly, possibly containing the attacker's code, could be loaded. Similarly, the resources could contain user-interface items, such as strings used on dialog boxes or output strings. If attackers could change these strings, it might be possible to trick users into revealing personal data.
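As an illustration of how the data directory is laid out, the sketch below builds a synthetic 32-bit (PE32) optional header and reads its 16 entries. The offsets used (NumberOfRvaAndSizes at byte 92, the directory at byte 96, 8 bytes per entry) apply to PE32 headers only; the helper names and the sample address are hypothetical.

```python
import struct

DATA_DIRECTORY_OFFSET = 96   # start of the data directory in a PE32 optional header
ENTRY_SIZE = 8               # IMAGE_DATA_DIRECTORY: VirtualAddress, Size
SECURITY_INDEX = 4           # Authenticode certificate table
CLR_HEADER_INDEX = 14        # Common Language Runtime header

def build_optional_header(entries: dict) -> bytes:
    """Build a synthetic PE32 optional header with the given directory entries."""
    header = bytearray(DATA_DIRECTORY_OFFSET + 16 * ENTRY_SIZE)   # 224 bytes
    struct.pack_into("<H", header, 0, 0x10B)     # Magic: PE32
    struct.pack_into("<I", header, 92, 16)       # NumberOfRvaAndSizes
    for index, (address, size) in entries.items():
        offset = DATA_DIRECTORY_OFFSET + index * ENTRY_SIZE
        struct.pack_into("<II", header, offset, address, size)
    return bytes(header)

def read_data_directory(optional_header: bytes) -> list:
    """Return a list of (address, size) pairs, one per directory entry."""
    (magic,) = struct.unpack_from("<H", optional_header, 0)
    if magic != 0x10B:
        raise ValueError("this sketch only handles PE32 optional headers")
    (count,) = struct.unpack_from("<I", optional_header, 92)
    return [
        struct.unpack_from("<II", optional_header,
                           DATA_DIRECTORY_OFFSET + i * ENTRY_SIZE)
        for i in range(count)
    ]

# Hypothetical values: a 72-byte CLR header at address 0x2008, no certificates.
directory = read_data_directory(build_optional_header({CLR_HEADER_INDEX: (0x2008, 72)}))
print(len(directory))                 # 16
print(directory[CLR_HEADER_INDEX])    # (8200, 72)
print(directory[SECURITY_INDEX])      # (0, 0)
```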

After the data directory is one or more section headers represented by an IMAGE_SECTION_HEADER structure. Each header gives the size and location of the section and its characteristics (whether it contains code or data, whether it is readable and/or writeable). Finally, the PE file contains the sections identified by the section headers.
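A section header can be decoded in the same way. The sketch below packs two synthetic 40-byte IMAGE_SECTION_HEADER records and reads back the name, file location, and a few characteristic flags. The section values are hypothetical; the flag constants are the documented IMAGE_SCN_* values.

```python
import struct

# Name[8], VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData,
# PointerToRelocations, PointerToLinenumbers, NumberOfRelocations,
# NumberOfLinenumbers, Characteristics
SECTION_HEADER = struct.Struct("<8sIIIIIIHHI")   # 40 bytes

CNT_CODE = 0x00000020
MEM_EXECUTE = 0x20000000
MEM_READ = 0x40000000
MEM_WRITE = 0x80000000

def make_section(name: bytes, raw_offset: int, raw_size: int, flags: int) -> bytes:
    """Pack one synthetic section header (virtual address is a placeholder)."""
    return SECTION_HEADER.pack(name, raw_size, 0x2000, raw_size, raw_offset,
                               0, 0, 0, 0, flags)

def read_sections(table: bytes, count: int) -> list:
    """Decode `count` consecutive section headers from the table."""
    sections = []
    for i in range(count):
        fields = SECTION_HEADER.unpack_from(table, i * SECTION_HEADER.size)
        flags = fields[9]
        sections.append({
            "name": fields[0].rstrip(b"\0").decode("ascii"),
            "raw_offset": fields[4],
            "raw_size": fields[3],
            "code": bool(flags & CNT_CODE),
            "executable": bool(flags & MEM_EXECUTE),
            "writeable": bool(flags & MEM_WRITE),
        })
    return sections

table = (make_section(b".text", 0x200, 0x1000, CNT_CODE | MEM_EXECUTE | MEM_READ)
         + make_section(b".rsrc", 0x1200, 0x400, MEM_READ | MEM_WRITE))
for section in read_sections(table, 2):
    print(section)
```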

Protecting the PE File

When you give a library assembly a strong name, it is protected from tampering. Through alink.dll (or the strong-name utility if you sign the assembly), the compiler will call a function called StrongNameSignatureGeneration to generate a signature from the assembly. The signature is a hash of the assembly that is signed with the publisher's private key. This signature is then placed in the location identified by IMAGE_COR20_HEADER.StrongNameSignature.

Clearly, the location used by the strong-name signature should not be included in the hash, nor should any location used to store certificates, because these will be updated after the assembly is created. So the StrongNameSignatureGeneration function uses the following routine to create the hash:

  1. Hash the DOS header including the stub message.
  2. Hash the IMAGE_NT_HEADERS data but exclude the Data Directory entry 4 (the authenticode certificate table) and the checksum of the file.
  3. Hash the section headers.
  4. Use the section headers to locate each section and then hash each one; if the section contains the strong-name signature, then exclude the signature from the hash.

The hash for the entire assembly is a combination of all of the hashes just given. Once the hash has been created, it is signed with the publisher's private key and then the signed hash is copied into the location indicated by the StrongNameSignature member of the IMAGE_COR20_HEADER data directory entry.
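The effect of the exclusions in the routine above can be sketched with a single hasher that is fed the file in order while skipping the excluded byte ranges. Treating the "combination" as sequential feeding into one SHA-1 hasher is an assumption, as are the offsets in the sample; the real function locates these ranges from the headers.

```python
import hashlib

def hash_excluding(image: bytes, excluded: list) -> bytes:
    """SHA-1 over the image, skipping each (offset, length) range in `excluded`."""
    hasher = hashlib.sha1()
    position = 0
    for offset, length in sorted(excluded):
        hasher.update(image[position:offset])      # bytes before the excluded range
        position = max(position, offset + length)  # jump past the excluded range
    hasher.update(image[position:])                # remainder of the file
    return hasher.digest()

image = bytes(range(256)) * 4      # stand-in for an assembly file
excluded = [
    (88, 4),      # hypothetical offset of the checksum field
    (152, 8),     # hypothetical offset of data directory entry 4
    (600, 128),   # hypothetical location of the strong-name signature
]
baseline = hash_excluding(image, excluded)

# Writing the signature into its reserved slot does not change the hash...
signed = bytearray(image)
signed[600:728] = b"\xAA" * 128
print(hash_excluding(bytes(signed), excluded) == baseline)    # True

# ...but changing any other byte does.
tampered = bytearray(image)
tampered[300] ^= 0xFF
print(hash_excluding(bytes(tampered), excluded) == baseline)  # False
```

This is why the compiler can reserve space for the signature first and fill it in later, as delayed signing requires.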

As you can see, the hash is computed over the entire assembly except for the checksum, the authenticode table, and the location that contains the signed hash. When the runtime loads an assembly, it sees that the file has been signed and calls StrongNameSignatureGeneration to generate its own version of the hash. It then looks in the assembly for the publisher's public key. This key was included in the generation of the original hash and, if this value has been tampered with, it is reflected in the runtime-generated hash. The public key is the only key that can decrypt the original hash from the strong-name signature. After the runtime has obtained the original hash, it compares it with the hash it generated from the assembly; if the two are different, it throws a FileLoadException exception and refuses to load the assembly.

Multifile Assemblies

Figure 1 illustrates the format of most assemblies—that is, they consist of a single PE file called a "module." Although this is the most often used configuration (and the one that Microsoft uses for its assemblies), it isn't the only one. Assemblies can be made up of more than one code module, one of which must contain a section called the "manifest" that holds information about the other files in the assembly. An assembly can also contain separate resource files such as graphics files, text files, or compiled .NET resource files. Figure 2 shows an example of a multifile assembly. Such files present an opportunity to let attackers dupe your users because resources in external files could contain text that is shown on a user interface and could persuade your users to enter personal data.

Code modules are a constituent part of an assembly and a strong-name signature is for the entire assembly, not for a separate part of the assembly. For this reason, modules do not contain their own strong-name signature, and if you add the [AssemblyKeyFile] attribute to a module that won't contain the manifest, the attribute will be ignored. However, when you add a module or an external ("linked") resource to an assembly, the compiler will generate a hash for that new file and add it to the manifest of the assembly. Example 2 shows part of the manifest extracted with ildasm.exe for a library assembly that contains an additional code module and a linked resource. The important point is that, when the assembly is created, the compiler generates a hash for these external files and stores this hash in the metadata table entry for the file. The metadata table is held in the .text section of the module that contains the manifest. This file, including the metadata table, will be hashed to form the strong-name signature.

If the external file is changed at a later stage, then the hash of the new file will not match the hash stored in the metadata table. How this is handled by the runtime depends on the type of file. When an assembly is loaded, the runtime tests each module within the assembly to see if it has been tampered with. To do this, the runtime performs a hash of the module and compares this with the hash stored in the assembly's metadata table. If the two hashes do not match, the runtime throws a FileLoadException exception.

If the external file is a resource file, the hash is only checked when the file is loaded. Typically, your assembly uses either Assembly.GetManifestResourceStream or the ResourceManager class to load the resource. However, the ResourceManager class uses GetManifestResourceStream, so the following discussion applies to both. When the runtime tries to load the external resource, it uses the assembly manifest to get the name of the resource file. From this entry, it also gets the hash of the external file. The runtime then locates the file and performs a hash. If the resource file has changed, these two hashes will not match and the runtime will not load the file. However, no exception is thrown. Instead, GetManifestResourceStream returns null. I consider this a bug because the reason the resource is not loaded is a clear breach of security. Indeed, I think that in both cases, when a code module or a resource file fails to match the hash in the manifest, the runtime should throw a security exception. However, this is not a security vulnerability because under no circumstances will the tampered file be loaded.
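The two behaviors described for modules and linked resources can be contrasted in a small simulation. Nothing here calls the runtime: the manifest is modeled as a dictionary of file names to SHA-1 hashes, the files live in memory, and FileLoadError is a hypothetical stand-in for FileLoadException.

```python
import hashlib

class FileLoadError(Exception):
    """Stand-in for the FileLoadException mentioned in the article."""

def build_manifest(files: dict) -> dict:
    """Record a hash for each external file, as the compiler does at build time."""
    return {name: hashlib.sha1(data).hexdigest() for name, data in files.items()}

def matches_manifest(files: dict, manifest: dict, name: str) -> bool:
    return hashlib.sha1(files[name]).hexdigest() == manifest[name]

def load_module(files: dict, manifest: dict, name: str) -> bytes:
    """Code modules: a hash mismatch raises an error."""
    if not matches_manifest(files, manifest, name):
        raise FileLoadError(f"hash mismatch for module {name}")
    return files[name]

def load_resource(files: dict, manifest: dict, name: str):
    """Linked resources: a hash mismatch quietly yields None."""
    if not matches_manifest(files, manifest, name):
        return None
    return files[name]

files = {"helper.netmodule": b"module code", "strings.txt": b"Enter your name:"}
manifest = build_manifest(files)

# Simulate an attacker altering both external files after the build.
files["helper.netmodule"] = b"module code + rogue code"
files["strings.txt"] = b"Enter your credit card number:"

print(load_resource(files, manifest, "strings.txt"))   # None
try:
    load_module(files, manifest, "helper.netmodule")
except FileLoadError as error:
    print("refused:", error)
```

In both cases the altered bytes are never handed back to the caller, which matches the point that neither behavior is a vulnerability.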

Satellite Assemblies

Another area where attackers might try to tamper with files is with satellite assemblies, which are assembly files that only contain resources. An assembly that uses satellite assemblies provides the neutral culture resources, whereas each satellite contains the resources for a specific culture. The culture of the satellite is part of the satellite name. Typically, a satellite is built using the al.exe assembly linker tool and the culture is applied using the /culture switch. Every satellite assembly has the same name except for the "culture" part of the name. Private assemblies have to be stored in separate subfolders named after the culture because NTFS and FAT32 treat each satellite as having the same name. The short part of the assembly name is in the form <main-assembly>.resources, where <main-assembly> is the short name of the assembly that contains the neutral resources. However, if this main assembly has a strong name, then the satellites must also have a strong name, which means that satellite assemblies are signed. This, in turn, means that each satellite contains a strong-name signature that is a hash of the assembly.

If the satellite contains embedded resources then the hash includes a hash of the embedded resources. If the satellite contains a linked resource, then the satellite contains a hash of each linked resource file in the metadata table. This table is hashed as part of the strong-name signature, so in all cases the resource is protected. Thus, if a linked resource has been tampered with and its hash does not match the hash in the satellite metadata table, or if the hash of the entire assembly does not match the signed hash, the entire satellite assembly will not be loaded, but no exception is thrown. The ResourceManager handles this via a process called "fall-back"—it tries to load another, similar resource. So if the UI culture is US English and the en-US satellite has been tampered with, the ResourceManager rejects that satellite and tries to load the general English resource. If the general English resource is not available, the ResourceManager tries to load the neutral resource. Again, I would prefer the runtime to throw a security exception because, clearly, a security breach has occurred. However, this is not a security vulnerability because the tampered assembly will not be loaded.
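The fall-back behavior can be modeled roughly as follows. This is a simulation, not the ResourceManager's actual algorithm: each satellite is represented as a pair of bytes and the hash expected for them, and the function names are hypothetical.

```python
import hashlib

def fallback_chain(culture: str) -> list:
    """'en-US' -> ['en-US', 'en', ''], where '' stands for the neutral culture."""
    chain = []
    while culture:
        chain.append(culture)
        culture = culture.rpartition("-")[0]   # drop the last subtag
    chain.append("")
    return chain

def find_resources(culture: str, satellites: dict, neutral: bytes):
    """Return (culture_used, data), silently skipping tampered satellites."""
    for name in fallback_chain(culture):
        if name in satellites:
            data, expected_hash = satellites[name]
            if hashlib.sha1(data).hexdigest() == expected_hash:
                return name, data
            # Hash mismatch: reject this satellite and keep falling back.
    return "", neutral

def satellite(data: bytes):
    """Pair the data with the hash recorded when the satellite was built."""
    return data, hashlib.sha1(data).hexdigest()

satellites = {"en-US": satellite(b"Color"), "en": satellite(b"Colour")}
print(find_resources("en-US", satellites, b"neutral"))    # ('en-US', b'Color')

# Tamper with the en-US satellite: the general English resource is used instead.
_, original_hash = satellites["en-US"]
satellites["en-US"] = (b"Send me your password", original_hash)
print(find_resources("en-US", satellites, b"neutral"))    # ('en', b'Colour')
```

The caller receives a valid resource either way and gets no signal that a satellite was rejected, which is the behavior the article objects to.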

Signing the Assembly

The protection of the signed hash is provided simply by telling the compiler to sign the assembly by including the [AssemblyKeyFile] attribute in the assembly. You do not need to do anything else. If you look up strong names in the MSDN library, you get the impression that strong names are usually associated with shared assemblies; that is, assemblies that should be installed in the Global Assembly Cache and made available to all applications. However, if you provide code to users outside your organization, it is important that you sign your assemblies, regardless of whether the assembly is installed in the GAC or is installed in the application folder as a private assembly. If you distribute an assembly that is not signed, then an attacker can alter your library to do something nasty and then redistribute your application through a warez site or peer-to-peer file-sharing system. If this tampered version of your application does something nasty to someone's machine, you'll be blamed instead of the attackers.

The simple action of giving your library assemblies a strong name protects your assemblies—their code and resources—from being tampered with by attackers. It's easy to do, so make sure it gets done.

DDJ

