Dr. Dobb's | File Recovery Techniques

File Recovery Techniques

Wietse investigates the topic of file recovery by reconstructing past behavior and examining deleted file access time patterns and other attributes.

December 01, 2000
URL:http://www.drdobbs.com/file-recovery-techniques/184404352

Dec00: File Recovery Techniques

Files wanted, dead or alive

Wietse is a researcher at IBM's T.J. Watson Research Center. He can be contacted at [email protected].

Case Study: rootkit

Deleted File Attributes Reveal an Old Vulnerability

About one-and-a-half years ago, a friend's Solaris machine was compromised. The intruder deleted most of the user and system files. To investigate the case, Dan Farmer and I wrote a version of our file recovery tools, which are now part of the Coroner's Toolkit (available at http://www.porcupine.org/forensics/ and http://www.fish.com/forensics/).

With the paint on our tools still wet, we could immediately see which user account was used to access the system. Within days, we stumbled across evidence that the machine had been compromised several times in the preceding 12 months. In the course of the investigation, we recovered a large number of deleted files.

This article is the first of two that explore the subject of file recovery. While the second article focuses on the reconstruction of deleted file contents, this first one deals with reconstruction of past behavior by examining deleted file access time patterns and other deleted file attributes. Specific examples in this article are taken from the UNIX and Linux environments. However, the general principles apply to many operating systems, and differences are mostly in details.

What are your chances of recovering a deleted file? Common wisdom has it that once you delete a file from a UNIX system, your data is lost forever. The UNIX FAQ (ftp://rtfm.mit.edu/pub/faqs/unix-faq/faq/) draws a particularly gloomy picture:

For all intents and purposes, when you delete a file with "rm" it is gone. Once you "rm" a file, the system totally forgets which blocks scattered around the disk were part of your file. Even worse, the blocks from the file you just deleted are going to be the first ones taken and scribbled upon when the system needs more disk space.

This is roughly what we expected to find when we started our investigation. We were aware of Tsutomu Shimumura's successful reconstruction of deleted files after one of his systems was compromised (see Takedown, by Tsutomu Shimumura and John Markoff, Hyperion, 1996). But Tsutomu is not the average investigator, so we did not set our hopes too high.

As we explored the destroyed filesystem, we learned that common wisdom is overly pessimistic. Modern UNIX filesystems do not scatter the contents of a file randomly over the disk. Instead, modern UNIX filesystems are remarkably successful in avoiding file fragmentation, even after years and years of intensive use.

Clearly, file contents with little fragmentation are easier to recover than file contents that are scattered all over the disk. But good filesystem locality has more benefits. It allows deleted information to survive much longer than you would expect.

The Benefits of Good Filesystem Locality

Filesystems with a low degree of fragmentation are good for reading and writing performance, because less time is spent waiting for disk heads to move.

The typical UNIX filesystem achieves locality by breaking up the available space into zones (see Figure 1). Normally, all the information about a small file is stored entirely within one filesystem zone.

Good filesystem locality allows deleted file contents to survive long after a file ceases to exist. Next month, Dan Farmer presents our successes and failures with reconstruction of deleted file contents, so I will resist the temptation to spoil his plot.

Good filesystem locality also allows deleted file access time patterns and other attributes to survive long after a file is deleted. That is the main topic of this article.

Mactimes, Dead or Alive

Last access time patterns for existing files offer great insight into past system behavior. In "What Are Mactimes?" (DDJ, October 2000), Dan Farmer introduced the Mactime program, which reveals file access time patterns according to access method and according to last access time. Mactime reports are very useful for reconstructing past system behavior.

The Mactime report in Figure 2 shows the access time patterns for compiling a "hello world" C program on a Linux machine. Access time information for compiler-related files is not shown. The report shows what you would find on a system where the compiler is used frequently.

At 16:00:14, the file hello.c is created. At 16:00:21, the file is compiled, resulting in an executable file named "hello." In the Mactime report, the hello.c source file is shown after the hello executable file, although the source was used before the executable was created. The file misordering happens because both files have identical timestamps.

Now, imagine that the "hello world" C program was an exploit for some security hole, and that you are a system administrator who suspects that something evil happened around 16:00. Surely, any intruder worth his or her salt would remove the source code and exploit program after they had served their purpose.

As the result of such cleanup activity, a file access time report would reveal only one record: Something happened in Wietse's directory at 16:00:21. However, not all hope is lost. Let's have a look at file access time reports for deleted files.

What Happens When a File Is Deleted?

Deleting a file has a directly visible effect: The file name disappears from directory listings. What happens under the hood depends on filesystem internals. Some filesystems (DOS, Windows) mark the file as ready for deletion simply by hiding the file name in a special manner. This approach makes file recovery easy, but it also handicaps storage allocation for new files. UNIX filesystems strike a different balance. They favor good performance over easy recovery.

The UNIX filesystem has a clever but elegant architecture that has survived 30 years without change. Figure 3 gives a simplified overview. When a file is deleted, UNIX makes only minimal changes to the filesystem. Table 1 summarizes what information is typically preserved and what information is destroyed when a file is deleted:

The directory entry with the file name is marked as unused, so that the file name becomes disconnected from any file information. Names of deleted files can still be found by examining a directory with the strings command. Unfortunately, Linux does not allow directories to be accessed in this manner. To work around this restriction, one can use the icat utility (copy file by inode number) from the Coroner's Toolkit.
The inode (file attribute) block is marked as unused in its block allocation bitmap. Some file attribute information is destroyed (see Table 1), but a lot of information is preserved. In particular, Linux preserves the connections between the file inode block and the first 12 file data blocks. To recover deleted files up to 12 blocks in size, the earlier mentioned icat utility comes in handy again.
File data blocks are marked as unused in their block allocation bitmap, but their contents are left alone. The Linux default filesystem has an option to erase file-data blocks upon file deletion, but that feature is unimplemented. As a rule, file-data blocks are no longer connected with the file in any way, except on Linux, where the first 12 data blocks remain connected to the inode block.

Return to Mactimes, Dead or Alive

Back to the original problem: A system administrator suspects that something evil happened. The only visible evidence is the last modification time of Wietse's directory. Perhaps someone compiled and ran an exploit program and deleted the evidence. What kind of information can you expect to find for the deleted files?

Let's explore what happens when a "hello world" C program is compiled, executed, and deleted. As the compiler processes the source code, it creates several temporary files before the executable program pops out. Thus, when someone compiles, runs, and deletes an exploit program, we expect to find traces of the deleted program source file, of the deleted executable file, as well as traces of compiler temporary files.

The Mactime report (Figure 4) shows the combined last access time patterns for existing and deleted files after such a fictitious "hello world" exploit. The deleted file attributes were retrieved with the ils utility from the Coroner's toolkit. The report looks strange because deleted files have no names. Instead of file names, the report uses disk names and file inode numbers. As before, last access time information for compiler-related files is not shown.

In real life, compiler temporary files are unlikely to show up in deleted file access time patterns. These files live in the same filesystem zone as the /tmp directory (see Figure 1). On typical UNIX filesystems, deleted information in the /tmp filesystem zone is overwritten as soon as a process needs to create a temporary file.

What We Can Learn From Deleted File Access Times

The combined access time patterns for existing and deleted files (Figure 4) give better insight into past system behavior than the one-line report that only showed something happened in Wietse's directory (Figure 2). However, deleted file attributes provide only part of the puzzle. Putting on the hat of an investigator we can see that:

An 85-byte file A was written.
A 4173-byte executable file B was written. It is unknown whether a compiler was involved with this step.
The executable file B was run and/or read (copied).
Both files A and B were removed. This coincides with a change in the contents of Wietse's directory.
It is unknown whether other files were involved with the incident.

And that is about all you can find out on the basis of the available information. Bear in mind that the information is likely to be incomplete, and that none of the information can be trusted unless there is a sufficient amount of evidence that is consistent with it.

Persistence of Deleted Information

You have seen how valuable the attributes from deleted files can be for the reconstruction of past activity. What are the odds of finding such information in the first place? How long can deleted information persist before it is destroyed?

Thanks to the locality properties of UNIX filesystems (Figure 1), deleted information can survive for much longer than I ever had expected. Table 2 shows that deleted inode (file attribute) blocks can survive for hundreds of days, even on systems that are heavily used every day. In both examples, the history of deleted files goes back until the time the system was installed.

Volatility versus Persistence

As illustrated in Dan Farmer's "What Are Mactimes?" article, the last access time patterns for existing files offer insight into past system behavior. However, last access time patterns for existing files suffer from a major drawback: Access times are updated whenever a file is accessed in one way or another. The result is that last file access time patterns are relatively volatile. For example, on a system where the C compiler is used frequently, last access time patterns for compiler-related files are unlikely to persist for any length of time. Last access times for existing files are like footsteps in sand. The next time you look, they have changed.

Deleted files are notably different in this respect. When a file is deleted, all its information becomes frozen in time. And although deleted information no longer has a legitimate existence, it can survive for hundreds of days beyond the time of deletion, as we have seen in the preceding section. Thus, last file access time patterns can survive longer for deleted files than for their living counterparts. Last access times for deleted files are like fossils. A skeleton may be missing a bone here or there, but the fossil record goes back a long way.

Is There Hope for Escape?

Deleted information can survive in large quantities and for extended periods of time. The implications of this are manifold. Not only can investigators find traces of an intrusion long after the fact, the same properties allow nosy employers to spy on past activities of employees, and allow evil government officials to find cleartext copies of incriminating material that was thought to be encrypted and overwritten long ago.

Wiping deleted information can help to protect that information from prying eyes. Unfortunately, wiping deleted information can be hard to automate:

Data can be recovered from magnetic disks even after it is overwritten multiple times, as we discussed in "Forensic Computer Analysis: An Introduction" (DDJ, September 2000). However, overwritten data can be recovered only with physical access to the disk.
Disk wiping applications that allocate and overwrite all unused space can be used only when the system is idle. As an additional complication, creating one large file is not sufficient to allocate all fragments of unused space in a typical UNIX filesystem.
Disk wiping applications that update disks directly, bypassing the filesystem, cannot be used safely with filesystems that are mounted read-write. Because of this restriction, low-level disk wiping applications can be run only infrequently; for example, when a system shuts down or when it boots up.
Kernel-based disk wiping software can overwrite information as soon as it is deleted, but even an operating-system kernel cannot guarantee that deleted data will be overwritten before the machine is halted.

Because disk wiping is so problematic, it is better to simply forget deleted information. This is especially easy with encrypted data. To forget the contents of a deleted file or group of files, simply discard the corresponding cryptographic key. For best results, use software that encrypts all files, including the swap file.

Conclusion

This column has only scratched the surface. We looked at the filesystem from a user perspective; that is, the world of files and directories. We also looked at the next layer down, the world of data blocks, inode blocks, and allocation bitmaps. We found that data deleted from the user layer is still accessible at the next layer down, and that deleted data can be surprisingly persistent.

The pattern of deletion, accessibility, and persistence repeats at lower levels in the hierarchy of disk blocks, disk sectors, and bit patterns on magnetic media. However, as you descend to lower levels, you lose context, and the data that survives destruction becomes increasingly ambiguous.

I've not discussed here the odd places where data can survive, such as the unused portions of the last data blocks of files. Nor have I addressed the possibilities of intentionally hiding information at disk partition boundaries, in bad disk blocks that aren't really bad, or even in plain sight: Data that is stored in disguise, as a wolf in sheep's skin. Covering all this would take more time and space than is available.

Next month, Dan Farmer will describe how we recovered a large portion of our friend's thrashed files, and the surprises and challenges that we had to deal with.

DDJ

Dec00: File Recovery Techniques

Figure 1: On-disk layout of a typical UNIX filesystem. Storage space is divided into multiple zones. Each zone contains its own allocation bitmaps, file data blocks, and file attribute (inode) blocks. Normally, information about a small file is stored entirely within one zone. The figure is not drawn to scale.

Dec00: File Recovery Techniques


  Aug 04 16:00:14
      85 m.c -rw-r--r-- wietse /home/wietse/hello.c (create source file)
  Aug 04 16:00:21
    1024 m.. drwxr-xr-x wietse /home/wietse
    4173 mac -rwxr-xr-x wietse /home/wietse/hello   (create executable)
      85 .a. -rw-r--r-- wietse /home/wietse/hello.c (read source file)

Figure 2: File access time patterns for compiling a "hello world" C program. The line in bold shows what information is left behind when the source and executable files are deleted. Access methods are indicated with m (write access), a (read or execute access), and c (attribute change).

Dec00: File Recovery Techniques

Figure 3: Simplified overview of a typical UNIX filesystem that shows the relationship between UNIX directory entries, inode (file attribute) blocks, and file data blocks.

Dec00: File Recovery Techniques


Aug 04 16:13:08
    85 m.. -rw-r--r-- wietse <hda6-311549> (create source file)
Aug 04 16:13:16
 10897 mac -rw-r--r-- wietse <hda1-2022>   (compiler temp file)
   301 mac -rw-r--r-- wietse <hda1-2023>   (compiler temp file)
   872 mac -rw-r--r-- wietse <hda1-2024>   (compiler temp file)
    85 .a. -rw-r--r-- wietse <hda6-311549> (read source file)
  4173 m.. -rwxr-xr-x wietse <hda6-311550> (create executable)
Aug 04 16:13:22
  4173 .a. -rwxr-xr-x wietse <hda6-311550> (run executable)
Aug 04 16:13:28
  1024 m.. drwxr-xr-x wietse /home/wietse                       
    85 ..c -rw-r--r-- wietse <hda6-311549> (delete source file) 
  4173 ..c -rwxr-xr-x wietse <hda6-311550> (delete executable)

Figure 4: Access time patterns for existing and deleted files after creating, compiling, running, and removing a "hello world" C program in Wietse's directory. Deleted files are represented by disk name and by file inode numbers. Only information in bold is likely to survive for an appreciable amount of time. Access methods are indicated with m (write access), a (read or execute access), and c (attribute change).

Back to Article | Sidebar

Dec00: Case Study: rootkit

Case Study: rootkit

To find out how robust deleted file information can be, I set up a disposable RedHat 5.2 Linux machine and downloaded Version 4 of the Linux rootkit source distribution. The rootkit software produces a network password sniffer program and replaces over a dozen system utilities with modified versions that either reveal intruder activity or provide intruder backdoors. I compiled, installed, and removed the rootkit software, just like an intruder.

Then I did just about the worst possible thing. I downloaded the Coroner's Toolkit source distribution, unpacked it in the same directory as used by the "intruder," compiled it, and ran the software. (To avoid data destruction as described here, we intend to make ready-to-run CD-ROM images available.)

Using the Coroner's Toolkit in this manner, I knowingly destroyed large amounts of information by overwriting deleted files and obliterating file access time information for compiler-related files and for other files.

Even after all that destruction, the Coroner's Toolkit still found useful information. Access time patterns of deleted files revealed that at least 460 files and directories were created and deleted within a relatively short amount of time. At least 300 of those files had practically identical last modification times on November 23, 1998, the apparent time when Linux rootkit Version 4 was prepared for distribution.

The signatures from deleted file attributes were so strong because intruder software suffers from bloat just like any other software. Linux rootkit Version 4 has a rather large footprint of approximately 780 files and directories, including compiler output files. A footprint that large is hard to overlook, even in deleted file access time patterns.

-- W.V.

Dec00: Deleted File Attributes Reveal an Old Vulnerability

Deleted File Attributes Reveal an Old Vulnerability

Whenever you explore unusual sources of information, you run into odd things. The study of deleted file attributes is no exception. The "dead or alive" file access time patterns in Figure 4 reveal a privacy problem in an old version of the gcc compiler: The three compiler temporary files are created with world read permission. Thus, for a brief instant of time, any user can have read access to someone else's compiler temporary files, even when that user has no access at all to the program source code or to the resulting executable file. A quick experiment with a newer gcc version shows that the privacy problem has been fixed in the mean time.

-- W.V.

Dec00: File Recovery Techniques

Table 1: The effect of file deletion on file names, on file attributes, and on file contents (for typical UNIX filesystems).

Dec00: File Recovery Techniques

Table 2: Number of surviving inode (file attribute) blocks as a function of time since file deletion for UNIX home directory and system partitions. hades.porcupine.org is Wietse Venema's FreeBSD workstation. flying.fish.com is Dan Farmer's RedHat Linux workstation. In both cases, the history of deleted file attributes goes back until the time when the filesystems were created.