Wietse is a researcher at IBM's T.J. Watson Research Center. He can be contacted at firstname.lastname@example.org.
This article is based on true events that happened a couple of years ago when I was working at Eindhoven University in the Netherlands. The story is about analyzing an unknown program that was left behind by an intruder. The fact that the computer systems involved were running UNIX is only of marginal importance.
One morning I came into my office and noticed a strange message in my workstation's window [Figure 1(a)]. Apparently, someone had compromised the screen saver's login account on a neighboring workstation, and had used the "finger" network service to find out who was logged into my workstation at 11:00 in the evening. This finger request was dutifully logged by my TCP Wrapper program.
Closer investigation revealed that this finger request was not an event by itself. It was preceded by a most unusual chain of finger connections [Figure 1(b)]. Apparently, someone had made a finger connection from the machine called "wsbs03," through the machine called "wsbs01," and through the loopback interface of my machine called "wsbs06," where the intruder ran into a TCP Wrapper "deny" rule.
This was not just anyone. This was someone who had acquired super-user privileges on the machine wsbs03. This was trouble.
A Privileged Backdoor Process
Looking around on the compromised machine revealed that the intruder had left behind one process that was running in the background. As Figure 2(a) shows, the process was running with super-user privileges, was started at 23:02, had used no CPU time, and had a misleading process name of <defunct>.
To find out more, I used the lsof command (list open files) by Vic Abell (ftp://vic.cc.purdue.edu/pub/tools/lsof/), which shows what files a process executes, what files a process accesses, what network connections a process uses, the current directory, and so on. For each file, the lsof command lists both the internal inode number and the name of the filesystem from which the file originates; see Figure 2(b).
The intruder had left behind a process running with super-user privileges that listened for incoming connections on TCP port 5120. This process looked like a privileged backdoor into the system -- the kind of problem that could not be left unattended for too long.
The intruder had been working late at night and was unlikely to return in the next couple of hours, so I had a few hours to figure out what kind of process the intruder had left behind. I searched the filesystem for all files listed in the lsof output. The system run-time libraries showed up quickly. However, the executable file itself, with internal inode number 93828, had been deleted. Great, so I had no program file to look at. This was not a good start.
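The search by inode number can be sketched in a few lines of Python. This is an illustration only, not the procedure used at the time: the function names and the file name "backdoor" are hypothetical, and the walk is roughly what "find -inum" does. The demo also shows why the search came up empty. Once the directory entry is removed, no path leads to the inode any more.

```python
import os
import tempfile

def find_by_inode(root, device, inode):
    """Return every path under root whose (device, inode) pair matches."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            if st.st_dev == device and st.st_ino == inode:
                matches.append(path)
    return matches

def demo():
    """Search for a file by inode before and after its name is removed."""
    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, "backdoor")  # hypothetical file name
        with open(path, "wb") as f:
            f.write(b"x")
        st = os.stat(path)
        found_before = find_by_inode(root, st.st_dev, st.st_ino)
        os.remove(path)
        found_after = find_by_inode(root, st.st_dev, st.st_ino)
        return found_before, found_after
```

Inode numbers are unique only within one filesystem, which is why the sketch compares the device number as well.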
The Treachery of Images
The previous section may have given the impression that intrusions are easy to investigate. You go into a machine and run a couple of commands to detect strange processes or files. It is not necessarily that easy. When a machine has been compromised, all information that comes from the machine must be treated with extreme suspicion. The cleaner a machine appears to be, the more suspicion it deserves.
About 70 years ago, René Magritte (http://www.magritte.com/) made a series of paintings that dealt with the treachery of images. One of those paintings shows an image of a pipe. Below the pipe is text that reads "Ceci n'est pas une pipe." This is not a pipe -- it's an image of a pipe. The image could be an artist's rendering of a real pipe. It could also be completely made up by the artist. You can't tell the difference just by looking at the image.
Computers are subject to the treachery of images as well. The image on your computer screen is not a computer file -- it's only an image on a computer screen. Images of files, processes, and network connections are very distant cousins of the actual bits in memory, in network packets, or on disks. The images that you see are produced by layer upon layer of hardware and software. When an intruder "owns" a machine, any of those layers could be tampered with. Application software can lie, OS kernels can lie, boot PROMs can lie, even hard disk drives can lie.
Nowadays, intruders routinely replace system utilities such as "ls" (show files), "ps" (show processes), and "netstat" (show network connections) with versions that are modified to hide the presence of backdoor programs and/or other intruder-related materials.
Modifications to application program and data files can be detected relatively easily by comparing the files on the system against a known-to-be-good baseline. Host security-checking software such as Tiger (originally by Douglas Schales, ftp://net.tamu.edu/pub/security/TAMU/; this site does not work with many web browsers because it does not support passive FTP) uses a database with strong cryptographic hashes that were computed for files on original system distribution media. Change detection software such as Tripwire (http://www.tripwire.com/) uses a database with strong cryptographic hashes that were computed when the files on the machine were known to be authentic.
Meanwhile, toolkits are emerging for popular UNIX versions that achieve stealth effects by modifying a running OS kernel on-the-fly (see, for instance, The Hacker's Choice, http://www.infowar.co.uk/thc/ and http://thc.pimmel.com/). Kernel-level modifications can be much harder to detect than application-level modifications, because the kernel is the mediator for all questions that we ask about the machine. When operating-system kernel modifications become sufficiently sophisticated, then we may have to address the bare hardware level with crocodile clamps and logic analyzers if we want to find out what the heck is going on.
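The baseline comparison can be illustrated with a minimal Python sketch. It is not how Tiger or Tripwire are implemented. The choice of SHA-256, the function names, and the demo files are assumptions made for the example.

```python
import hashlib
import os
import tempfile

def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def build_baseline(paths):
    """Record a digest for each file while it is known to be good."""
    return {p: file_digest(p) for p in paths}

def compare(baseline):
    """Report files that are missing or whose contents have changed."""
    report = {}
    for path, expected in baseline.items():
        if not os.path.exists(path):
            report[path] = "missing"
        elif file_digest(path) != expected:
            report[path] = "changed"
    return report

def demo():
    """Build a baseline for three stand-in files, tamper, then compare."""
    with tempfile.TemporaryDirectory() as root:
        paths = []
        for name in ("ls", "ps", "netstat"):  # stand-ins, not real binaries
            path = os.path.join(root, name)
            with open(path, "wb") as f:
                f.write(b"original " + name.encode())
            paths.append(path)
        baseline = build_baseline(paths)
        with open(paths[1], "ab") as f:       # tamper with "ps"
            f.write(b" plus hidden code")
        os.remove(paths[2])                   # remove "netstat"
        report = compare(baseline)
        return {os.path.basename(p): v for p, v in report.items()}
```

The comparison is only as trustworthy as the machine that runs it and the place where the baseline is stored. On a compromised host, both the hashing program and the database could have been tampered with.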
In the case of our own little intrusion, we were relatively lucky. The intruder broke into a diskless machine with read-only access to the system software. Changes to system software would have to be made on the file server, not on the client. The intruder made no attempts to access the file server; such attempts would have set off several alarms that I had planted in the past.
At the time of the incident, it was not yet customary to make on-the-fly changes to running processes or to OS kernels, so I did not worry about that possibility.
For readers who've tuned in late, we're confronted with an unknown program that is running on a compromised machine. The executable file is deleted, so we cannot easily find out the nature of the program. The process runs with super-user privileges, so it can potentially do a great deal of damage. Finally, the process listens on network port 5120. In other words, we're looking at a privileged backdoor of some kind. Presumably, we don't want to sit and wait for the intruder to make use of the backdoor.
What can we do? Several possible approaches come to mind. First of all, we can simply terminate the backdoor process and lose all information about it. This lets us go back to work with the least amount of effort.
Another possibility is to connect to port 5120, start banging away at the port with random data, and see what happens. This is definitely a bad idea. For all we know, this process could destroy all information on the machine, either by accident or by way of retaliation. Or the intruder process could simply commit suicide and disappear, and we would be none the wiser.
A third possibility is to freeze the process and conduct any further investigations at leisure. This is the approach with the best learning opportunity. Being the curious person that I am, I typed "kill -STOP 12823" to suspend the intruder's process and relaxed.
The UNIX kill command name is misleading: Only some incantations of the kill command actually terminate the target process. "kill -STOP" suspends the target process immediately and unconditionally. The process can still be resumed with "kill -CONT" as if nothing happened.
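The suspend-and-resume behavior can be tried safely on a process of your own. The following Python sketch is an illustration under stated assumptions: it runs on a UNIX-like system, the function name is made up for the example, and the target is a harmless child process that merely sleeps.

```python
import os
import signal
import subprocess
import sys

def suspend_and_resume():
    """Stop a harmless child with SIGSTOP, then resume it with SIGCONT."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"])
    try:
        # Equivalent of "kill -STOP pid": cannot be caught or ignored.
        os.kill(child.pid, signal.SIGSTOP)
        # WUNTRACED makes waitpid report a stopped (not exited) child.
        _, status = os.waitpid(child.pid, os.WUNTRACED)
        stopped = os.WIFSTOPPED(status)
        # Equivalent of "kill -CONT pid": the child carries on.
        os.kill(child.pid, signal.SIGCONT)
        still_running = child.poll() is None
    finally:
        child.kill()   # clean up the demo process
        child.wait()
    return stopped, still_running
```

A stopped process keeps its memory, open files, and listening sockets, which is what makes it possible to inspect it at leisure.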
There are many ways to study a program's behavior. With static analysis, one studies a program without actually executing it. Tools of the trade are disassemblers, decompilers, source-code analysis tools, and even such basic tools as "strings" and "grep." Static analysis has an advantage in that it can reveal how a program would behave under unusual conditions. In real life, static analysis gives an approximate picture at best. According to current insights, it is impossible to fully predict the behavior of any nontrivial program.
With dynamic analysis, you study a program as it executes. Tools of the trade are debuggers, function call tracers, machine emulators, logic analyzers, and sometimes even network sniffers. The advantage of dynamic analysis is that it can be fast and accurate. However, dynamic analysis has the disadvantage that "what you see is all you get." It is difficult to impossible to make a nontrivial program traverse all the possible paths through its code.
A special case is "black box" analysis, which is dynamic analysis without access to program internals. In this case, the only observables are the external inputs, outputs, and their timing characteristics. In some cases, the inputs and outputs include power consumption and electromagnetic radiation as well. As we will see in a forthcoming example, black box analysis in software can yield useful results despite its apparent limitations.
Finally, there is postmortem analysis, the study of program behavior by looking at the after-effects of program execution. Postmortem analysis is often the only tool available after system intrusion. Some information disappears quickly as normal system behavior erodes away the evidence; other information can persist for days or even weeks.
Recovering the Program Code and Data
With the intruder's backdoor process left in a state of suspended animation, my next goal was to recover a copy of the program code and data, so that I could figure out the program's purpose.
Recovering program code and data from a running process can be easy or difficult, depending on the operating system involved. Older UNIX systems offer little more than the traditional ptrace() debugging hooks. These hooks give access to a process in a manner that is comparable to eating a hamburger through a straw. It is painful, but it is sometimes your only option.
Modern UNIX systems have a /proc filesystem that makes process information available in a much more convenient manner, including the executable file, current directory, and process memory. The information is made accessible as /proc/pid/filename, where pid identifies the process, and filename specifies the process attribute. Table 1 gives a few examples of attributes and of their corresponding filenames.
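As a rough sketch of what /proc makes possible, the Python fragment below copies a process's executable file and reads its current directory. It assumes the Linux attribute names "exe" and "cwd"; other UNIX versions use different filenames, so treat the names as assumptions rather than as the entries of Table 1. The function names are hypothetical, and the demo inspects its own process rather than someone else's.

```python
import os
import shutil
import tempfile

def proc_path(pid, attribute):
    """Build /proc/<pid>/<attribute> (Linux-style naming is assumed)."""
    return os.path.join("/proc", str(pid), attribute)

def recover_executable(pid, destination):
    """Copy a process's program file via /proc/<pid>/exe; return its size."""
    with open(proc_path(pid, "exe"), "rb") as src, \
            open(destination, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return os.path.getsize(destination)

def demo():
    """Inspect the current process; return None without a Linux-style /proc."""
    pid = os.getpid()
    if not os.path.exists(proc_path(pid, "exe")):
        return None
    cwd = os.readlink(proc_path(pid, "cwd"))
    with tempfile.TemporaryDirectory() as tmp:
        size = recover_executable(pid, os.path.join(tmp, "recovered"))
    return cwd, size
```

Reading another user's process this way normally requires super-user privileges.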
At the time of the intrusion, the /proc filesystem was not as widely implemented on UNIX as it is nowadays. The best alternative was to use gcore -- a standard utility program that takes a snapshot of the process data and stack but not of the program code. The output from gcore is in the form of a core dump file -- the kind of file that UNIX normally produces for postmortem analysis of faulty software.
Years later, Dan Farmer and I built a collection of tools for forensic analysis that would have made my work much easier. This software is now available as The Coroner's Toolkit (TCT, http://www.fish.com/forensics/ and http://www.porcupine.org/forensics/). Three tools from the TCT would have been especially useful for me at the time:
- pcat. Copies process memory to a file, including code, data, and stack. With this tool, I could have recovered the program code and not just the data and stack portions that I got from running gcore.
- icat. Copies a file by its internal inode number instead of by its file name. This tool is especially useful for recovering deleted files, which continue to exist for as long as some process has access to them. With the icat tool, I could have recovered the complete executable file, including the program code and the internal and external compiler symbol tables.
- ils. Lists file attributes by internal inode number instead of file name. The inode number can be found in, for example, output from the lsof command (see the first part of this article). With the ils tool, I could have recovered the executable file's owner, the last time the file was changed, the last time the file was used, the last time the file was removed from its directory, and so on.
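The reason a deleted file can still be recovered is that removing a name does not remove the file while a process still holds it open. The Python sketch below demonstrates that property on a temporary file. It is not how icat or ils work internally (those tools read the filesystem by inode number), and the function name and file contents are invented for the example.

```python
import os
import tempfile

def read_after_unlink():
    """Show that a file's contents outlive its name while it is held open."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"backdoor program bytes")  # stand-in contents
        os.unlink(path)                  # remove the only directory entry
        name_gone = not os.path.exists(path)
        st = os.fstat(fd)                # attributes still reachable via fd
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, 1024)         # contents still readable
    finally:
        os.close(fd)                     # now the storage can be reclaimed
    # st_nlink is typically 0 here on a local filesystem.
    return name_gone, st.st_nlink, data
```

The attributes returned by fstat, such as owner and timestamps, are the same kind of information that ils reports for an inode.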
Static Program Analysis
In the previous section, gcore gave me a snapshot of the intruder program's data and stack, but not of the program's code. Without a copy of the program code, how was I ever going to find out the purpose of the program?
It is with problems like this that such simple tools as the UNIX strings command really shine. UNIX acquired its share of "cruft" in the course of time, but it still has a number of utilities that each implement a concept well and that are easily combined into other concepts. I like programming with concepts. (That's probably why I feel more comfortable programming in the shell language than in Perl.)
Enough evangelizing for now. Running strings on the gcore-triggered core dump revealed a lot of text that instantly identified the backdoor program as a descendant of the standard BSD telnet daemon. The similarity was so strong that I didn't look at the strings output itself; instead, I looked at the much more informative differences with strings output from a real BSD telnet daemon.
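The idea of comparing strings output against a known-good reference can be sketched in Python as follows. This is an approximation for illustration: the function names are made up, the minimum length of four characters mirrors the usual default of the strings command, and the byte strings in the demo are hypothetical stand-ins, not data from the actual core dump.

```python
import re

def extract_strings(data, min_len=4):
    """Return runs of printable ASCII of at least min_len characters."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]

def strings_only_in(suspect, reference, min_len=4):
    """List strings found in suspect but not in reference, in order."""
    known = set(extract_strings(reference, min_len))
    seen = set()
    unique = []
    for s in extract_strings(suspect, min_len):
        if s not in known and s not in seen:
            seen.add(s)
            unique.append(s)
    return unique

def demo():
    """Compare two hypothetical memory images."""
    reference = b"\x00\x01/bin/login\x00telnetd: %s\x00\x02ab\x00"
    suspect = b"\x00/bin/sh\x00telnetd: %s\x00telcli: socket\x00"
    return strings_only_in(suspect, reference)
```

With the made-up data above, only the strings that the two images do not share survive the comparison, which is what makes the differences more informative than the raw output.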
Lo and behold, in these differences were a few strings that revealed the true nature of the backdoor program:
References to the UNIX login program had been replaced by references to a UNIX command interpreter. Among the differences was also the particularly distinguishing text message:
(Exercise: Enter "telcli: socket" into any web search engine, put quotes around the text, and see how many source-code files it finds.)
So there it was. The purpose of the intruder's backdoor program was to bypass the system login procedure and give access to a privileged interactive command interpreter. Because of the possible risks, there was no point in resuming the suspended backdoor program. I left the backdoor process in its suspended state just in case I had a bright idea. Meanwhile, I cleaned up.
I could have replaced the backdoor with a harmless program that just recorded everything without actually executing the intruder's commands. I had done that before, but I found that the result was quite disappointing. If you do this, you have to be aware of the possibility that the intruder will be annoyed and will retaliate.
A few years later, I realized that I had overlooked an artifact of NFS, the network filesystem that connected our workstations and servers. Although the backdoor's executable file was deleted on the diskless machine, the file would still have existed on the file server for as long as the backdoor process existed.
In hindsight, it was probably a good thing that I did not have a copy of the program instructions, but just a snapshot of the data and stack from a running process. Even though program debuggers can produce very usable disassembly listings with human-readable names for external function calls and such, I could have wasted massive amounts of time trying to make sense out of page after page of assembly code, even when it was straightforward SPARC assembly code.
At the time, automatic decompilation into a high-level language such as C was a dream at best. Even nowadays, C decompilation tools exist only for limited environments (for instance, the DCC retargetable decompiler by Cristina Cifuentes, http://archive.csee.uq.edu.au/~csmweb/dcc.html). Concerns about intellectual-property theft may have a lot to do with limited availability. The threat of reverse engineering also presents an interesting problem to Java programmers, because compiled Java code contains so much additional information that very good decompilers have already been developed.
Running an Unknown Program
With this particular intrusion, a static analysis of the intruder's backdoor process gave me all the information that I needed to determine the nature of the program. However, for the sake of completeness, I'll say a few words on dynamic program analysis.
One way to find out the purpose of an unknown program is to simply run it and see what happens. There are lots of problems with this approach. The program could run amok and destroy all information on the machine. That would be a short but intense experience. Or the program could send threatening e-mail to president@whitehouse.gov or to other people you don't want to upset. That experience could last for the rest of your career.
Rather than running an unknown program in an environment where it can do damage, you could run the program on a disposable machine without network access. If the software is of the Intel persuasion, you could even consider running it inside a virtual machine (VM) sandbox. VMs are great for research, especially when they have support for undoable filesystem changes, such as VMware (virtual machine monitor host software for Linux and Windows NT, http://www.vmware.com/). This way, you can run the critter again and again; and each time you do, you can reset the machine to the same initial conditions.
The use of a VM as a sandbox requires that the VM implementation provide perfect insulation. Building a secure VM monitor is a nontrivial exercise (see "A Retrospective on the VAX VMM Security Kernel," by Paul Karger et al., IEEE Transactions on Software Engineering, November 1991). Complications arise when the CPU has instructions that lack VM support, so that they need to be intercepted and emulated in software (see "Analysis of the Intel Pentium's Ability to Support a Secure Virtual Machine Monitor," by John Scott Robin and Cynthia E. Irvine, Proceedings of the 9th USENIX Security Symposium, August 2000).
All this is well and good, but even if we ran the program in a perfectly insulated disposable sandbox, would the result be valid? The program would be running in a different environment from where it was found. If the program had a logic bomb, the bomb might very well go off only under very specific conditions.
Dynamic Program Analysis
Suppose you want to leave an intruder's backdoor program running so that you can monitor its progress in real time. What would you use for instrumentation, and what would the results look like?
You could try to watch the process at the machine instruction level. Under the given conditions, it would not have been practical to arrange for a fast logic analyzer. Monitoring the process at the machine instruction level in software would involve a tracing process that manipulates the traced process via operating-system debugger hooks. All this would let you follow the process in great detail, and that is exactly the problem with this approach.
Passing control back and forth between the traced process and the tracing process after each machine instruction slows down execution by many orders of magnitude; this would not be a problem with a fast logic analyzer. But the greater problem is information overload. Tracing a process at the machine instruction level generates enormous amounts of information. Trying to make sense of all that information in real time would not be practical.
Instead of watching every machine instruction, you could do the opposite and ignore what happens inside a process, effectively treating the process like a black box. This is not such a crazy idea. On real operating systems, a process has no direct access to the world, but is constrained like a prisoner; it is entirely at the mercy of the operating system for all its needs. Every file access, every network access, every interaction with the world requires a system call to request assistance from the operating system.
Modern systems come with tools that make it easy to monitor system calls in real time. On UNIX systems, the commands are called trace, strace, or truss. Some systems even have tools to monitor calls into library routines: Examples of such commands are sotrace and ltrace.
Typically, the output from call-tracing programs looks like one line per call, with the name of the function, its arguments, and its result value. Example 1(a) displays all I/O-related system calls that are made by the Solaris date command after process initialization.
Watching system calls has lots of benefits over watching machine instructions. System calls happen at a much lower frequency. Watching system calls causes less slowdown of execution, but more importantly, watching system calls produces less information.
Information about system calls not only has a better signal-to-noise ratio, it is eminently suitable for filtering on the function call name or on function call arguments. This makes it relatively easy to do nifty things such as wiretapping a running process.
As an illustration of the power of system call tracing, Example 1(b) puts software crocodile clamps on an ssh server process and reveals the cleartext of encrypted network login sessions. This command attaches to the process with ID pid and to any child process that is born after the strace command is started. It displays all data that is read from file descriptor 6 or written to file descriptor 4. In other words, the strace command displays all the cleartext data coming from or going to a user logged in via ssh. The file descriptor numbers are operating-system dependent; the example is specific to sshd on Linux.
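Filtering on call names and arguments is simple because the output has one call per line. The Python sketch below parses lines in the general "name(arguments) = result" shape and selects calls by name or by argument text. The exact output format differs between tracing tools, the function names are made up, and the sample lines are hypothetical rather than real trace output.

```python
import re

# One call per line: name(arguments) = result
CALL_RE = re.compile(
    r"^(?P<name>\w+)\((?P<args>.*)\)\s*=\s*(?P<result>.+)$")

def parse_trace_line(line):
    """Split a trace line into (name, args, result); None if no match."""
    m = CALL_RE.match(line.strip())
    if m is None:
        return None
    return m.group("name"), m.group("args"), m.group("result").strip()

def filter_calls(lines, names=None, arg_substring=None):
    """Keep calls whose name is in names and whose args contain the text."""
    selected = []
    for line in lines:
        call = parse_trace_line(line)
        if call is None:
            continue  # signal notices and other non-call lines
        name, args, _ = call
        if names is not None and name not in names:
            continue
        if arg_substring is not None and arg_substring not in args:
            continue
        selected.append(call)
    return selected

# Hypothetical sample, written in the style of a system call tracer.
SAMPLE = [
    'open("/etc/passwd", O_RDONLY) = 3',
    'read(3, "root:x:0:0", 4096) = 10',
    'write(1, "hello", 5) = 5',
    'close(3) = 0',
    '--- SIGCHLD ---',
]
```

For example, filter_calls(SAMPLE, names={"read", "write"}) keeps only the two I/O calls, and filter_calls(SAMPLE, arg_substring="/etc/passwd") keeps only the open call.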
The wiretapping example (see Figure 3) is a reminder that encryption does not solve all security problems. In particular, encrypted connections protect only the connection itself, not the data at the endpoints of the connection. This is the main reason why today's "secure web servers" aren't necessarily secure: They just protect sensitive information on its way across the Internet. Once the data is on a server, it is at the mercy of bugs in software or in human procedures.
The Reverse Turing Test
I'd like to end this article with some food for thought. A while back, I mentioned that a compromised machine cannot be trusted, and that all information coming from a compromised machine is suspect. Once a machine is under control by an intruder, the machine could lie about almost anything.
Would it be possible for an intruder to program a machine so that it would be impossible to find out that the machine is "owned," without actually taking the machine apart? Changes to application program and data files are easy to detect if you know what those files are supposed to look like. Changes to running processes are already more difficult to detect. What about changes at the OS kernel level, or even changes at levels below the OS kernel?