Len manages the Software Test and Quality Assurance Department for GTE Internetworking (formerly BBN). He can be contacted at [email protected]
UNIX daemons are programs that run in the background, letting you do other work in the foreground. Because daemons (supposedly short for the "Disk And Execution MONitor" program; see http://www.dict.org/) are independent of control from a terminal, they can be started either by other processes or by users, and they do not tie up a terminal while they run.
Interestingly, the noncomputer definition of "daemon" (an evil spirit or inner/private voice) applies to UNIX daemon programs. UNIX daemons have some characteristics in common with mythological demons, in that daemon programs skulk around unseen in the background just as a demon would. And daemons act like inner voices in that they can run continuously and, like a conscience, always be accessed.
You need to care about UNIX daemons because:
- The ability to run a program as a daemon means you can have multiple programs running at the same time, without requiring the exclusive use of many, or even one, terminal or workstation. This may not sound like such a big deal in the age of GUIs; after all, you can always just open a new window, start your (nondaemon) target program, and minimize your window. Remember, however, that each of those minimized windows will be unnecessarily tying up system resources (not to mention cluttering up your desktop).
- With the correct startup (and restart) scripts, daemon programs can run indefinitely. This means that other programs can always depend on the presence of the daemons, and can therefore make use of functions performed by the daemons instead of having to reimplement these functions over and over. For example, if you're trying to open a terminal session on a remote system, you don't have to write your own protocol; you can access the telnet daemon on the remote system. Another example is the syslogd daemon. If you want your daemon to write logging messages to the system log, all you have to do is call the syslog function. The ability to write to the system log is more important than it may sound. Because your daemon will not have a controlling terminal, it cannot simply print messages to stderr, unless you don't mind having it report all its status and error messages to the system console or to whatever terminal was used to start it. Either way, the messages will just scroll off the screen.
- If you're going to develop or test software in the UNIX world, you're going to encounter daemons. They're there. They're a major reason for UNIX's high level of usability and reliability as an operating system. And you have to be able to test them.
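Calling syslog really does take only a few lines. The following C sketch is a minimal illustration, not code from any particular daemon: it opens the log with openlog, uses setlogmask to implement a verbosity knob, and logs at different priorities. The function names and the mapping from verbosity level to syslog priority (0 = normal, 1 = informational, 2 or more = debug) are assumptions made for the example.

```c
#include <syslog.h>

#ifndef LOG_UPTO   /* not required by POSIX, but provided by most systems */
#define LOG_UPTO(pri) ((1 << ((pri) + 1)) - 1)   /* all priorities through pri */
#endif

/* Hypothetical verbosity knob: 0 = normal operation, 1 = informational,
   2 or more = full debugging. Returns the previous log mask. */
int set_log_verbosity(int verbosity)
{
    int max_priority = LOG_NOTICE;
    if (verbosity == 1)
        max_priority = LOG_INFO;
    else if (verbosity >= 2)
        max_priority = LOG_DEBUG;
    return setlogmask(LOG_UPTO(max_priority));
}

/* Open the system log once at startup. LOG_PID tags each message with the
   daemon's PID; LOG_CONS writes to the console if the logger can't be reached. */
void start_logging(const char *ident, int verbosity)
{
    openlog(ident, LOG_PID | LOG_CONS, LOG_DAEMON);
    set_log_verbosity(verbosity);
}

/* Example call sites: errors are always logged, debug detail only on request. */
void log_request(const char *client, int status)
{
    if (status != 0)
        syslog(LOG_ERR, "request from %s failed with status %d", client, status);
    else
        syslog(LOG_DEBUG, "request from %s completed", client);
}
```

Messages whose priority is masked out are discarded by syslog without being sent to syslogd, which is what makes it affordable to leave debug calls in production code and turn them on only when needed.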
With all this in mind, in this article, I'll present a recipe for testing UNIX daemons.
Examples of Daemons
I'll start by examining Figure 1, which is sample output from ps (for more information on ps, see Advanced Programming in the UNIX Environment, by the late W. Richard Stevens, Addison-Wesley, 1992). The command-line options specify that you want to see the status of all processes, including those owned by other users; that you want to see processes both with and without controlling terminals; and that you want to see job control information. (On UNIX systems based on BSD, the command is "ps -axj." To get the same output on systems based on System V, the command is "ps -efjc." You can find a concise history of the development of the BSD and System V variants in The Design of the UNIX Operating System, by Maurice J. Bach, Prentice Hall, 1990.)
To understand what Figure 1 shows you, I've added line numbers to make things easier to follow:
Line 1. The column titles correspond to the fields described in Table 1.
Line 2. The swapper process (often referred to as the "scheduler") allocates the CPU and memory to processes. Process 0 is created during the boot process; it forks a child (process 1) and then becomes the swapper process.
Line 3. The init process (process 1) initializes all other processes. The other processes have either init or one of its descendants as a parent.
Line 4. The pagedaemon process controls the paging of virtual memory. Swapper, init, and pagedaemon are kernel processes (their code is part of the kernel).
Line 5. The syslogd daemon is used to report system messages.
Line 6. The cron (clock daemon) process executes other processes at specified times and dates.
Line 7. These days, a very common use of daemons is for programs to support and control networked communications between computer systems. In the past, separate daemons would be run for telnet, FTP, and other server processes. The 4.3 BSD release of UNIX introduced the superserver inetd daemon. Briefly, this server program listens for incoming requests on behalf of multiple services and starts the appropriate server to handle each one.
Lines 8-12. These telnet processes were created by inetd.
Line 13. The last line in Figure 1 is actually me, editing the temp file that I used to collect the ps output. The parent PID (16059) is that of my shell process as seen in the following ps output; see Figure 2. Also, the emacs job (PID 19690), unlike the daemon processes, does have a controlling terminal. It's the same terminal as the shell.
The cheapest bug is the one that never gets coded. Accordingly, the first class of tests takes the form of a thorough design review.
As a first step in any design review, you should look for solid reference material. In the case of UNIX daemons (or any other UNIX networking programs), the best place to look is the series of UNIX network programming books by W. Richard Stevens. (In fact, much of this article relies on Chapter 12, "Daemon Processes and inetd Superserver," of Volume 1 of this series for technical reference information.)
What steps should the designer of a well-written UNIX daemon follow? You should:
- Call fork and have the parent exit. Calling the fork function creates a new process, the child of the calling process, with its own unique process ID. When the parent exits, the child continues. If the daemon was invoked from a UNIX shell, the shell sees the parent terminate and regains control, while the child keeps running in the background. The fork also guarantees that the child is not a process group leader, which is required before the setsid function can be called.
- Create a new session with setsid. Processes exist in process groups. Sessions are sets of process groups. The process that called setsid is made the session leader of the new session and the process group leader of a new process group, and it does not have a controlling terminal. If the process did have a controlling terminal, then calling setsid breaks the connection between the process and the controlling terminal.
- Ignore the SIGHUP signal and fork again. The second fork ensures that the daemon cannot inadvertently acquire a controlling terminal when it opens a terminal device. If a session leader without a controlling terminal opens a terminal device that is not already the controlling terminal of another session, that terminal becomes the controlling terminal for the session leader. After the second fork, the daemon is no longer a session leader, so this cannot happen. SIGHUP must be ignored because when the session leader (in this case, the first child process) terminates, all processes in the session (in our case, the second child process) are sent that signal.
- Change the working directory. The working directory should then be changed to the root directory, for two reasons. First, if the daemon crashes and creates a core file, that file is created in the current working directory, so the choice of directory determines where the core file ends up. Second, if the daemon is left running in a directory elsewhere in the file system, that file system cannot be unmounted until the daemon is halted.
- Clear the file creation mask. The file creation mask should be reset to zero so that the inherited file creation mask does not affect any new files created by the daemon.
- Close any open file descriptors. The daemon will inherit open descriptors from the process that started it. How do you determine which descriptors are open? This can be difficult. (Stevens recommends that you simply close the first 64 descriptors.)
- Set up logging through the syslogd daemon with openlog, so that error and debugging information has somewhere to go. Once the log is open, you can write messages to the system log with the syslog function.
Following these steps will result in a daemon that is more reliable and more testable than simply running any old program in the background and calling it a daemon.
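The steps above can be collected into a single function. This C sketch follows the steps as listed; Stevens gives his own version, daemon_init, in the book cited earlier, and this is not a copy of it. The name become_daemon, the MAXFD value of 64, and the extra step of attaching descriptors 0 through 2 to /dev/null are choices made for this example rather than requirements from the text.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <signal.h>
#include <syslog.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define MAXFD 64   /* descriptors to close; sysconf(_SC_OPEN_MAX) gives the real limit */

/* Turn the calling process into a daemon. Returns 0 in the daemon process;
   the original process and the intermediate child both exit. Returns -1 if
   a step fails. */
int become_daemon(const char *ident, int facility)
{
    pid_t pid;

    /* Step 1: fork and let the parent exit. The shell sees the command
       finish, and the child is guaranteed not to be a process group leader. */
    if ((pid = fork()) < 0)
        return -1;
    if (pid != 0)
        _exit(0);

    /* Step 2: start a new session; we now have no controlling terminal. */
    if (setsid() < 0)
        return -1;

    /* Step 3: ignore SIGHUP, then fork again so the daemon is no longer a
       session leader and cannot acquire a controlling terminal by opening one. */
    signal(SIGHUP, SIG_IGN);
    if ((pid = fork()) < 0)
        return -1;
    if (pid != 0)
        _exit(0);

    /* Step 4: don't pin whatever file system we were started from. */
    if (chdir("/") < 0)
        return -1;

    /* Step 5: clear the inherited file creation mask. */
    umask(0);

    /* Step 6: close inherited descriptors. */
    for (int fd = 0; fd < MAXFD; fd++)
        close(fd);

    /* Extra step (not in the list above): put /dev/null on descriptors 0-2
       so stray reads and writes on stdin, stdout, and stderr are harmless. */
    if (open("/dev/null", O_RDWR) != 0 || dup(0) != 1 || dup(0) != 2)
        return -1;

    /* Step 7: from here on, report through syslogd. */
    openlog(ident, LOG_PID, facility);
    return 0;
}
```

Once a daemon built this way is running, ps -axj (or ps -efj) should show it with no controlling terminal, a session ID different from its PID, and, typically, init as its parent. Each of those is a cheap check to add to a test script.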
Once you've performed your design review, the next step is to verify the actual functions performed by the daemon, and any utilities built to support the daemon. The functions performed by a daemon will, of course, vary according to the daemon. There are, however, some constants you will want to look for in all daemons:
- Startup/shutdown scripts. Daemons are generally started when the target system is booted. To do this, daemon startup scripts are written and added to the system startup scripts. These scripts should be written to handle daemon startup failures. They should also be written to shut down the daemon programs cleanly. (Startup and shutdown can be selected through command-line options.) I encountered an interesting bug in a daemon shutdown script some years ago. The script was intended to enable system administrators to manage large numbers of web-server processes on a single system. To ensure that the administrators would always shut down the correct server, the servers were assigned logical names. The process ID (PID) of each server was written to a text file, which the shutdown script would read (and then delete). Simple, right? The only problem was that if the PID file for a server was missing or contained the wrong PID, the server couldn't be stopped. How did the PID files get deleted or changed? Well, when you're a system administrator dealing with dozens of servers and can't get server #28 to shut down, sometimes you resort to deleting PID files. Oh yes, there was also a bug in the startup script that resulted in zero-length PID files. This bug, however, did not generate an error message, so you never knew there was a problem until you tried to shut down a server.
- Information written to the syslog. One of my most vivid design-review memories involved a database server daemon. In reviewing the design specification, I didn't see any mention of the types or levels of logging information generated by the daemon. When I asked the chief architect about this, he said, "There's no need for that, the server just runs." I then asked what would happen if the server daemon hung or crashed. How would you figure out when it failed and what it was doing just before it failed? His answer was, "That won't be a problem. We'll just reboot the server." When I pressed the point, he became defensive. I decided to prove him wrong by placing the server under stress during its initial round of testing. I never had the chance, because a few weeks later an extended demo was performed for a group of in-house users. The server had to be rebooted about eight times before lunch because it kept hanging, and there was no way for anyone to determine what was wrong. The design was quickly changed to generate logging information. The moral of this story? Use the syslog. Write status and debugging information to it. Design the server to support varying levels of debugging information, so that its performance doesn't suffer from always writing full debug output to the syslog.
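The PID-file bug described above comes down to two unchecked assumptions: that writing the file succeeded, and that its contents are still valid when the shutdown script reads them. The following C sketch (helper names are hypothetical) refuses to leave a zero-length PID file behind, rejects missing, empty, or malformed files when reading, and uses kill with signal 0, which performs the permission and existence checks without sending anything, to see whether the recorded process still exists.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Write our PID to path; on failure, remove the file instead of leaving
   an empty or partial one. Returns 0 on success, -1 on failure. */
int write_pid_file(const char *path)
{
    char buf[32];
    int len = snprintf(buf, sizeof buf, "%ld\n", (long)getpid());
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    if (write(fd, buf, (size_t)len) != len) {
        close(fd);
        unlink(path);               /* never leave a zero-length PID file */
        return -1;
    }
    return close(fd);
}

/* Read and validate a PID file. Returns 0 and sets *pid on success,
   -1 if the file is missing, empty, or does not hold a positive number. */
int read_pid_file(const char *path, pid_t *pid)
{
    char buf[32];
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, buf, sizeof buf - 1);
    close(fd);
    if (n <= 0)
        return -1;                  /* empty file: the bug described above */
    buf[n] = '\0';

    char *end;
    errno = 0;
    long v = strtol(buf, &end, 10);
    if (errno != 0 || end == buf || v <= 0 || (*end != '\n' && *end != '\0'))
        return -1;
    *pid = (pid_t)v;
    return 0;
}

/* Does a process with this PID exist? */
int pid_is_running(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return 1;
    return errno == EPERM;          /* exists, but belongs to another user */
}
```

A test of the shutdown script can then deliberately delete, empty, or corrupt the PID file and confirm that the script reports the problem instead of failing silently. Note that pid_is_running only proves that some process has that PID; because PIDs are reused, a wrong PID can still point at a live, unrelated process, so a careful script should confirm the process's identity before signaling it.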
Daemon programs are used to move data between processes or systems. In almost all cases, this involves moving large quantities of data. Accordingly, stress tests to measure and verify throughput and capacity should be a part of your daemon test suite.
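The measurement side of such a test can be small. This C sketch times a series of requests with CLOCK_MONOTONIC and records throughput and the slowest single request. send_request stands for whatever client call exercises your daemon; it is a hypothetical callback, not an existing API. null_request is a do-nothing stand-in whose timing shows the harness's own overhead, a useful baseline to record before varying anything else.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <time.h>

struct run_stats {
    size_t requests;        /* requests completed */
    double seconds;         /* elapsed wall-clock time for the run */
    double per_second;      /* throughput */
    double max_latency_ms;  /* slowest single request */
};

static double seconds_between(const struct timespec *a, const struct timespec *b)
{
    return (double)(b->tv_sec - a->tv_sec) + (double)(b->tv_nsec - a->tv_nsec) / 1e9;
}

/* Issue n requests through send_request (hypothetical: any function that
   sends one request to the daemon under test and returns 0 on success).
   Returns 0 if all n requests succeeded, -1 if the run stopped early. */
int measure_run(int (*send_request)(void *), void *ctx, size_t n,
                struct run_stats *out)
{
    struct timespec start, before, after, end;

    out->requests = 0;
    out->max_latency_ms = 0.0;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (size_t i = 0; i < n; i++) {
        clock_gettime(CLOCK_MONOTONIC, &before);
        int rc = send_request(ctx);
        clock_gettime(CLOCK_MONOTONIC, &after);
        if (rc != 0)
            break;                          /* stop at the first failed request */
        double ms = seconds_between(&before, &after) * 1000.0;
        if (ms > out->max_latency_ms)
            out->max_latency_ms = ms;
        out->requests++;
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    out->seconds = seconds_between(&start, &end);
    out->per_second = out->seconds > 0.0 ? (double)out->requests / out->seconds : 0.0;
    return out->requests == n ? 0 : -1;
}

/* Do-nothing request used to measure the harness's own overhead. */
int null_request(void *ctx)
{
    size_t *count = ctx;
    ++*count;
    return 0;
}
```

In use, you would run it once against the baseline configuration, then change exactly one variable (physical memory, number of concurrent clients, message size) and compare per_second and max_latency_ms while sar or vmstat watches the rest of the system.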
The following are the sorts of characteristics you should look for:
- Response times/throughput tests. These tests include generating specific levels of traffic/usage, then measuring how well the daemon is able to process that traffic. I like to think of these tests as algebraic equations consisting of constants and variables. The approach I follow is to establish a baseline of performance, then change the value of one variable in the equation. For example, if the daemon is memory intensive, you can vary the amount of physical memory in the system. Luckily, UNIX includes built-in utilities (such as sar or vmstat) to measure system resources, so you can easily keep an eye on what the system is doing as you turn the dial.
- Dealing with backlogs. What happens to the daemon if its incoming flow of information is shut off for an extended period of time, a backlog is allowed to build up, and then the information flow is suddenly reopened? Can the daemon handle the flood of incoming data or requests? Does it drop requests? Or does it respond so slowly that it can never recover? Want an example? Suppose you have an Internet firewall that handles incoming and outgoing e-mail in a configuration where the mail is handled by dedicated mail servers. What will happen if one or both of the mail servers are halted on the same day that the firewall has to handle a huge amount of web traffic because the new Sports Illustrated swimsuit pictures are available on the Web? Will the firewall be so taxed handling the web surfers that the e-mail daemon cannot catch up?
- Dealing with success. This is a longevity test. The question to be answered is, "How long can the daemon run?" A daemon should be able to run for an indefinite period of time, but it will fail to do so if it contains bugs such as memory leaks, where the daemon process does not free memory that it allocated but no longer uses.
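One inexpensive way to watch for the memory leaks mentioned above during a longevity run is to sample the daemon's memory footprint at regular intervals. The sketch below is Linux-specific: it reads the VmRSS field from /proc/<pid>/status, which other UNIX systems do not provide (there, you would sample the RSS column of ps output instead). The leak rule in looks_like_leak is a hypothetical rule of thumb that flags steady growth; it points you at a problem but does not locate the leak.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux-specific: resident set size (VmRSS) of pid in kB, or -1 if it
   cannot be read (no /proc, or the process is gone). */
long vmrss_kb(pid_t pid)
{
    char path[64], line[256];
    snprintf(path, sizeof path, "/proc/%ld/status", (long)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    long kb = -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "VmRSS:", 6) == 0) {
            if (sscanf(line + 6, "%ld", &kb) != 1)
                kb = -1;
            break;
        }
    }
    fclose(f);
    return kb;
}

/* Take up to n samples, interval_s seconds apart. Returns the number of
   samples collected (fewer than n if the process disappears). */
size_t sample_rss(pid_t pid, long *out, size_t n, unsigned interval_s)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if ((out[i] = vmrss_kb(pid)) < 0)
            break;
        if (i + 1 < n)
            sleep(interval_s);
    }
    return i;
}

/* Hypothetical heuristic: memory never went down across the samples and
   grew by more than slack_kb overall. */
int looks_like_leak(const long *samples_kb, size_t n, long slack_kb)
{
    if (n < 2)
        return 0;
    for (size_t i = 1; i < n; i++)
        if (samples_kb[i] < samples_kb[i - 1])
            return 0;
    return samples_kb[n - 1] - samples_kb[0] > slack_kb;
}
```

In a real longevity test, the interval would be minutes or hours and the samples would be taken under a steady load; growth while the daemon is idle and growth under load point to different causes.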
Just like any other program, daemons must coexist with the rest of the system. This means that they cannot attempt to allocate communications ports that are already in use by other programs and they cannot monopolize system resources to the extent that no other program can be run.
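A quick way to catch a port conflict before the daemon does is to try binding the port yourself. This C sketch (port_in_use is a hypothetical name) returns 1 if a TCP port is already taken. Binding to INADDR_ANY mirrors what many daemons do, but that choice is an assumption; ports below 1024 also require root, and without it the sketch reports -1.

```c
#define _XOPEN_SOURCE 700
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if the TCP port is already in use, 0 if it is free,
   -1 on any other error (for example, EACCES for ports below 1024). */
int port_in_use(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_ANY);

    int result = 0;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0)
        result = (errno == EADDRINUSE) ? 1 : -1;
    close(fd);
    return result;
}
```

A check like this is inherently racy, since another program can grab the port between the check and the daemon's own bind. The more important test is that the daemon itself reports EADDRINUSE clearly, for example to the syslog, rather than failing silently.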
Also, recall that I extolled the virtue of writing to the system log by sending data to the syslogd daemon. There's one more thing you have to do to ensure that you will be successful in doing this: You have to make sure that your daemon starts after the syslogd daemon in the boot cycle. No kidding. I once actually saw a problem where a daemon consistently failed to start, but it only wrote error messages to the system log if it was run manually from the shell. Any attempt to start the daemon by rebooting the system failed to generate even one error message, even though the command syntax used in the startup script was identical to that entered by hand. It took us quite a while to realize that the daemon was trying to write to the system log, but that the syslogd daemon was not (yet) running.
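Ordering the startup scripts correctly is the real fix, but a daemon (or a test script) can also check whether anything is listening on the system log socket before relying on it. This C sketch assumes the socket is a datagram socket at a path you pass in; /dev/log is the usual location on many Linux systems, while other systems use different paths (for example /var/run/syslog on macOS). The function names are hypothetical. openlog's LOG_CONS option, which writes to the system console when a message cannot be sent to the logger, is another way to keep early failures visible.

```c
#define _XOPEN_SOURCE 700
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Returns 1 if a datagram socket bound at path accepts a connect()
   (that is, a syslogd-style listener is there), 0 otherwise. */
int log_socket_ready(const char *path)
{
    struct sockaddr_un addr;
    if (strlen(path) >= sizeof addr.sun_path)
        return 0;
    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd < 0)
        return 0;
    int ok = connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0;
    close(fd);
    return ok;
}

/* Poll once a second for up to timeout_s seconds. Returns 1 once the
   log socket is ready, 0 if it never appeared. */
int wait_for_log_socket(const char *path, unsigned timeout_s)
{
    for (unsigned waited = 0; ; waited++) {
        if (log_socket_ready(path))
            return 1;
        if (waited >= timeout_s)
            return 0;
        sleep(1);
    }
}
```

A startup script test could then boot the system and check that the daemon either found the log socket or said so on the console, instead of failing without a trace as in the incident above.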
Once you've proven that the daemon functions correctly on its own and as a part of its greater system and that it can handle a traffic load, it's time to push the program to, and beyond, its limits through negative testing.
- Startup after a reboot. When the system comes back up, the daemon has to start alongside everything else: it cannot attempt to allocate communications ports that are already in use by other programs, and it cannot monopolize system resources to the extent that no other program can be run.
- Dealing with DOS in capacity tests. In this case, "DOS" refers to a denial-of-service attack. How much traffic can the daemon take? Stress tests should be performed with a constant level of use and with short-term spikes of heavy use. Capacity stress tests are also a form of denial-of-service security test. The idea here is to determine if the system in question will crash or otherwise be rendered unusable when subjected to a heavy load. In addition, the heavy load may not crash the system, but it may prevent any system monitoring utilities from reporting a break-in.
- The Big Bang theory. Assume that the function to be performed by the daemon is to receive information from multiple sources, then process that information. A good example of this type of daemon would be a component of a network management system. Let's also say that the daemon is required to support up to 100 remote devices, each of which is a source of information, and that each remote device is expected to generate 10 device-status messages per minute and any number of messages in the event of an error or emergency. A good test would be to first have 100 remote devices up and running, and therefore attempting to send messages to the daemon, and then start up the daemon. The goal of the test would be to ensure that the daemon could handle all those messages at once, immediately after a startup. The test could then be expanded to include more remote devices, and to take the form of these devices all reporting a flood of errors when the daemon is trying to start up.
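A Big Bang test needs a crowd of devices that are already trying to talk before the daemon exists. The C sketch below fakes them with child processes on the loopback interface: each simulated device retries its TCP connection every 50 ms until something is listening on the given port, then sends a fixed number of newline-terminated status lines. The message format, the retry interval, the 10-second give-up point, and the use of TCP are all assumptions; a real test would use the devices' actual protocol and scale the device count up to (and past) the required 100.

```c
#define _XOPEN_SOURCE 700
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* One simulated device: retry until the daemon listens on 127.0.0.1:port
   (giving up after about 10 seconds), then send nmsgs status lines. */
static void run_device(int id, unsigned short port, int nmsgs)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    struct timespec pause = { 0, 50L * 1000 * 1000 };   /* 50 ms */
    for (int attempt = 0; attempt < 200; attempt++) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            _exit(1);
        if (connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0) {
            char line[64];
            for (int m = 0; m < nmsgs; m++) {
                int len = snprintf(line, sizeof line, "device %d status %d\n", id, m);
                if (write(fd, line, (size_t)len) != len)
                    _exit(1);
            }
            close(fd);
            _exit(0);
        }
        close(fd);              /* daemon not up yet: retry on a fresh socket */
        nanosleep(&pause, NULL);
    }
    _exit(1);
}

/* Start ndevices simulated devices; returns how many were forked. */
int start_devices(int ndevices, unsigned short port, int nmsgs)
{
    int started = 0;
    for (int i = 0; i < ndevices; i++) {
        pid_t pid = fork();
        if (pid == 0)
            run_device(i, port, nmsgs);     /* never returns */
        else if (pid > 0)
            started++;
    }
    return started;
}

/* Wait for the devices; returns how many delivered all their messages. */
int reap_devices(int ndevices)
{
    int delivered = 0;
    for (int i = 0; i < ndevices; i++) {
        int status;
        if (wait(&status) < 0)
            break;
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            delivered++;
    }
    return delivered;
}
```

The test itself starts the daemon only after start_devices has returned, checks that every status line arrives, and then repeats the run with the devices sending bursts of error messages instead of routine status.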
You may not think that security is a concern for the daemon you're testing, but if you're dealing with a networked product, then you have to worry about security. Security must be built into your products, and the platforms on which they run.
The most important aspect of testing the security of an Internet product or service, however, isn't the mechanics of testing for security holes. It is the development of an attitude about security. Some people are squeamish about Internet security. They react to news stories about Internet break-ins by becoming paralyzed with fear and wishing their problems would just go away. They won't. As the use of the Internet increases, especially for e-commerce, the number of pirates on the Internet will also increase. Like the Internet itself, security problems are here to stay. You will have to deal with them (see my article "Software Testing in the Internet Age," Software QA Magazine, March 1997; http://www.softtest.org/). How do you do this? You cannot safeguard your products on a one-shot basis. It's an ongoing effort. You make security a daily part of your development and test environment. The security you build into your products must evolve in reaction to new threats, just as the products themselves evolve in response to new technological advances.
In terms of safeguarding a single daemon, you have to decide whether your daemon must run as root, or if it can run as another, less-privileged user. If it can run as a user other than root, then you must ensure that the user's account configuration restricts its access to only those files/directories that are necessary to run the daemon.
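For a daemon that starts as root (for example, to bind a port below 1024) and then switches to a less-privileged account, the switch has to happen in the right order. The following is a minimal C sketch, assuming the target uid and gid have already been looked up, typically with getpwnam on a dedicated account name; drop_privileges is a hypothetical name. Note that setgroups is a BSD/Linux call rather than part of POSIX. Supplementary groups and the group ID are changed first, while the process still has the privilege to do so, and the function finishes by checking that root cannot be regained.

```c
#define _DEFAULT_SOURCE    /* glibc: declare setgroups() in <grp.h> */
#include <grp.h>
#include <sys/types.h>
#include <unistd.h>

/* Permanently switch to uid/gid. Returns 0 on success, -1 on failure, in
   which case the daemon should log the error and exit rather than run on. */
int drop_privileges(uid_t uid, gid_t gid)
{
    /* Drop root's supplementary groups (only root may call setgroups). */
    if (geteuid() == 0 && setgroups(1, &gid) < 0)
        return -1;
    /* Group first: after setuid, we would no longer be allowed to change it. */
    if (setgid(gid) < 0)
        return -1;
    if (setuid(uid) < 0)
        return -1;
    /* A non-root daemon must not be able to become root again. */
    if (uid != 0 && (setuid(0) == 0 || seteuid(0) == 0))
        return -1;
    return 0;
}
```

A test for this step is to start the daemon as root, confirm with ps that it is running under the intended account, and then try to make it read or write files outside the directories that account is supposed to reach.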
Where do you find out about Internet-related security alerts? The Internet, of course. In addition to the bad people on the Internet, there are some good people out there, too. In 1988, a self-replicating program known as the Internet worm caused major network outages. In response to this incident, the Defense Advanced Research Projects Agency (DARPA) formed an emergency response team. From this team grew the Computer Emergency Response Team (CERT), now called the CERT Coordination Center. The CERT Coordination Center issues advisories on potential Internet security threats. For example, if a security hole is found in a specific program (such as the UNIX sendmail program), they investigate, document the risks, and make recommendations as to how the risks can be avoided. You can subscribe to advisory e-mail lists so that you'll automatically be kept informed as new holes are found and filled. You must also incorporate the latest security tools into your testing. You should always test your systems for potential security holes by using security scanner programs such as SATAN (a freely available program created by Wietse Venema and Dan Farmer) or the Internet Security Scanner (created by Christopher Klaus) and by staying up to date with the latest security information.
When you set out to plan and execute tests for a UNIX daemon program, you have to plan to test:
- The functions performed by the program, the design of the program, how the program fits into the overall product, and so on. In other words, the testable characteristics (including security) that you have to consider for any program.
- The fact that the program is a daemon. As a daemon, the program's design must follow certain rules to ensure that the feature that distinguishes it as a daemon, this being the ability to run detached from a controlling terminal, will function properly.
To verify the functions performed by the program, you can use the program's functional and design specifications as reference material. To verify whether the program can really serve as a daemon, you have to do some research into just how UNIX handles processes and terminals. Once you understand how they work, there's nothing magical or mystical about UNIX daemons.