William Nagel is the Chief Software Engineer for Stage Logic LLC, a small software development company. He specializes in developing CORBA-based real-time systems. He can be contacted at firstname.lastname@example.org.
Nowadays it seems like Linux is everywhere. Okay, maybe it hasn't quite caught on as a major player in the desktop market yet, but the chances are good that your new wireless router or DVR has Linux under the hood. In fact, in the low-to-midrange embedded market, Linux is vying to be king, and you can't get very far in the embedded world without running into real-time applications.
Historically, real-time constraints have meant investing in expensive real-time operating systems, and if you're building an antilock brake system or interplanetary probe with hard (often submillisecond) timing constraints, then that's still pretty much the case. On the other hand, if you are looking for a soft real-time system where the timing needs to be very good but an occasional small blip isn't catastrophic, then Linux may be just what you're looking for.
Soft Real-Time versus Hard Real-Time
In the world of real-time programming, there are two classes of "real time." The first is hard real-time. Hard real-time programs have absolute timing requirements. If things don't happen within specified time constraints, then the system can suffer catastrophic failure. For example, the launch controller on a rocket would usually be a hard real-time system. If the booster rockets don't fire within a specific window of time (which may be less than a few microseconds), the rocket could lose control and crash. If you need this kind of timing, then you need one of the (mostly commercial) hard real-time extensions to Linux or another commercial real-time OS.
Soft real-time systems, on the other hand, have much less stringent timing requirements. They need to meet their timing requirements most of the time, but the occasional miss or delay won't break the system. An example of this would be a music player. If audio fragments aren't played at the correct time, the music may not be very pleasant to listen to, but the player shouldn't crash. As you can see, though, even in this type of "soft" real-time system, it is desirable to have real-time performance that is as good as possible. Unless you're trying for some kind of techno effect, choppy music is probably not what you want to listen to. If this is what you need, Linux is a great choice right out of the box (or tarball as the case may be).
A Step Up with 2.6
The 2.6 Linux kernel was a big step up for Linux in terms of real-time capabilities, thanks to the contributions of Ingo Molnar, Robert Love, and many others. Prior to 2.6, Linux had some real-time-oriented features, but it was still hard to meet demanding predictability requirements. That has all changed with 2.6, though.
One of the biggest real-time-related improvements to the 2.6 kernel is kernel preemption. Prior to 2.6, the kernel scheduler was able to interrupt threads running in user space, but as soon as a thread made a system call that put its execution into kernel space, there was no way for it to be interrupted. That meant that a high-priority process that was ready to run could easily be blocked for long periods of time by a low-priority process inside a slow system call. With a preemptable kernel, that is no longer the case. Now kernel code can be interrupted just as easily as userland code if something of higher priority is made runnable.
Another improvement in the 2.6 kernel that has made it more suitable to real-time systems is the constant-time scheduler. In Version 2.4 and before, the time that it took for the kernel scheduler to figure out which process to run next was proportional to the number of threads currently running. Lots of threads, lots of time. With the constant-time scheduler, though, the amount of time needed is constant, even with huge numbers of threads. This helps to greatly improve predictability and makes it more practical to run massively multithreaded programs that are common in real-time applications.
Together with a variety of other latency improvements in the kernel, these two key features have really made Linux ready for serious soft real-time work. Now the trick is just knowing how to make sure that your real-time application actually takes advantage of the Linux features and runs in real time. Fortunately, you're in luck. As it turns out, I haven't quite reached my word-count limit for this article yet, so you get treated to several more pages dedicated to answering just that question. None of the features covered in the rest of this article are actually new in the 2.6 kernel. They were all there in the 2.4 kernel at least, but until 2.6, the kernel hadn't really caught up to the point where they performed well enough for serious work.
Before moving on, though, I'd like to take a moment to talk about permissions. For good reason, Linux restricts real-time features to processes with root privileges because a high-priority real-time thread can starve out all other processes on a machine. Therefore, if you want to use most of the following features in your program, it will need to be run as root, or at least granted the appropriate privileges by another root process.
Getting Your Priorities Straight
If no special steps are taken, processes will be scheduled with the default scheduler, called SCHED_NORMAL inside the 2.6 kernel (older kernels, and the user-space headers, call it SCHED_OTHER). Although the kernel can be given priority hints for these processes (using nice), SCHED_NORMAL processes have no real priority guarantees. However, Linux also provides two additional schedulers that can be used for scheduling real-time threads, SCHED_FIFO and SCHED_RR. Real-time processes run at priority levels 1 through 99 (priority 0 is reserved for non-real-time processes) and always preempt any SCHED_NORMAL process when they are runnable. A higher priority real-time thread will also always preempt a lower priority real-time thread.
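Rather than hard-coding the priority range, a program can ask the kernel for it. The following is a minimal sketch; the helper name print_priority_range() is made up for illustration, while sched_get_priority_min() and sched_get_priority_max() are the standard POSIX calls.

```c
#include <sched.h>
#include <stdio.h>

/* Hypothetical helper: print the valid static priority range for one
 * scheduling policy. Returns 0 on success, -1 if the policy is unknown. */
int print_priority_range(int policy, const char *name)
{
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);

    if (lo == -1 || hi == -1)
        return -1;

    printf("%-11s priorities %d..%d\n", name, lo, hi);
    return 0;
}
```

Calling it for SCHED_FIFO, SCHED_RR, and SCHED_OTHER shows which values sched_priority may take under each policy on the machine at hand, and it needs no special privileges.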
The first, SCHED_FIFO, is a first-in/first-out scheduler. The highest priority SCHED_FIFO thread that is available first will always be run by the scheduler. Once it starts running, it will only be preempted if a higher priority real-time thread becomes runnable, or if it yields the processor (either by blocking or by calling sched_yield()).
The second real-time scheduler that Linux provides is the SCHED_RR round-robin scheduler. SCHED_RR behaves like SCHED_FIFO in that it always runs the first available real-time process of the highest priority (and always preempts any SCHED_NORMAL processes). Unlike SCHED_FIFO, though, SCHED_RR doesn't allow a thread unlimited time to run if it isn't preempted by a higher priority process. Instead, if there are multiple processes of the same priority that are runnable, the SCHED_RR scheduler will only allow each process to run for a specific time quantum before it preempts the thread and gives the processor to the next available thread at that priority level.
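The length of the round-robin quantum can be queried at runtime with sched_rr_get_interval(). Here is a small sketch; the wrapper name rr_quantum_ms() is an assumption of mine, and the value is only meaningful for a process that is actually running under SCHED_RR.

```c
#include <sched.h>
#include <time.h>

/* Hypothetical wrapper: return the round-robin quantum reported for the
 * calling process, in milliseconds, or -1.0 if the query fails. */
double rr_quantum_ms(void)
{
    struct timespec ts;

    if (sched_rr_get_interval(0, &ts) == -1)   /* 0 means "this process" */
        return -1.0;

    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}
```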
To set a process to use one of the real-time schedulers, you need to use the system call sched_setscheduler(pid_t pid, int policy, const struct sched_param *p). The pid argument tells it which thread or process to set the scheduler for, and the policy tells it which scheduler to use (SCHED_FIFO, SCHED_RR, or the default policy, which the user-space headers spell SCHED_OTHER). You also set the priority that should be used by passing a pointer to a sched_param structure, which has a member named sched_priority that you set to the priority you'd like. Also, if you are already in the thread whose priority you want to set, you can pass zero as the pid.
Once a thread is running under a real-time scheduler, you can change its priority with a call to sched_setparam(). This time, you just need to send the thread ID and the sched_param structure. Or, if you want to get the current priority of a thread, you can call the companion function sched_getparam(), which takes the same arguments and fills the sched_param structure it's passed with the priority. If you take a look at Listing 1, you can see how these two functions can be combined to elevate the priority of the current thread to that of a child thread; in this case, to prevent a priority inversion at a shared resource.
I saw you getting up to go try those functions out in your latest multithreaded app. As it turns out, there's a better way to deal with setting the scheduler and priority in a threaded program: the wrapper calls in the pthread library. For instance, if you want to set the scheduler, you can invoke pthread_setschedparam(), which takes the same arguments as sched_setscheduler() except you pass a pthread_t (you know, the one pthread_create() gave you) instead of the pid_t. That's not all, though. As it turns out, there's an even better way to set the scheduler and priority on a thread if you haven't created it yet. The pthread_attr_setschedpolicy() and pthread_attr_setschedparam() functions allow you to set the desired values on a pthread_attr_t structure, which you can then pass to pthread_create(), as you can see in Listing 2, which creates a new thread with SCHED_RR and priority 50.
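As a minimal sketch of that call, the function below switches the calling process to SCHED_FIFO. The name become_fifo() and the error reporting are my own additions for illustration; without root privileges the call is expected to fail, typically with EPERM.

```c
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: put the calling process under SCHED_FIFO at the
 * given priority. Returns 0 on success, -1 with errno set on failure. */
int become_fifo(int priority)
{
    struct sched_param sp;

    memset(&sp, 0, sizeof sp);
    sp.sched_priority = priority;

    /* A pid of 0 means "the caller". */
    if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1) {
        int saved = errno;              /* fprintf may clobber errno */
        fprintf(stderr, "sched_setscheduler: %s\n", strerror(saved));
        errno = saved;
        return -1;
    }
    return 0;
}
```

A program would normally call something like this once, early in main(), and fall back to ordinary scheduling (or exit) if it returns -1.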
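The original Listing 1 is not reproduced here. What follows is only a sketch of the idea the paragraph describes, under the assumption that both the caller and the other thread already run under a real-time policy; boost_to_match() and restore_priority() are hypothetical names.

```c
#include <sched.h>
#include <sys/types.h>

/* Hypothetical helper: if `other` has a higher priority than the caller,
 * raise the caller to match it. The caller's previous parameters are
 * stored in *saved so they can be restored later.
 * Returns 0 on success, -1 on error (errno is set by the failing call). */
int boost_to_match(pid_t other, struct sched_param *saved)
{
    struct sched_param theirs;

    if (sched_getparam(0, saved) == -1 ||
        sched_getparam(other, &theirs) == -1)
        return -1;

    if (theirs.sched_priority <= saved->sched_priority)
        return 0;                       /* already high enough */

    return sched_setparam(0, &theirs);
}

/* Hypothetical helper: drop back to the priority saved by boost_to_match(). */
int restore_priority(const struct sched_param *saved)
{
    return sched_setparam(0, saved);
}
```

The intended use is to call boost_to_match() before taking the shared resource and restore_priority() after releasing it, so the holder is never outranked by the thread waiting on it.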
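Listing 2 is not reproduced here either, so the following is a sketch of the attribute-based approach rather than the original code. The names start_rr_thread() and worker() are placeholders. One detail worth calling out: with glibc, the policy and priority stored in the attribute object only take effect if the inherit-scheduler attribute is set to PTHREAD_EXPLICIT_SCHED, so the sketch does that first.

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Placeholder thread body for illustration. */
static void *worker(void *arg)
{
    return arg;
}

/* Hypothetical helper: start fn(arg) in a new SCHED_RR thread at the given
 * priority. Returns 0 on success or an errno-style code on failure
 * (for example EPERM when the process lacks real-time privileges). */
int start_rr_thread(pthread_t *tid, int priority,
                    void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    struct sched_param sp = { .sched_priority = priority };
    int rc = pthread_attr_init(&attr);

    if (rc != 0)
        return rc;

    /* Use the attribute's policy instead of inheriting the creator's. */
    rc = pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    if (rc == 0)
        rc = pthread_attr_setschedpolicy(&attr, SCHED_RR);
    if (rc == 0)
        rc = pthread_attr_setschedparam(&attr, &sp);
    if (rc == 0)
        rc = pthread_create(tid, &attr, fn, arg);

    pthread_attr_destroy(&attr);
    return rc;
}
```

A call such as start_rr_thread(&tid, 50, worker, NULL) corresponds to the SCHED_RR, priority 50 case mentioned above. Note that the pthread functions return the error code directly rather than setting errno.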
It's All in the Timing
Of the several methods of setting a timer in Linux, the easiest to use (at least of the timers that you're likely to care to use in any sort of real-time app) is the system call nanosleep(). With nanosleep(), you can put your application to sleep for a specified amount of time, and the kernel will wake it up at some point not before (and usually very soon after) the time you give. Don't let the name fool you, though. Although nanosleep() takes a timespec struct that allows you to specify a time with nanosecond precision, nanosleep() doesn't really give anywhere near that level of accuracy. Under Linux, the kernel only checks on timers once every "jiffy," which defaults to 1 ms under the 2.6 kernel (on older kernels it was 10 ms, and 10 ms is still a compile option in the 2.6 kernel). Once upon a time, nanosleep() would busy-wait if the sleep time given was less than 2 ms, but not anymore. On the other hand, don't be too discouraged. As long as you don't need submillisecond accuracy, nanosleep() is very good, as you can see in Table 1. (test.c, the code that generated the results in Table 1, can be found online at http://www.cuj.com/code/.) There is an overhead of about 2 ms for each call, but it's consistent enough that you can compensate for it if you need to (usually by undersleeping for 2 ms, possibly with a busy-wait to pad the small amount of extra time if you need to ensure that you don't finish early).
One of the big advantages of nanosleep() over its older counterparts sleep() and usleep() (other than a cooler name) is the ability to restart the sleep after an interruption. If a sleeping process receives a signal during a sleep, nanosleep() will be interrupted and will return -1 (with an errno of EINTR). This is also where nanosleep()'s second parameter comes into play. After an interruption, nanosleep() will fill the timespec pointed to by the second argument with any remaining time. This allows you to write a single line that will guarantee that your nanosleep() does indeed sleep for at least the requested time, even if a signal crashes the party. For instance, if you have a timespec t with the time of your sleep, you can pass &t as both arguments and simply keep calling nanosleep() for as long as it returns -1 with errno set to EINTR.
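The restart idiom described above can be sketched as follows. The wrapper name sleep_fully() is my own; the loop inside it is the whole trick.

```c
#include <errno.h>
#include <time.h>

/* Hypothetical helper: sleep for at least the time in *t, resuming after
 * any signal. Returns 0 once the full time has elapsed, or -1 (errno set)
 * if nanosleep() fails for a reason other than EINTR. */
int sleep_fully(struct timespec *t)
{
    /* On EINTR, nanosleep() writes the remaining time back into *t,
     * so the same struct serves as both request and remainder. */
    while (nanosleep(t, t) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return 0;
}
```

If you don't care about other errors, the loop collapses to the single statement `while (nanosleep(&t, &t) == -1 && errno == EINTR);`.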
If you need more accurate timing, you may want to look at /dev/rtc. With /dev/rtc, you can set a tick rate from 2 Hz up to 8192 Hz (in power of 2 increments). Each time the clock ticks, an interrupt will be generated. In your program, you can watch for these interrupts by doing a read() (which will block until the tick is hit). For instance, Listing 3 shows how you could set up a timer to show video frames at 32 Hz.
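Listing 3 is not included here, so the following is a sketch of the periodic-interrupt setup under a few assumptions: the device is /dev/rtc, the driver supports periodic interrupts, and the function names rtc_start(), rtc_wait_tick(), and rtc_stop() are hypothetical. The ioctl requests RTC_IRQP_SET, RTC_PIE_ON, and RTC_PIE_OFF come from <linux/rtc.h>.

```c
#include <fcntl.h>
#include <linux/rtc.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open /dev/rtc and turn on periodic interrupts at
 * `hz` (a power of two). Returns the file descriptor, or -1 on failure. */
int rtc_start(unsigned long hz)
{
    int fd = open("/dev/rtc", O_RDONLY);

    if (fd == -1)
        return -1;
    if (ioctl(fd, RTC_IRQP_SET, hz) == -1 ||
        ioctl(fd, RTC_PIE_ON, 0) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Block until the next tick. Returns the number of interrupts since the
 * previous read (more than 1 means ticks were missed), or -1 on error. */
long rtc_wait_tick(int fd)
{
    unsigned long data;

    if (read(fd, &data, sizeof data) != (ssize_t)sizeof data)
        return -1;
    return (long)(data >> 8);   /* low byte holds the interrupt type flags */
}

/* Turn periodic interrupts off again and release the device. */
void rtc_stop(int fd)
{
    ioctl(fd, RTC_PIE_OFF, 0);
    close(fd);
}
```

For the 32 Hz video case, a player would call rtc_start(32) once and then show one frame after each successful rtc_wait_tick().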
Using the real-time clock will get you much better accuracy than nanosleep(), but it does have a few drawbacks. The most obvious is its limited timing options. For example, if you needed to run at a rate of 5000 Hz (200 µs), you'd have to set the clock to 8192 Hz (122 µs) and then busy-wait until the remaining 78 microseconds had passed. Still better than burning up 100 percent of your CPU, but far from an ideal method. Another (possibly more serious) limitation of /dev/rtc is the fact that it can only be opened by a single process at any given point in time.
Real-time software is all about predictability, and nothing kills predictability quite like the kernel deciding it's a good time to free up some physical memory by swapping part of your program out to the swap disk. Suddenly, things that normally take microseconds to finish end up waiting for tens or even hundreds of milliseconds as the program's memory is swapped back into physical RAM. Needless to say, that's going to be an unacceptable delay for anything resembling a real-time system. Even if it's a soft real-time system, that's pretty bad. We can deal with soft, but down-comforter soft is hardly conducive to a working piece of real-time software. For instance, a movie player program can handle the occasional timing glitch, but regular half-second delays are going to make the program pretty unusable. Of course, an active movie player is unlikely to get swapped out to the hard drive. But what if it's a security camera viewer that starts actively playing when it receives a signal that movement has been detected? If there isn't much movement under normal circumstances, the program may very well have to swap back in when it receives the signal, leaving possibly crucial footage choppy or delayed.
Obviously, the solution to this problem is to keep your real-time programs from being swapped out. One solution would be to just disable swap; then nothing could be swapped out. That's a pretty heavy-handed solution, though, and not at all acceptable for a single real-time application running on a general desktop system. Fortunately, Linux provides you with a way to lock certain memory (or even an entire program) into physical memory, thus allowing everyone else to be happily swapped while your program sits quietly in blissful RAM.
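The arithmetic in the 5000 Hz example generalizes easily. Here is a small sketch, with function names of my own choosing, that picks the slowest usable /dev/rtc rate for a target frequency and computes how long the busy-wait has to be.

```c
/* Hypothetical helper: smallest power-of-two rate between 2 and 8192 Hz
 * that is at least target_hz, or 0 if the target is too fast for the RTC. */
unsigned rtc_rate_for(unsigned target_hz)
{
    for (unsigned rate = 2; rate <= 8192; rate *= 2) {
        if (rate >= target_hz)
            return rate;
    }
    return 0;
}

/* Hypothetical helper: whole microseconds left to busy-wait after one RTC
 * tick at rate_hz in order to fill one period at target_hz. */
unsigned busy_wait_pad_us(unsigned target_hz, unsigned rate_hz)
{
    return 1000000u / target_hz - 1000000u / rate_hz;
}
```

For a 5000 Hz target, rtc_rate_for() picks 8192 Hz and busy_wait_pad_us(5000, 8192) gives the 78 microseconds quoted above.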
To lock memory into physical RAM, you can use the system call mlock(), which takes an address and a size. It then locks the requested memory block into memory until the locking process ends, or until the program unlocks the memory with munlock(). You should be aware, though, that Linux can only lock memory in whole memory pages (the basic unit of memory for swapping purposes, which is 4 KB on x86). That means that if you ask to lock a block of memory that isn't an even multiple of the page size, the actual locked block will be padded to reach a page boundary.
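The page rounding is easy to work out ahead of time. The sketch below uses two illustrative helpers of my own: locked_span() computes how many bytes really end up pinned for a given request, and lock_buffer() is a thin wrapper around mlock().

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: number of bytes actually pinned when the range
 * [addr, addr + len) is locked. The range grows outward to whole pages. */
size_t locked_span(const void *addr, size_t len)
{
    uintptr_t page  = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t first = (uintptr_t)addr & ~(page - 1);
    uintptr_t end   = ((uintptr_t)addr + len + page - 1) & ~(page - 1);

    return (size_t)(end - first);
}

/* Hypothetical helper: lock a buffer into RAM. Returns 0 on success, -1
 * with errno set on failure (for example when the lock limit is hit). */
int lock_buffer(void *buf, size_t len)
{
    /* Linux rounds the start address down to a page boundary itself. */
    return mlock(buf, len);
}
```

Locking a 100-byte buffer therefore pins at least one full page, or two if the buffer happens to straddle a page boundary. A later munlock() on the same range releases those pages again.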
Sometimes what you'd really like to do is lock an entire process into memory. Obviously it wouldn't be very practical to do that using mlock(), but fortunately, Linux provides a special system call just for that purpose. The mlockall(int flags) function will lock the entire address space of a program into memory. The mlockall() function takes a flag that tells it whether it should lock the current address space of the process or any pages that are added in the future (MCL_CURRENT or MCL_FUTURE). If you want to lock both the current space and future allocations, you have to give both values (OR'd together).
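A minimal sketch of locking everything, present and future, looks like this. The wrapper name lock_whole_process() and the error message are assumptions for illustration; for an unprivileged process the call may well fail.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: pin every page the process has now and every page
 * it maps later. Returns 0 on success, -1 with errno set on failure. */
int lock_whole_process(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1) {
        int saved = errno;              /* fprintf may clobber errno */
        fprintf(stderr, "mlockall: %s\n", strerror(saved));
        errno = saved;
        return -1;
    }
    return 0;
}
```

The matching munlockall() call undoes the lock if the program later wants to become swappable again.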
You do need to be careful of using mlockall() on processes that lack the CAP_IPC_LOCK capability (which means all nonroot processes unless a root process has given them that capability). On Linux Versions 2.6.8 and earlier, CAP_IPC_LOCK was required to lock memory at all. Starting with 2.6.9, though, the RLIMIT_MEMLOCK resource limit was defined to be the limit of memory that nonprivileged processes could lock. However, that means that mlockall() will probably succeed for a nonprivileged process, but if MCL_FUTURE is set, the program will get an out-of-memory error if it attempts to allocate memory beyond the lock limit (which may cause mysterious, hard-to-track crashes).
You should now have a solid feel for the landscape of real-time programming with the Linux kernel. Of course, this isn't all you'll need to know. For instance, I didn't even begin to talk about mutexes or other synchronization methods, nor did I talk about thread management. Both are critical topics if you're going to do any serious real-time programming, but neither requires directly dealing with kernel system calls and features. Instead, both are dealt with through the pthreads library, which is at least another article in itself.
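One way to avoid being surprised by the lock limit is to check it before calling mlockall(). This is a sketch using the standard getrlimit() call; the function name print_memlock_limit() is made up for illustration.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: report how much memory this process may lock.
 * Returns 0 on success, -1 if the limit cannot be read. */
int print_memlock_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) == -1)
        return -1;

    if (rl.rlim_cur == RLIM_INFINITY)
        printf("lockable memory: unlimited\n");
    else
        printf("lockable memory: %llu KB\n",
               (unsigned long long)rl.rlim_cur / 1024);
    return 0;
}
```

If the reported limit is smaller than what the program expects to allocate, using MCL_FUTURE without raising the limit (or gaining CAP_IPC_LOCK) is asking for the allocation failures described above.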