It takes more than just source-level debuggers
Graham is the software technical manager for ARC International.
As embedded applications become increasingly complex, developers are relying more and more on real-time operating systems (RTOSs) to handle that complexity. However, the use of a multithreaded RTOS creates unique problems when debugging applications. Many of these multithreaded debugging issues are beyond the capabilities of standard source-level debuggers. Consequently, developers need additional debugger features and tools to address some of the common, yet critical, problems they face when debugging applications that use a multithreaded RTOS. In short, tools and debuggers that are RTOS-aware can greatly improve debugging efficiency.
Two main types of debugging issues arise when using an RTOS for embedded applications:
- Multiple threads of execution introduced into your application.
- RTOS code introduced into your software environment.
Multithreading is an operating system's ability to concurrently run programs that have been divided into subcomponents (or threads). When done correctly, multithreading offers better utilization of processors and other system resources, and provides a scalable, modular environment upon which to write application software. One of the problems created by multithreading is that, while debugging, you can typically see only the stack and local variables for the active thread. This makes it difficult to know what the other threads/tasks are doing at that time. Multithreading can also introduce new classes of bugs into your application. These include:
- Race conditions caused by timing problems between tasks.
- Corruption caused by sharing data between tasks, or multiple tasks executing the same code.
- Deadlock conditions occurring between tasks.
- Starvation of tasks.
Adding an RTOS to the embedded development environment introduces code and data structures into your application that you have not written and are not likely to understand. This means that the key piece of software you are using to interface to your embedded hardware becomes a sort of black box. If you don't understand the inner workings of the RTOS, it can be difficult to determine whether something is wrong when debugging the interaction between threads, memory-management problems, or other RTOS-dependent issues. Many royalty-bearing commercial RTOSs are not accompanied by source code. Life is easier when you can access that source code, and some RTOSs do ship with it for the benefit of their users, but reverse engineering the RTOS is not a good use of your time. What you really need are tools that help you use the RTOS and provide useful debugging information about the state of the RTOS and its various objects.
Standard Cross Debuggers
The most commonly used source-level cross debuggers offer a number of features that aid in debugging, such as breakpoints, local and global variable windows, memory windows, register windows, and stack windows. These debugging windows (especially those that contain information related to thread state, or context, which include local variables, registers, and stacks) can only be displayed for the active thread, which is the task currently executing on the CPU. There is often no way to determine the state of the other threads. To quickly debug applications written with an RTOS, you have two possible approaches: Either learn all of the inner workings of an RTOS (including all of its data structures, error conditions, and functionality) or use a set of tools that interprets this information for you and displays it in a format that is easily understood. Those tools should be integrated with your source-level cross debugger to make it RTOS-aware. They should include additional debugging windows that display features such as thread summary, task stack usage, semaphore and/or mutex status, message buffers, message queues, ready queues, and the RTOS memory pool. In addition, they should include capabilities such as context switching, for viewing the context of various tasks simultaneously; task-aware breakpoints; and RTOS trace and profiling.
Common RTOS Debugging Problems
Adding RTOS-specific features to a debugging environment can help a great deal when debugging common problems that arise from working in a multithreaded environment.
In multithreaded applications, corruptive memory access (also known as "corruption") can occur when multiple tasks access the same memory at the same time. This problem can arise with device registers, shared memory buffers, global variables, and nonreentrant code. An example of a corruption problem is when Task0 reads a shared device and Task1 and Task2 write to the same device. Data from either Task1 or Task2 could be corrupted by the other task if a context switch occurs in the middle of a read or write operation; see Figure 1.
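The Task1/Task2 scenario can be replayed deterministically in a small simulation. The following is a minimal sketch, not RTOS code: Python generators stand in for tasks, each `yield` marks a point where a context switch could occur, and the two-register "device" and all names are hypothetical.

```python
# Hypothetical two-register device shared by Task1 and Task2.
device = {"addr": None, "data": None}

def writer(name, addr, data):
    """Write a two-part record; each yield is a possible context switch."""
    device["addr"] = addr
    yield f"{name} wrote addr"
    device["data"] = data
    yield f"{name} wrote data"

def run(schedule, tasks):
    """Resume the named tasks in order, one step per schedule entry."""
    for name in schedule:
        next(tasks[name])
    return dict(device)

def make_tasks():
    return {"Task1": writer("Task1", 0x10, "A"),
            "Task2": writer("Task2", 0x20, "B")}

# No switch in the middle of a write: the final record is consistent.
clean = run(["Task1", "Task1", "Task2", "Task2"], make_tasks())

# Switch after Task1 writes the address but before it writes the data.
torn = run(["Task1", "Task2", "Task2", "Task1"], make_tasks())

print("clean:", clean)
print("torn: ", torn)
```

In the second schedule the device ends up holding Task2's address paired with Task1's data, which is the kind of mixed record that a mid-operation context switch produces.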
Debugging this problem using a standard debugger normally involves setting breakpoints at several locations to find out what each thread is doing, as well as estimating what the other tasks are doing at the same time. The problem with this approach is that it gives you no information about the actual state of any other tasks, or about what they are doing, without additional help. But with RTOS-aware debugging, you could set a breakpoint in one of the tasks while it is reading or writing to the device and then look at what the other tasks are doing at the same time using the task context-switching feature. In our example of corruptive memory access, you would see whether another task is also writing to or reading from the device simultaneously. Another way in which an RTOS debugging tool can help solve this problem is by allowing the debugger to examine the RTOS memory pool for signs of corruption. (For example, the MQX RTOS from ARC, the company I work for, has a small header for each block of memory allocated by the RTOS. If that header is overwritten, this can easily be detected by RTOS-aware debugging tools.)
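To illustrate how a per-block header makes corruption detectable, here is a rough sketch of a pool walk. The header layout (a marker word, a payload size, and an owner task ID) and the marker value are assumptions made for this example; they are not the actual MQX format.

```python
import struct

HEADER = struct.Struct("<IHH")   # assumed layout: marker, payload size, owner id
MAGIC = 0xB10CB10C               # arbitrary marker chosen for this sketch

def build_pool(blocks):
    """Lay out (owner, size) blocks back to back, each behind a header."""
    pool = bytearray()
    for owner, size in blocks:
        pool += HEADER.pack(MAGIC, size, owner) + bytes(size)
    return pool

def check_pool(pool, block_count):
    """Walk the pool header by header and report the first damaged block."""
    offset = 0
    for index in range(block_count):
        magic, size, _owner = HEADER.unpack_from(pool, offset)
        if magic != MAGIC:
            return f"block {index} at offset {offset}: header overwritten"
        offset += HEADER.size + size
    return "pool OK"

pool = build_pool([(1, 16), (2, 16)])
before = check_pool(pool, 2)

# Task 1 writes 20 bytes into its 16-byte block, spilling into the next header.
start = HEADER.size
pool[start:start + 20] = b"\xff" * 20
after = check_pool(pool, 2)

print(before)
print(after)
```

The walk flags the block whose header was hit, which points back at the neighboring block (and its owner) as the likely culprit.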
Another common multithreading problem concerns semaphore deadlock. This occurs when a task is waiting for a semaphore that will never be freed because the task that owns the semaphore is also blocked.
For example, Task0 tries to lock Sem0 and then lock Sem1. Task1 tries to lock Sem1 and then lock Sem0. If, between Task0 locking Sem0 and Task0 locking Sem1, a context switch occurs to Task1, then Task0 has locked Sem0 but not Sem1. Task1 will run and lock Sem1, and then block when trying to lock Sem0. Task0 becomes the active task again and tries to lock Sem1, but Task1 has already locked it. The result is deadlock, in which Task0 is blocked while waiting for Sem1, which is owned by Task1; and Task1 is blocked waiting for Sem0, which is owned by Task0. Neither task can run to free up the semaphore that the other task is waiting for; see Figure 2.
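The interleaving just described can be forced on a desktop machine. This is a sketch using Python's `threading` module, with `Lock` objects standing in for binary semaphores; the barriers and the 0.2-second timeout are scaffolding added for the demonstration so that the program reports the deadlock instead of hanging, as a real pair of tasks would.

```python
import threading

sem0, sem1 = threading.Lock(), threading.Lock()
both_hold_first = threading.Barrier(2, timeout=5)
both_tried_second = threading.Barrier(2, timeout=5)
results = {}

def task(name, first, second):
    with first:
        # Force the unlucky switch: each task holds its first semaphore
        # before either one asks for its second.
        both_hold_first.wait()
        # A real task would block here forever; the timeout lets us observe it.
        got = second.acquire(timeout=0.2)
        results[name] = got
        # Keep holding `first` until both tasks have made their attempt.
        both_tried_second.wait()
        if got:
            second.release()

t0 = threading.Thread(target=task, args=("Task0", sem0, sem1))
t1 = threading.Thread(target=task, args=("Task1", sem1, sem0))
t0.start()
t1.start()
t0.join()
t1.join()

print(results)
```

Both entries in `results` come back `False`: neither task could obtain its second semaphore while the other held it.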
Using a standard debugger, it can be difficult to discover the problem unless you happen to understand RTOS data structures. Without that knowledge, you would need to step through the code, line by line, to find the point at which both tasks block and do not return. But with an RTOS-aware debugging environment, you could stop the debugger any time after the application problem has occurred and bring up a window that provides the status of the tasks and the semaphores, as in Figure 3. You would then discover that each of the tasks is blocked on a semaphore owned by the other task; see Figure 4.
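Once the task and semaphore status is available as data, spotting the deadlock amounts to following "blocked on" and "owned by" links until they loop back. A minimal sketch, assuming the debugger can export two hypothetical tables (which semaphore each blocked task waits for, and which task owns each semaphore):

```python
def find_deadlock(blocked_on, owner):
    """Follow task -> semaphore -> owner links; return the cycle, if any."""
    for start in blocked_on:
        chain, task = [], start
        while task in blocked_on and task not in chain:
            chain.append(task)
            task = owner.get(blocked_on[task])
        if task in chain:
            return chain[chain.index(task):]
    return None

# Snapshot matching the Task0/Task1 example.
blocked_on = {"Task0": "Sem1", "Task1": "Sem0"}
owner = {"Sem0": "Task0", "Sem1": "Task1"}
cycle = find_deadlock(blocked_on, owner)

# Ordinary contention: Task1 owns Sem1 but is not blocked itself.
no_cycle = find_deadlock({"Task0": "Sem1"}, {"Sem1": "Task1"})

print(cycle)
print(no_cycle)
```

A returned list names the tasks that are waiting on each other; `None` means the blocked tasks are merely waiting for an owner that can still run.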
Another common problem for software engineers working with RTOSs is stack overflow. In multithreaded applications, it is common for each task or thread to have its own stack, which creates a greater potential for underestimating the stack requirements of each task. Overflow can occur either because too many local variables have been allocated, or because the function call tree becomes too deep. The resulting problems can go undetected for some time, depending on which memory is being corrupted by the overflow. Often, the problem will appear in the execution of another thread that was not responsible for the overflow. For example, a task can allocate too many local variables or call a recursive function that overflows its stack, and the resulting corruption surfaces as misbehavior in a different task.
When debugging stack overflow problems, one issue that may be encountered is related to task stack memory usage. You often don't know where the beginning and end of each task stack reside in memory if these values have been assigned by the RTOS. Depending on the resulting undesirable application behavior, you may not even know where to begin looking for the problem because stack overflow problems can manifest in many different ways. You might start looking at local variables and global variables and step through your code until you reach the point of failure, only to then discover that a corruption has occurred. At that point, you will need to find the source of the corruption. Good luck. With RTOS-aware debugging tools, you could use a problem diagnostic tool from the debugger to discover the problem, or a debugger window that displays the memory usage of each thread's stack. This analysis usually requires the initialization of each thread's stack with a known value, which lets the debugger determine how much of the stack has been used; see Figures 5 and 6.
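The known-value technique can be sketched in a few lines. The example below assumes a stack that grows downward from the end of its buffer and uses 0xA5 as the fill byte; both are choices made for this illustration, and a real RTOS may use a different pattern or growth direction.

```python
FILL = 0xA5  # fill byte assumed for this sketch

def make_stack(size):
    """Create a task stack pre-filled with the known value."""
    return bytearray([FILL]) * size

def stack_used(stack):
    """Count bytes from the first overwritten one to the top of the stack.

    The stack grows downward from the end of the buffer, so any
    untouched fill bytes sit at the low addresses.
    """
    untouched = 0
    for byte in stack:
        if byte != FILL:
            break
        untouched += 1
    return len(stack) - untouched

stack = make_stack(256)
stack[-100:] = bytes(range(1, 101))   # the task has pushed 100 bytes so far

used = stack_used(stack)
percent = 100 * used // len(stack)
print(f"{used} of {len(stack)} bytes used ({percent}%)")
```

The figure is a high-water mark: it reflects the deepest the stack has ever been, and a reading of 100% strongly suggests that the task has already run past the end of its stack.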
Memory leaks are among the most common problems that occur in embedded applications. Passing memory among multiple threads increases the likelihood of creating this type of bug. Eventually, the memory that has not been freed back to the RTOS exhausts the RTOS's memory systems and causes a failure in the application.
For example, Task0 allocates a message buffer from the RTOS and sends the message buffer to Task1. Task1 receives the message buffer but "forgets" to free the message buffer back to the RTOS. The RTOS eventually runs out of buffers to use for sending messages, and Task0 isn't able to allocate any more buffers.
When debugging these types of problems with standard debuggers, it is often difficult to determine the source of the problem without following the execution line by line, or examining the code by hand, in an attempt to match memory allocation code with the relevant memory freeing code. With RTOS-aware debugging tools, however, a memory pool or buffer pool window can be used to find out which threads own which memory blocks, and whether the memory buffers are all being used by one task and none of them have been freed. This information quickly leads you to the problem's source; see Figures 7, 8, and 9.
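The buffer pool view boils down to recording an owner for every buffer. Here is a small sketch of the Task0/Task1 leak with that bookkeeping in place; the `BufferPool` class and its method names are hypothetical and do not correspond to any particular RTOS API.

```python
from collections import Counter

class BufferPool:
    """Fixed set of message buffers, each tagged with its current owner."""

    def __init__(self, count):
        self.owner = {buf: None for buf in range(count)}  # None means free

    def alloc(self, task):
        for buf, owner in self.owner.items():
            if owner is None:
                self.owner[buf] = task
                return buf
        return None  # pool exhausted

    def transfer(self, buf, task):
        """Sending a message hands ownership to the receiver."""
        self.owner[buf] = task

    def free(self, buf):
        self.owner[buf] = None

    def summary(self):
        return Counter("free" if o is None else o for o in self.owner.values())

pool = BufferPool(4)
sent = 0
for _ in range(10):
    buf = pool.alloc("Task0")
    if buf is None:
        break                      # Task0 can no longer send
    pool.transfer(buf, "Task1")    # Task1 receives but never calls free()
    sent += 1

print("messages sent:", sent)
print("ownership:", dict(pool.summary()))
```

The summary shows every buffer owned by Task1 and none free, which identifies the task that is failing to return buffers without any stepping through code.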
Thread (or task) starvation is another common bug. This occurs when a thread is prevented from making reasonable progress in its execution due to a lack of CPU cycles. This situation is often a result of the selection of thread priorities or incorrect thread synchronization programming.
For instance, Task0, Task1, and Task2 all need to run for the same amount of time. To ensure this, they use round-robin scheduling within the same priority level and time slice. However, Task2 is mistakenly programmed with a lower priority than Task0 and Task1. This prevents Task2 from running at all because Task0 and Task1 never block. While Task0 and Task1 context-switch back and forth at their priority level, Task2 never gets a chance to run.
Debugging a thread-starvation problem with a standard debugger can be difficult. You may not be able to determine that one of the application threads has not been running unless the lack of output from the task is great enough to be noticed. You can try to determine the problem's source by setting a breakpoint in the task that doesn't seem to be doing its job. However, if that task is running code common to other tasks, you may not be able to determine which task you have stopped in. Alternatively, if the thread is completely starved and never runs, the breakpoint will never be hit. Using more advanced debugging tools, such as an RTOS-aware profiling tool, you would be able to view a trace of the thread's CPU usage, in which you would see that the starved task does not get to run; see Figures 10 and 11.
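The CPU-usage trace for this example is easy to approximate. The sketch below simulates fixed-priority scheduling with round-robin inside a priority level, under two assumptions taken from the scenario: no task ever blocks, and each tick is one time slice. The convention that a lower number means higher priority is also an assumption of the sketch.

```python
from collections import Counter, deque

def simulate(priorities, ticks):
    """Return ticks of CPU time per task; lower number = higher priority."""
    top = min(priorities.values())
    # With no task ever blocking, only the highest-priority level runs.
    ready = deque(name for name, prio in priorities.items() if prio == top)
    usage = Counter({name: 0 for name in priorities})
    for _ in range(ticks):
        task = ready.popleft()     # round-robin: run the head of the queue,
        usage[task] += 1
        ready.append(task)         # then move it to the back
    return usage

intended = simulate({"Task0": 5, "Task1": 5, "Task2": 5}, ticks=300)
mistaken = simulate({"Task0": 5, "Task1": 5, "Task2": 6}, ticks=300)

print("intended:", dict(intended))
print("mistaken:", dict(mistaken))
```

With the mistaken priority, Task2's share drops to zero while Task0 and Task1 split the CPU between them, which is the signature a profiling trace would show.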
Advanced Debugging Tools
So far, I've discussed a number of tools and features that can be added to the embedded development environment to enhance your ability to debug multithreaded applications. The most useful of these are extra debugging windows in standard cross debuggers that provide detailed information about the state of the RTOS and its various elements. Having this information readily available greatly reduces the time it takes to track down complex bugs. In some cases, to find more elusive software problems, additional tools can be employed that provide profiling and performance analysis capabilities. These tools gather the execution history and resource utilization information recorded by the RTOS, which can be a large amount of data, and display it in a user-friendly graphical format.
These tools are great in the lab, but what about debugging problems that arise in the field? To solve those problems, you must be able to access this same information from a live, remote target device that does not have a source-level debugger attached to a JTAG interface. What's needed is a remote monitoring tool that gives you access to the RTOS state information over an I/O connection, such as a TCP/IP or serial connection. The advantage of using TCP/IP is that developers can access this debugging information via the Internet and retrieve it from a product installed in the field. This saves time and money in travel costs when monitoring and debugging products installed in remote locations; see Figure 12.
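As a rough idea of what such a monitor exchanges, the sketch below sends one snapshot of RTOS state as a line of JSON over a stream connection. Everything here is an assumption for illustration: the JSON format, the field names, and the use of `socket.socketpair()` as a stand-in for a TCP/IP or serial link between the target and the host.

```python
import json
import socket

def send_snapshot(conn, snapshot):
    """Target side: write one newline-terminated JSON record."""
    conn.sendall(json.dumps(snapshot).encode("utf-8") + b"\n")

def read_snapshot(conn):
    """Host side: read up to the newline and decode the record."""
    data = b""
    while not data.endswith(b"\n"):
        chunk = conn.recv(4096)
        if not chunk:
            break                  # connection closed early
        data += chunk
    return json.loads(data)

# A connected pair of sockets stands in for the field connection.
target, host = socket.socketpair()

snapshot = {
    "tasks": {"Task0": "blocked on Sem1", "Task1": "blocked on Sem0"},
    "free_buffers": 0,
}
send_snapshot(target, snapshot)
received = read_snapshot(host)

target.close()
host.close()
print(received)
```

The host receives the same task and resource state that a debugger window would show in the lab, without a JTAG connection to the target.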
Using an RTOS in embedded software applications can provide many benefits, including the ability to create powerful and complex multithreaded applications. However, its use may also create new and unique types of problems that must be debugged. Because of the RTOS's abstract and multithreaded nature, standard source-level cross debuggers do not provide enough information to help you debug your applications quickly or easily. It takes far more than a standard debugger to work through even the most common software bugs experienced by today's embedded software developers. Developers using a commercial RTOS should, therefore, expect a lot more assistance from their suppliers, including RTOS-specific debugging tools that give easy access to RTOS status information from the source-level debugger, RTOS-specific profiling and performance analysis tools, and remote monitoring capabilities.