Mark Gray is a software development engineer working at Intel on real-time embedded systems for telephony. Julien Carreno is a software architect and senior software developer specializing in embedded real-time applications on Linux.
With the advent of the Intel Atom processor and multicore processors, Intel architecture processors are proliferating in a number of new market segments, most notably embedded systems where good performance is essential. In parallel with this trend, Linux is becoming an established operating system option for embedded designs. The two trends combined pose an interesting problem statement: "How to get the most out of my embedded application running on an Intel platform and a general purpose operating system?" During all kinds of application development, there comes a time when a certain level of performance analysis and profiling is required, either to fix an issue or to improve on current performance. Whether it is memory usage and leaks, CPU usage, or optimal cache usage, analysis and profiling would be almost impossible without the right tool set. This article seeks to help developers understand the more common tools available and select the most appropriate tools for their specific performance analysis needs.
In Part 1 of this article, we summarize some of the performance tools available to Linux developers on Intel architecture. In Part 2, we cover a set of standard performance profiling and analysis goals and scenarios that demonstrate which tool or combination of tools to select for each scenario. In some scenarios, the depth of analysis is also a determining factor in selecting the tool. As the investigation goes deeper, we need to change tools to get the level of detail and focus required. This is similar to using a microscope with lenses of different magnification: we start at the lowest magnification and gradually increase it as we focus on a specific area.
top and ps
The top and ps commands are freely available on all Linux distributions and are generally installed by default. The ps command provides an instantaneous snapshot of system activity, per process by default and per thread when thread display is requested (for example, with the -L option). The top command provides mostly the same information as ps, refreshed at a defined interval that can be as small as hundredths of a second. Both are frequently overlooked as tools for understanding process performance at a system level. For example, most users tend to use the ps -ef command only to check which processes are currently executing. However, ps can also print useful information such as the resident set size or the number of page faults for a process. A thorough examination of the ps man pages reveals these options. Likewise, top can display all this information in various formats while updating it in real time. The top command also displays summary information at the top of its window, which can be broken down on a per-CPU basis.
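As a rough illustration of the ps options mentioned above, the following Python sketch asks ps for the resident set size and page-fault counters of a single process and parses the result. It assumes the procps-ng version of ps found on most Linux distributions, where rss, min_flt, maj_flt and comm are valid -o output keywords; the function names (build_ps_command, parse_ps_output, memory_snapshot) are our own and purely illustrative.

```python
import subprocess

# Output keywords assumed from procps-ng ps:
#   rss     = resident set size in KiB
#   min_flt = minor page faults, maj_flt = major page faults
#   comm    = command name (kept last because it may contain spaces)
PS_FIELDS = ["pid", "rss", "min_flt", "maj_flt", "comm"]
NUMERIC_FIELDS = ("pid", "rss", "min_flt", "maj_flt")


def build_ps_command(pid):
    """Build the ps command line that prints the columns above for one PID."""
    return ["ps", "-o", ",".join(PS_FIELDS), "-p", str(pid)]


def parse_ps_output(text):
    """Convert ps output (one header line, then one row per process) to dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    records = []
    for line in lines[1:]:  # lines[0] is the column header
        # Split only the leading numeric columns; the rest is the command name.
        parts = line.strip().split(None, len(PS_FIELDS) - 1)
        record = dict(zip(PS_FIELDS, parts))
        for key in NUMERIC_FIELDS:
            record[key] = int(record[key])
        records.append(record)
    return records


def memory_snapshot(pid):
    """Run ps for the given PID and return the parsed rows."""
    result = subprocess.run(
        build_ps_command(pid), capture_output=True, text=True, check=True
    )
    return parse_ps_output(result.stdout)
```

Calling memory_snapshot(os.getpid()) (after importing os) would return a single-element list describing the calling process. Sampling it periodically gives a crude view of memory growth and page-fault activity without any additional tooling.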
In Figure 1, top is showing information for all threads of a process on a multicore machine. Using this more detailed view, we can see the total activity on each CPU, all threads of the process "app", and the CPU on which each thread is scheduled at that instant (P). We can also see memory usage for the process, including resident set size (RES) and total virtual memory use (VIRT).
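In Figure 2, we can see similar information using ps. We can see the CPU usage on a per-thread basis with a resolution of 0.1%. This figure is cumulative: it is the CPU time used by the thread divided by the time elapsed since the thread was spawned, so it represents an average over the thread's lifetime rather than an instantaneous value.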
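A per-thread listing like the one described for ps can also be collected programmatically. The sketch below is a minimal example, again assuming procps-ng ps, where -L prints one row per thread and tid, psr (processor the thread last ran on) and pcpu are valid -o keywords. The helper names are hypothetical, and the sketch assumes a locale that uses a decimal point in the %CPU column.

```python
import subprocess
from collections import defaultdict

# Keywords assumed from procps-ng ps: tid = thread ID,
# psr = processor the thread is assigned to, pcpu = cumulative %CPU.
THREAD_FIELDS = ["pid", "tid", "psr", "pcpu", "comm"]


def build_thread_command(pid):
    """Build a ps command line that lists every thread (-L) of one process."""
    return ["ps", "-L", "-o", ",".join(THREAD_FIELDS), "-p", str(pid)]


def parse_thread_rows(text):
    """Convert per-thread ps output (header plus one row per thread) to dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    rows = []
    for line in lines[1:]:  # lines[0] is the column header
        pid, tid, psr, pcpu, comm = line.strip().split(None, 4)
        rows.append({
            "pid": int(pid),
            "tid": int(tid),
            "psr": int(psr),
            "pcpu": float(pcpu),
            "comm": comm,
        })
    return rows


def cpu_load_by_processor(rows):
    """Sum the cumulative %CPU of all threads last seen on each processor."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["psr"]] += row["pcpu"]
    return dict(totals)


def thread_snapshot(pid):
    """Run ps for the given PID and return one dict per thread."""
    result = subprocess.run(
        build_thread_command(pid), capture_output=True, text=True, check=True
    )
    return parse_thread_rows(result.stdout)
```

Because pcpu is an average over each thread's lifetime, the per-processor totals indicate where the load has tended to land, not what each CPU is doing at this moment. For an instantaneous view, the refreshing display of top remains the better choice.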
As can be seen, top and ps provide a good general overview of system performance and the performance of each process running on the system.