VTune from Intel is a proprietary system-level profiler and performance analysis tool for Intel architecture. It introduces minimal overhead and can therefore be regarded as relatively unobtrusive. VTune works by collecting data from various CPU counters via a kernel module. This information is collected when an interrupt is generated. The granularity of the data ranges from the process level down to the instruction level, and the data is accessible through a highly usable and configurable GUI.
VTune, when fully configured for your application and operating system, can identify performance issues at several levels of granularity, from system level to microarchitecture level. As a tool for developers, it is extremely valuable because it offers a global view across all of these granularities. OS performance counters can also be monitored and correlated with instruction-level hotspots. By using this correlation, we can answer questions such as "When the memory use in our system begins to ramp, what happens to our application's CPU usage?" If the source code of your test application is hooked into the VTune application, we can also drill down from the application level into threads and further down to individual functions.
It is impossible to outline all the features of VTune, or indeed of many of the other tools described in this paper; the interested reader is directed to the references.
Intel Thread Checker
The Intel Thread Checker is a plug-in for the VTune debugging environment. It can be used to locate hard-to-find threading errors such as race conditions and deadlocks.
The system activity reporter (sar) is a lightweight open source tool, licensed under the GPL, that is used for collecting system-wide performance measures. The tool is generally installed by default on Linux; if it is not, it can be installed from the sysstat package. Like top and ps, sar collects data from operating system counters via the proc file system. It provides performance data at system-level granularity, reporting on a wide variety of metrics such as CPU usage, disk I/O, memory, network I/O, and IRQs. The tool can update these values at intervals as short as 1 second.
sar can only provide information at system-level granularity and is used only to provide snapshots and overviews of overall system performance. Spurious or unexpected measurements from sar can be a first indication of performance issues of the system as a whole or of a single process or group of processes. It can be configured to run in the background, constantly providing a readily accessible database of system performance at any second during the day.
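To make the proc file system mechanism concrete, the following is a minimal Python sketch of how a system-wide CPU utilization figure can be derived from two samples of the aggregate `cpu` line in `/proc/stat`. It is an illustration only and is not taken from sar's source; the function names (`parse_cpu_line`, `cpu_utilization`, `sample_cpu_utilization`) are hypothetical, and the choice to count iowait as idle time is an assumption of this sketch.

```python
import time


def parse_cpu_line(line):
    """Return (busy, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    # Columns: user nice system idle iowait irq softirq steal.
    # Guest columns are skipped because they are already included in user/nice.
    values = [int(v) for v in fields[1:9]]
    idle = sum(values[3:5])  # assumption: treat idle + iowait as "not busy"
    total = sum(values)
    return total - idle, total


def cpu_utilization(before, after):
    """Percentage of non-idle CPU time between two 'cpu' line samples."""
    busy0, total0 = parse_cpu_line(before)
    busy1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        return 0.0
    return 100.0 * (busy1 - busy0) / elapsed


def sample_cpu_utilization(interval_s=1.0, path="/proc/stat"):
    """Read the counters twice, interval_s seconds apart (Linux only)."""
    with open(path) as f:
        first = f.readline()
    time.sleep(interval_s)
    with open(path) as f:
        second = f.readline()
    return cpu_utilization(first, second)
```

On a Linux host, calling `sample_cpu_utilization()` would return a utilization percentage for the preceding second. The counters are cumulative, so any such figure has to be computed as a difference between two readings.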
Linux Trace Toolkit (LTT) consists of a kernel patch and a tool chain that give the user the ability to trace events on the system. These events can be kernel events (such as context switches and system calls) or any application-level event. It is GPL licensed and has minimal impact on the run-time performance of traced applications. It can be used to isolate performance problems on parallel and real-time systems and to analyze application timing. Any code that the user would like to have analyzed must be recompiled so that it is instrumented by LTT.
Alternatively, LTTng (Next Generation) is also available, which adds features such as a GUI Trace Viewer. See Figure 9.
The iostat command is used for monitoring the input/output load on the system's block devices. When there are multiple block devices in the system, it can be useful to determine which device or devices are currently the bottleneck. iostat provides a per-device view of the number of transfers per second, as well as the read and write rates. See Figure 10 for an example of the extended, device-only iostat output captured during a large file copy. Note: Observe the temporary increase in device activity while the file was being copied.
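As a rough illustration of where per-device figures such as transfers per second and read/write rates can come from, the Python sketch below computes them from two snapshots of `/proc/diskstats`. This is a simplified sketch under stated assumptions, not iostat's actual implementation: the function names are hypothetical, only completed reads and writes are counted as transfers, and the values it produces may differ slightly from what iostat reports.

```python
import time

SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def parse_diskstats(text):
    """Map device name -> cumulative counters from /proc/diskstats content."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            continue  # skip blank or unexpectedly short lines
        stats[f[2]] = {
            "reads": int(f[3]),            # reads completed
            "sectors_read": int(f[5]),
            "writes": int(f[7]),           # writes completed
            "sectors_written": int(f[9]),
        }
    return stats


def device_rates(before, after, interval_s):
    """Per-device transfers/s and read/write kB/s between two snapshots."""
    rates = {}
    for name, new in after.items():
        old = before.get(name)
        if old is None:
            continue  # device appeared between the two snapshots
        transfers = (new["reads"] - old["reads"]) + (new["writes"] - old["writes"])
        read_kb = (new["sectors_read"] - old["sectors_read"]) * SECTOR_BYTES / 1024
        write_kb = (new["sectors_written"] - old["sectors_written"]) * SECTOR_BYTES / 1024
        rates[name] = {
            "tps": transfers / interval_s,
            "read_kB_s": read_kb / interval_s,
            "write_kB_s": write_kb / interval_s,
        }
    return rates


def sample_device_rates(interval_s=1.0, path="/proc/diskstats"):
    """Take two snapshots interval_s seconds apart (Linux only)."""
    with open(path) as f:
        first = parse_diskstats(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        second = parse_diskstats(f.read())
    return device_rates(first, second, interval_s)
```

Comparing the `tps` values across devices returned by `sample_device_rates()` is one simple way to see which device is carrying most of the load during an operation such as a large file copy.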
iotop is a Python program with a top-like user interface that can be used to associate processes with I/O. It requires Python version 2.5 or greater and a Linux kernel version 2.6.20 or later with the TASK_DELAY_ACCT and TASK_IO_ACCOUNTING options enabled. A recompilation of the kernel may therefore be required if these options have not been enabled by default. iotop is licensed under the GPL. It reports the amount of disk I/O occurring within the system on a per-process basis, which lets users determine which applications are using the disk(s) the most.
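The idea of attributing disk I/O to individual processes can be sketched in a few lines of Python. The example below does not reproduce iotop's own collection mechanism; as an assumption for illustration, it reads the per-process counters that kernels built with TASK_IO_ACCOUNTING expose in `/proc/<pid>/io`, and ranks processes by cumulative bytes read and written rather than by a rate. All function names are hypothetical.

```python
import os


def parse_proc_io(text):
    """Parse the 'key: value' lines of a /proc/<pid>/io file into a dict."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            counters[key.strip()] = int(value)
    return counters


def rank_by_disk_io(io_by_pid):
    """Return (pid, read_bytes + write_bytes) pairs, heaviest first."""
    totals = {
        pid: c.get("read_bytes", 0) + c.get("write_bytes", 0)
        for pid, c in io_by_pid.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def snapshot_all(proc_root="/proc"):
    """Collect I/O counters for every readable process (Linux only)."""
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "io")) as f:
                result[int(entry)] = parse_proc_io(f.read())
        except OSError:
            continue  # process exited, or its counters are not readable
    return result
```

On a suitable Linux host, `rank_by_disk_io(snapshot_all())[:10]` would list the ten processes with the largest cumulative disk traffic. Reading other users' processes typically requires elevated privileges, so unreadable entries are skipped.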