From high-end servers to desktops and laptops, multicore processors are increasingly replacing single-core processors. At the same time, however, application developers are confronted with new challenges when creating software that efficiently uses these multicore processors.
The specific challenge I examine in this article is "oversubscription," which arises when multiple CPU-bound multithreaded applications run on the same server or desktop system. It occurs when each application creates as many threads as there are cores available and each of those threads performs CPU-intensive work. The total number of runnable threads then exceeds the number of cores, and overall performance suffers. In other words, without knowing the actual system load or which other applications are running, the multithreaded applications compete with one another for system resources.
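To make the arithmetic concrete, here is a small C sketch of the naive sizing policy described above and the resulting ratio of CPU-bound threads to cores. The function names are illustrative only, and `sysconf(_SC_NPROCESSORS_ONLN)` is a widely available extension (Linux, macOS) rather than strict POSIX.

```c
#include <assert.h>
#include <unistd.h>

/* Naive policy (illustrative): size the thread pool to the number of
   online cores, ignoring every other process on the machine. */
long naive_thread_count(void)
{
    long cores = sysconf(_SC_NPROCESSORS_ONLN);
    return cores > 0 ? cores : 1;   /* sysconf returns -1 on failure */
}

/* CPU-bound threads per core when `apps` applications each create
   `threads_per_app` threads on `cores` cores. A value above 1.0 means
   more runnable threads than cores, i.e. oversubscription. */
double oversubscription_ratio(int apps, long threads_per_app, long cores)
{
    assert(cores > 0);
    return (double)apps * (double)threads_per_app / (double)cores;
}
```

For example, three such applications on a four-core machine each create four threads, giving 12 CPU-bound threads on 4 cores, a ratio of 3.0.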
Framework Details
The current generation of nonreal-time operating systems provides some form of fair resource (processor/core) allocation or scheduling; consequently, overall performance suffers when multiple multithreaded applications execute in parallel. Of course, there have been proposals about how operating systems can provide some level of quality of service (QoS) and how application QoS values can be mapped to processor capacity, for instance through static mapping (done before applications are started) or dynamic mapping (calculated at runtime).
The concept I propose here involves neither solving the oversubscription problem nor providing a QoS service. Instead, I propose a lightweight thread usage balancing framework that can be implemented without any changes to operating systems or runtime environments. (That said, implementing this functionality in the operating system itself would make the mechanism available to all applications on demand.) "Thread usage balancing" in this context means adjusting the number of threads each application uses based on the utilization of the cores/processors. During their runtime, applications only ever increase their thread count, and only up to the maximum allowed (normally the number of physical cores available on the system).
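The increase-only rule can be captured in a few lines. This is a minimal C sketch, assuming the governor's feedback has already been turned into a suggested thread count; `next_thread_count` and its parameters are hypothetical names, not part of any existing library.

```c
#include <assert.h>

/* Hypothetical update rule: the thread count never shrinks, grows toward
   the suggested value, and never exceeds `max_threads` (normally the
   number of physical cores). */
int next_thread_count(int current, int suggested, int max_threads)
{
    assert(max_threads > 0);
    assert(current >= 1 && current <= max_threads);

    if (suggested <= current)       /* never decrease */
        return current;
    if (suggested > max_threads)    /* never exceed the cap */
        return max_threads;
    return suggested;
}
```

An OpenMP application would pass the result to `omp_set_num_threads()` before entering its next parallel region.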
As Figure 1 illustrates, there are two fundamental parts to the dynamic load-balancing framework:
- A stateless load governor running in the background, which monitors the utilization of the cores periodically. "Stateless" in this context means that the governor doesn't retain any information on how many threads each application uses.
- The library that applications use to communicate with the governor and adjust their thread count accordingly. Communication takes place via sockets over TCP/IP. Each application sets the number of threads to use for parallel regions (in the OpenMP sense) based on the feedback received from the governor, rather than setting it to the number of physically available processors or to whatever value it wishes.
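A governor along these lines could be sketched as follows, assuming Linux, where cumulative per-core CPU times appear on the `cpuN` lines of `/proc/stat`. The busy/idle split, the utilization threshold, and all function names are assumptions for illustration; how the governor actually measures utilization is not specified here.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Cumulative CPU time (in jiffies) for one core, condensed from a "cpuN"
   line of Linux /proc/stat: user nice system idle iowait irq softirq steal. */
struct cpu_times {
    unsigned long long busy;   /* user + nice + system + irq + softirq + steal */
    unsigned long long idle;   /* idle + iowait */
};

/* Parses a per-core "cpuN ..." line; rejects the aggregate "cpu" line.
   Returns 0 on success, -1 otherwise. */
int parse_cpu_line(const char *line, struct cpu_times *out)
{
    unsigned long long v[8];
    int id;

    if (strncmp(line, "cpu", 3) != 0 || !isdigit((unsigned char)line[3]))
        return -1;
    if (sscanf(line, "cpu%d %llu %llu %llu %llu %llu %llu %llu %llu", &id,
               &v[0], &v[1], &v[2], &v[3], &v[4], &v[5], &v[6], &v[7]) != 9)
        return -1;
    out->busy = v[0] + v[1] + v[2] + v[5] + v[6] + v[7];
    out->idle = v[3] + v[4];
    return 0;
}

/* Counter difference, clamped at zero (the iowait counter can decrease). */
static unsigned long long delta(unsigned long long prev, unsigned long long cur)
{
    return cur > prev ? cur - prev : 0;
}

/* Fraction of the last interval during which the core was busy (0.0..1.0). */
double core_utilization(const struct cpu_times *prev, const struct cpu_times *cur)
{
    unsigned long long busy = delta(prev->busy, cur->busy);
    unsigned long long total = busy + delta(prev->idle, cur->idle);
    return total ? (double)busy / (double)total : 0.0;
}

/* Stateless decision: count cores whose utilization over the last interval
   was below `threshold`. Nothing about individual applications is stored. */
int count_idle_cores(const struct cpu_times *prev, const struct cpu_times *cur,
                     int ncores, double threshold)
{
    int idle = 0;
    assert(ncores >= 0);
    for (int i = 0; i < ncores; i++)
        if (core_utilization(&prev[i], &cur[i]) < threshold)
            idle++;
    return idle;
}
```

In this sketch the governor would sample `/proc/stat` once per period, keep only the previous sample, and report the idle-core count to any application that asks; it never needs to know which application owns which thread.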
Also, applications only increase their thread count and can never set it higher than the number of physical cores. This makes the initial thread count with which an application starts worth discussing.
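On the application side, the library mainly needs a way to ask the governor for its current recommendation. The following C sketch assumes a hypothetical line-based protocol (the client sends `QUERY\n`, the governor replies with a decimal thread count followed by a newline) and a hypothetical governor listening on the loopback interface; the article specifies only that communication uses sockets over TCP/IP.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Sends one request on an already-connected socket and parses the reply.
   Returns the suggested thread count, or -1 on any error. */
int query_governor_fd(int fd)
{
    static const char req[] = "QUERY\n";   /* hypothetical request */
    char buf[32];
    size_t len = 0;

    assert(fd >= 0);
    if (write(fd, req, sizeof req - 1) != (ssize_t)(sizeof req - 1))
        return -1;
    /* Read until a newline arrives or the buffer is full. */
    while (len < sizeof buf - 1) {
        ssize_t n = read(fd, buf + len, sizeof buf - 1 - len);
        if (n <= 0)
            return -1;
        len += (size_t)n;
        if (memchr(buf, '\n', len))
            break;
    }
    buf[len] = '\0';

    char *end;
    long v = strtol(buf, &end, 10);
    if (end == buf || *end != '\n' || v < 1 || v > INT_MAX)
        return -1;
    return (int)v;
}

/* Opens a TCP connection to a governor on 127.0.0.1:`port`, queries it
   once, and closes the connection. */
int query_governor(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    int result = -1;
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0)
        result = query_governor_fd(fd);
    close(fd);
    return result;
}
```

An application might start with a small initial thread count, call `query_governor()` before each parallel region, and raise its thread count (never lowering it and never exceeding the number of physical cores) when the reply allows.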