Dr. Dobb's has recently examined many aspects of development for mobile computing, especially iOS devices. This article discusses what happens below the code — how the various cores are used by the apps and OS and how power savings can be improved — especially on multicore mobile devices — with embedded virtualization.
Energy conservation is increasingly becoming a core requirement in the design of computer systems. On the desktop and on servers, the main drivers for conservation are the cost of powering and cooling computer systems, and the environmental impact of ubiquitous PCs and massive data centers. For the fast-growing sector of intelligent and mobile embedded devices, green requirements are driven by the need to extend battery life and enhance product performance.
Both multicore silicon and virtualization technology are enjoying increasing deployment in intelligent devices of all kinds. As in the enterprise and on the desktop, embedded virtualization meets the needs of a range of use cases: hardware consolidation, legacy application support and migration, IP isolation, trusted computing, and notably, energy management (EM).
This article examines current assumptions about multicore, power management, and virtualization, especially in mobile devices. It highlights how the mission of virtualization becomes even more critical with multicore systems, expanding to include cross-core energy management. In particular, it compares power management mechanisms in embedded devices, and demonstrates how multicore processors require rethinking of power management to conserve energy.
Power Management Mechanisms
Energy conservation focuses both on building greener, leaner circuits from scratch and on using existing systems and silicon more efficiently. On the hardware side, conservation can entail using lower supply voltages and reducing leakage current, both of which cut power consumption. Under software control, power management technology has traditionally focused on mechanisms for dynamic voltage and frequency scaling (together, DVFS) to meet changing load and use policies. Operating voltage and clock frequency, while independent concepts, are usually tied together by the realities of circuit design.
Voltage scaling involves lowering (and raising) the processor core supply voltage (Vcc) from its nominal value (e.g., 1.3 VDC) downwards towards a minimum. Since power is the product of voltage and current (P = V × I), voltage scaling, in theory, reduces instantaneous power draw and thereby saves energy over time.
However, when operating at reduced voltage, processor circuitry inherently either runs slower or consumes more current at the same speed. Typically, scaling down core voltage therefore requires scaling down CPU clock frequency as well, resulting in decreased overall processor performance.
Downward scaling of CPU clock frequency also reduces power consumption: with fewer logic transitions per second, circuit capacitance is charged and discharged less often, so less power is drawn.
Examining the two together, we derive dynamic power (Pdyn), which varies directly with frequency (f) and with the square of voltage (V): Pdyn ∝ f·V².
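As a quick numeric illustration of this proportionality (the scaling constant below is invented, not taken from any datasheet), halving both frequency and voltage cuts dynamic power to one-eighth:

```python
# Sketch of the dynamic-power relation Pdyn ∝ f * V^2.
# k models effective switched capacitance; its value here is made up.
def dynamic_power(f_hz, v_volts, k=1e-9):
    """Approximate dynamic power (watts) at clock f_hz and supply v_volts."""
    return k * f_hz * v_volts ** 2

full = dynamic_power(1.0e9, 1.3)    # 1 GHz at a nominal 1.3 V
half = dynamic_power(0.5e9, 0.65)   # frequency and voltage both halved
print(half / full)                  # -> one-eighth of the original dynamic power
```

Scaling frequency alone would only halve dynamic power; it is the quadratic voltage term that makes combined scaling so effective.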
Due to the linear relationship between dynamic power and frequency, scaling frequency alone, while reducing power, does not actually reduce dynamic energy use: a given number of CPU cycles still requires the same amount of dynamic energy, just consumed over a longer period. In fact, running at a lower frequency may result in increased total energy usage, because of constant static power consumption (from leakage current); static power is independent of frequency, and therefore static energy is proportional to time. At lower frequency, execution time increases, and so does static energy consumption.
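The effect is easy to see in a sketch that totals dynamic and static energy for a fixed workload (all constants below are invented for illustration):

```python
# Total energy = dynamic energy (fixed per cycle) + static power * run time.
# All numbers here are invented for illustration, not measured values.
CYCLES = 2_000_000_000       # fixed amount of work: 2 billion cycles
E_DYN_PER_CYCLE = 1e-9       # joules per cycle (unchanged by frequency alone)
P_STATIC = 0.5               # watts of leakage, independent of frequency

def total_energy(freq_hz):
    run_time = CYCLES / freq_hz          # lower clock -> longer run time
    return CYCLES * E_DYN_PER_CYCLE + P_STATIC * run_time

fast = total_energy(1.0e9)   # finishes in 2 s
slow = total_energy(0.5e9)   # finishes in 4 s
print(fast, slow)            # the slow run burns more total energy (~3 J vs ~4 J)
```

This is the reasoning behind "race to idle" policies: finish the work quickly, then drop into a low-power sleep state where static consumption can be curtailed.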
Impact of RAM
Power consumed by RAM is independent of CPU core voltage. It, too, comprises a static component and a dynamic component roughly proportional to the number of memory accesses (loads and stores) by the CPU, whose rate depends on the CPU core clock frequency. However, memory accesses are slower than CPU operations, and the CPU frequently stalls waiting for data from memory. When the CPU runs slower, the number of stall cycles (which waste dynamic CPU power) is reduced.
Thus, the relationship between power consumption and core frequency is a complex function of hardware characteristics and program behavior. While power use by CPU-bound programs (which rarely access memory) tends to be minimized at high clock rates, for memory-bound programs, minimal power consumption occurs at low frequencies.
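A toy model (with invented constants, and assuming voltage tracks frequency so dynamic power grows roughly as f³) illustrates why the energy-optimal frequency differs for the two classes of program:

```python
# Toy model: run time splits into CPU cycles that scale with frequency and
# memory-stall time that does not. All constants are invented for illustration.
def energy(freq_ghz, cpu_cycles_g, mem_time_s, k=1.0, p_static=2.0):
    run_time = cpu_cycles_g / freq_ghz + mem_time_s
    p_dyn = k * freq_ghz ** 3   # voltage assumed to track frequency
    return (p_dyn + p_static) * run_time

freqs = [0.4, 0.6, 0.8, 1.0]   # hypothetical operating points in GHz

# CPU-bound: lots of cycles, little stall time -> fastest clock wins.
cpu_bound = min(freqs, key=lambda f: energy(f, cpu_cycles_g=2.0, mem_time_s=0.1))
# Memory-bound: mostly stalls -> a slow clock wastes the least energy.
mem_bound = min(freqs, key=lambda f: energy(f, cpu_cycles_g=0.2, mem_time_s=3.0))
print(cpu_bound, mem_bound)    # -> 1.0 0.4
```

With different constants (say, much lower static power) the optima shift, which is exactly why real DVFS policies are hard to get right.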
DVFS is today a standard feature of most microprocessor families. However, due to the complexities just described, software systems that attempt to use DVFS have seen mixed results. The OS kernel policies that implement any decision to adjust DVFS state need to consider:
- Relative CPU and memory power consumption.
- Importance of static vs. dynamic power use by CPU, memory, and other components.
- Degree of memory-boundedness of applications.
- Complex trade-offs between DVFS operating points and sleep modes.
- Dependencies and interactions among CPU cores, buses, and other subsystems.
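A governor embodying such policies might, in its simplest form, map an estimate of memory-boundedness onto a frequency table. The sketch below is hypothetical; its thresholds and table are invented, not taken from any real kernel:

```python
# Hypothetical DVFS governor sketch: pick an operating point from a coarse
# estimate of memory-boundedness (stall cycles / total cycles).
FREQ_TABLE_MHZ = [300, 600, 1000]   # invented operating points

def pick_frequency(stall_ratio):
    """Memory-bound code gains little from a fast clock, so scale down."""
    if stall_ratio > 0.6:      # mostly waiting on DRAM
        return FREQ_TABLE_MHZ[0]
    if stall_ratio > 0.3:      # mixed workload
        return FREQ_TABLE_MHZ[1]
    return FREQ_TABLE_MHZ[-1]  # CPU-bound: race to idle at full speed

print(pick_frequency(0.7), pick_frequency(0.1))  # -> 300 1000
```

A production governor would also weigh sleep-state trade-offs and cross-subsystem dependencies, which is where the real difficulty lies.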
The Move to Multicore

Multiple processors on a single piece of silicon were once a premium capability implemented only on high-end server and desktop processors. Today they are fast becoming mainstream, with multicore enjoying adoption across the embedded computing landscape. Silicon suppliers routinely integrate specialized companion processors alongside general-purpose applications CPUs on a single substrate (asymmetric multiprocessing, or AMP), and are also deploying 2x, 4x, and other parallel configurations of the same ARM, MIPS, Power, or x86 architecture cores (symmetric multiprocessing, or SMP). Driving this evolution are requirements for:
- Dedicated silicon to process multimedia, graphics, baseband, etc.
- Sustained growth of compute capability without power-hungry high frequency clocks.
- Running multiple OSes on a single device (e.g., Android plus a baseband OS).
Multicore systems present steep challenges to energy conservation paradigms optimized for single-core systems. In particular, multicore limits the scope and capability of DVFS:
- Most System-on-Chip (SoC) subsystems share clocks and power supplies.
- Changing the operating voltage (Vcc) of one of several SoC subsystems (when even possible) can limit its ability to use local buses to communicate with other subsystems, and to access shared memory (including its own DRAM).
- Clock frequency scaling of a single SoC subsystem also presents interoperability challenges, especially for synchronous buses.
- Multicore systems usually share Vcc, clock, cache, and other resources, requiring DVFS to apply to all constituent cores and not to a useful subset.
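One consequence of shared Vcc and clocks is that per-core frequency requests collapse into a single cluster-wide setting, typically the highest outstanding request. A hypothetical sketch:

```python
# Sketch of a shared-clock cluster (invented example): each core requests a
# frequency, but the hardware honours only one setting for the whole cluster,
# so the highest request wins and the other cores overshoot their needs.
def cluster_frequency(requests_mhz):
    return max(requests_mhz)

requests = [300, 300, 300, 1000]    # one busy core, three near-idle cores
print(cluster_frequency(requests))  # -> 1000: all four cores run hot
```

One busy core thus pins its idle siblings at a costly operating point, eroding much of the saving DVFS promised on a single-core part.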
Silicon supplier roadmaps point to further multiplying numbers of cores — today, 2x on embedded CPUs; and soon 4x, 8x, and beyond. This surfeit of available silicon will encourage designers to dedicate one or more cores to particular subsystems or functionalities (CPU-function affinity). Some dedicated operations, like media processing, will use cores in a binary fashion — at full throttle or not at all. However, most other functions will impose varying loads, ranging from a share of a single core to saturating multiple cores. All-or-nothing use is fairly easy to manage, but dynamic loads on multiple cores present much greater power management challenges.
Power Management Across Operating Systems
Operating systems do not excel as resource managers. If they were more capable in this area, virtualization would not enjoy its current popularity. Embedded operating systems aren't any better at managing resources than their server counterparts: they assume static provisioning and full resource availability, with fairly simplistic state models for resources under their purview.
Many intelligent devices deploy multiple operating systems: high-level operating systems like Android, Linux, Symbian, Windows CE, or Windows Mobile to provide user services and run end-user applications, and one or more RTOSes to handle low-level chores like wireless baseband and signal processing. These OSes and the programs they host may run on dedicated silicon, may occupy dedicated cores on a multicore system, or may run in dedicated partitions of memory and cycles on a single shared CPU.
High-level applications OSes typically include their own power management schemes that leverage DVFS (e.g., Linux APM and DPM, and Windows/BIOS ACPI). Most RTOSes eschew any operations that curtail real-time responsiveness, leaving OEMs and integrators to roll their own or do without.
Whatever the inherent energy conservation capability of resident OSes, there remains the challenge of coordinating efforts in a multi-OS environment. Even if one of several deployed OSes is capable of managing power in its own domain, it will be unaware of the capabilities and state of its peer OSes elsewhere in the system, adding to development and integration headaches. Even if all co-resident applications OSes and RTOSes have some power management capacity, how can systems developers and integrators coordinate and optimize operation and energy conservation policy across OS domains?