Better Multicore Energy Conservation on Mobile Devices with Virtualization

Managing Power in Mobile Handsets

Most of today's smartphones employ separate dedicated CPUs for applications processing, multimedia and graphics, and for real-time wireless baseband modem operations. As multicore CPUs (and virtualization) become ubiquitous, designs are migrating these separate operations onto partitioned single and multicore processors. A good example is a mass-market smartphone such as the Motorola Evoke.

Lower-cost mass-market smartphones consolidate diverse functions onto a single processor with one or two cores. Next-generation devices will build on silicon with even greater numbers of available cores, and will surely find ways for each subsystem to consume available compute capacity.

Figure 1: Full and Quiescent Loads Across Available Cores.

In these devices, one subsystem would be the baseband modem, whose real-time software stack handles a load that fully occupies one or more cores during peak processing (e.g., for streaming or voice conferencing), but typically consumes perhaps a fifth of a single core's capacity when quiescent. Another subsystem would be an HD multimedia stack, requiring an additional core (or more) at full load, and zero when no media is displayed. A GUI stack might use half a core during heavy user interaction, and zero when quiescent. User applications would consume any remaining compute capacity when executing; when quiescent they would occupy zero cores, or present some other finite load from background processing.
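The arithmetic behind this scenario is worth making explicit: summing per-subsystem demand shows how many whole cores the device needs at peak versus when quiescent. A minimal sketch, with load fractions taken from the illustrative figures above (not measurements):

```python
import math

# Hypothetical per-subsystem loads, as fractions of one core's capacity,
# drawn from the scenario in the text (illustrative values only).
PEAK = {"baseband": 1.0, "multimedia": 1.0, "gui": 0.5, "apps": 0.5}
QUIESCENT = {"baseband": 0.2, "multimedia": 0.0, "gui": 0.0, "apps": 0.0}

def cores_needed(loads):
    """Minimum whole cores required to cover the summed load."""
    return max(1, math.ceil(sum(loads.values())))

print(cores_needed(PEAK))       # -> 3 cores at peak
print(cores_needed(QUIESCENT))  # -> 1 core when quiescent
```

The gap between the two results (three cores busy at peak, a fifth of one core otherwise) is precisely the opportunity the rest of the article exploits.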

DVFS Challenges

Clearly, each of the functional stacks and the operating systems that host them present unique challenges to managing power with DVFS. Combined on a multicore CPU, it becomes nearly impossible to determine useful DVFS operating points and a policy for transitioning among them, both per function and in coordination across functions.

This scenario illustrates that localized DVFS schemes are inadequate for next-generation multi-stack multicore designs. Further analysis also highlights the limitations of coarse-grained assignment of functions to available CPU cores:

  • Peak loads for different subsystems can consume most or all of one or more cores' compute capacity.
  • Gross assignment/dedication of functions to cores can waste available compute capacity and potentially starve functions at peak load.
  • Real-world total load is unpredictable due to third-party applications (e.g., with Android Market), and additional demands placed on communications and multimedia stacks from those applications and the traffic they generate.
  • Scalable loads dictate sharing of available CPU cores across functions.
  • Most silicon cannot run stably at all frequencies and voltages. Real-world energy management paradigms build on discrete stable pairings of voltage and frequency (operating points).

Energy Conservation and Virtualization

Rather than try to salvage legacy power management paradigms from each functional domain, let's employ the approach favored by data center IT managers — using virtualization for energy conservation.

DVFS wrings incremental gains in energy efficiency by reducing voltage and clock frequency. A given CPU offers developers and integrators a set of safe "operating points" with fixed voltages and frequencies. With varying load, EM middleware or EM-aware OSes transition from operating point to operating point (Figure 2).

Figure 2: DVFS Operating Points between CPU Stop and Full Throttle for an ARM Cortex A8 CPU.
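In practice, the transition logic reduces to a table lookup: pick the lowest safe operating point whose frequency covers current demand. A sketch of that policy follows; the frequency/voltage pairs are invented for illustration (real pairs come from the SoC data sheet):

```python
# Illustrative DVFS operating points: (frequency in MHz, core voltage in V).
# These pairs are made up for the sketch, not taken from any data sheet.
OPERATING_POINTS = [(125, 0.95), (250, 1.05), (500, 1.20), (600, 1.35)]

def select_op(demand_mhz):
    """Pick the lowest operating point whose frequency covers the demand;
    saturate at full throttle if demand exceeds the top point."""
    for freq, volt in OPERATING_POINTS:
        if freq >= demand_mhz:
            return freq, volt
    return OPERATING_POINTS[-1]

print(select_op(200))  # -> (250, 1.05)
```

EM middleware typically adds hysteresis around such a lookup to avoid oscillating between adjacent points; that refinement is omitted here.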

A logical extension of applying DVFS is reduction of voltage to 0 VDC and completely stopping the CPU clock. That is, utilizing only two operating points — Full Stop and Full Throttle — but employing them across the range of available cores: OP1 uses one core, OP2 uses two, etc.
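Under this two-point scheme, the policy collapses to a single decision: how many cores to run flat out. A minimal sketch (core count and load units are assumptions for illustration):

```python
import math

def active_cores(total_load, available=4):
    """Full Stop / Full Throttle policy: run just enough cores at full
    speed to cover demand, and stop the rest. total_load is summed
    demand in units of one core's full capacity."""
    if total_load <= 0:
        return 0  # everything quiescent: all cores stopped
    return min(available, math.ceil(total_load))

# OP1 = one core running, OP2 = two cores, and so on.
print(active_cores(0.2))  # -> 1
print(active_cores(2.4))  # -> 3
```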

In multicore systems without virtualization, or with simple partitioning, wholesale shutdown of CPU cores presents nearly insurmountable challenges, because loads (OSes and applications threads) are tightly bound to one or more cores (complete CPU affinity, as in Figure 3):

  • Shutting down a CPU core requires a policy decision between a shallow sleep mode (with fast entry and exit, but significant remaining leakage power) and a deep sleep mode (with low leakage power but high overhead at both entry and exit).
  • Migrating loads across CPUs is nearly impossible; only loads already running under SMP can shed CPUs, and even those cannot migrate across them.

Figure 3: Complete affinity among functional subsystems and CPUs.
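The shallow-versus-deep decision in the first bullet above reduces to a break-even calculation on the expected idle window: the deep state's entry/exit overhead only pays off if the idle period is long enough. A sketch, with all power and overhead figures invented for illustration:

```python
def better_sleep_state(idle_ms,
                       shallow_leak_mw=40.0, deep_leak_mw=2.0,
                       deep_overhead_mj=3.0):
    """Pick the sleep mode with lower total energy for an expected idle
    window. The leakage-power and overhead numbers are illustrative;
    real figures come from the SoC power data sheet.
    Energy (mJ) = leakage (mW) * idle time (s) + fixed entry/exit cost."""
    shallow_mj = shallow_leak_mw * idle_ms / 1000.0
    deep_mj = deep_leak_mw * idle_ms / 1000.0 + deep_overhead_mj
    return "deep" if deep_mj < shallow_mj else "shallow"

print(better_sleep_state(10))    # short idle: shallow wins
print(better_sleep_state(1000))  # long idle: deep wins
```

The hard part in a real system is not this arithmetic but predicting the idle window, which is exactly what per-OS power managers cannot do across subsystems.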

Introducing virtualization neatly addresses the challenges of CPU shutdown and CPU core affinity. First, instead of binding loads to actual CPUs, the presence of a full-featured Type I Hypervisor associates subsystems with dedicated virtual CPUs. Based on real compute needs and on policy established at integration time, the hypervisor can bind virtual CPUs to one or more physical CPUs (Figure 4) and/or can share available physical CPUs among virtual CPUs as needed (as suggested in Figure 1).

Figure 4: Affinity of loads with virtual, not physical CPUs.

To facilitate energy conservation, a hypervisor enables full stop of underutilized CPU cores by (re)mapping virtual CPUs (and their loads) onto fewer physical CPUs (Figure 5).

Figure 5: Shutting down underutilized CPUs and consolidating loads onto remaining core(s).
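The remapping step is essentially a bin-packing problem: fit virtual CPUs onto as few physical cores as their loads allow, then power off the rest. A first-fit sketch (the vCPU names and loads are hypothetical, reusing the quiescent figures from the earlier scenario):

```python
def consolidate(vcpu_loads, capacity=1.0):
    """First-fit-decreasing packing of virtual CPUs onto as few physical
    CPUs as possible; unused physical cores can then be powered off.
    vcpu_loads maps a vCPU name to its load as a fraction of one core."""
    pcpus = []  # each entry: [remaining_capacity, [vcpu names]]
    for name, load in sorted(vcpu_loads.items(), key=lambda kv: -kv[1]):
        for p in pcpus:
            if p[0] >= load:          # fits on an already-active core
                p[0] -= load
                p[1].append(name)
                break
        else:                          # needs a new physical core
            pcpus.append([capacity - load, [name]])
    return [p[1] for p in pcpus]

# Quiescent loads from the earlier scenario fit on one physical core.
print(consolidate({"baseband": 0.2, "gui": 0.05, "apps": 0.1}))
```

A production hypervisor would also weigh migration cost and cache affinity before moving a vCPU; the sketch ignores both.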

This neat trick is only possible via the construct of virtual CPUs, which facilitate arbitrary mapping of loads to physical silicon and transparent migration of running loads across CPU cores. The resulting consolidation means that, on average, more CPUs are in an off state and remain there longer, saving substantial energy over time.

Quiescing whole cores leads to linear (and therefore highly predictable) performance-energy tradeoffs, unlike DVFS, and is therefore easier to manage. Moreover, DVFS can still be employed on the active cores for fine-tuning energy-performance tradeoffs. Since power management is now handled by the hypervisor, with full knowledge of performance requirements, hardware-imposed constraints such as common core voltage are readily incorporated.
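The two mechanisms compose naturally into a two-stage policy: first quiesce whole cores so the remaining load nearly fills the active ones, then apply DVFS to the survivors. A sketch under a common-voltage constraint (all cores share one frequency); the frequency table and core count are assumptions:

```python
import math

def power_plan(total_load, ops=(125, 250, 500, 600), max_cores=4):
    """Two-stage policy sketch: (1) run the minimum number of cores,
    (2) pick a single shared frequency (common-voltage constraint) that
    covers the per-core load. Frequencies in MHz are illustrative;
    total_load is in units of one core running at full speed (ops[-1])."""
    cores = min(max_cores, max(1, math.ceil(total_load)))
    per_core_mhz = total_load / cores * ops[-1]
    freq = next((f for f in ops if f >= per_core_mhz), ops[-1])
    return cores, freq

print(power_plan(1.4))  # -> (2, 500): two cores, each throttled down
```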

Virtualization: Ideal for Global Energy Conservation

As mentioned earlier, embedded operating systems are notoriously poor at resource management. Those with native power management schemes have "been taught" to monitor their own loads and make energy policy transitions. They are not, however, equipped to manage power and CPU utilization outside their own limited domain, on other CPU cores running different operating systems and loads. For multiple, diverse hosted OSes in a multicore system, effective power management must "step outside" of the local context of functional subsystems (baseband, GUI, etc.) to a scope that encompasses all subsystems together.

This article has offered a brief review of power management mechanisms and energy conservation, and challenges presented by modern multicore systems. Of currently available software-based energy conservation mechanisms, only virtualization is positioned (in the global architecture/stack) to manage power for all cores and all functional subsystems in concert. Because it is the hypervisor that actually dispatches threads to run on physical silicon, it is uniquely and ideally placed to comprehend real CPU loading (not calculated guest OS loads) and to scale power utilization by bringing available cores in and out of service.

— Rob McCammon works at OK Labs, where he owns the roadmap and lifecycle for mobile virtualization solutions and the OKL4 product line. Rob also heads up OK Labs' Virtualization Integration Practice.

