Channels ▼
RSS

Parallel

How Much Scalability Do You Have or Need?


O(K): Explicitly Threaded Code

O(K) means that the system typically has some constant number of things it can do to keep a constant number of cores busy at any given time. That number is hardwired into the structure of the program and will be the same regardless of the amount of hardware concurrency available at execution time.

For example, consider a first-person action game. To take some advantage of additional cores, let's say we divide the game's compute-bound work into three threads: Thread 1 does game physics, thread 2 does rendering, and thread 3 does nonplayer character AI. For simplicity, assume that all three threads are equally busy and interdependent. The game runs fine on a single-core machine; the operating system just interleaves the threads on the one core. When the user upgrades to a two-core machine, the game runs faster—but not twice as fast, because if we schedule thread 1 on core 1, and thread 2 on core 2, we have to put thread 3 somewhere. If we put thread 3 on the same core as thread 1, then thread 2 (which depends on 1 and/or 3) will be idle half the time. But if we schedule thread 3 on both core 1 and core 2, we incur cache sloshing and other overhead every time we move it from one core to the other. So the game runs faster on a two-core system, just not twice as fast. When the user upgrades to a four-core machine, the game runs faster still—but only three times as fast as on a single core, or maybe slightly better if spyware and other applications can be moved to the fourth core. When the user upgrades to an eight-core machine, nothing happens. When he upgrades to a 16-core machine, more nothing happens. The O(3) application is hardwired to prefer three cores regardless of the input or execution environment.

A closely related technique is pipelining, which is useful when we have a collection of things to operate on, but we can't apply O(N) techniques to fully parallelize them because of ordering dependencies. For example, consider a communications subsystem that packets data to be sent over the network. The input to the subsystem is a stream of raw packets; before sending a given packet, it must first be decorated with header information, then compressed, and then encrypted. Those three operations can't easily be run in parallel for a given packet, in part because they must be applied in that order. A solution is to have three agents that communicate using message queues: The Decorator agent (typically a thread) decorates each raw packet and sends it to the Compressor agent; the Compressor accepts the decorated packet, compresses it, and sends it to the Encryptor; and the Encryptor encrypts it and sends it to the network. Like the game example, this subsystem also has O(3) parallelism if the pipeline stages are equally busy. We are using Pillar 1 techniques (isolated agents, messaging; see [2] for details) because it's a natural fit for expressing pipelines: The pipeline stages are relatively isolated, and using messaging lets us tolerate any difference in execution rates between the stages.

Finally, consider Word again, but with a different usage scenario: Dictation. As I fire up speech recognition and dictate this paragraph, I see that generally two cores stay fairly busy as I'm speaking, while the rest of the system's cores remain idle. When used in dictation mode, this version of Word is an O(2) application that works fine on a single-core machine, better on a dual-core machine, but won't improve further in speed or accuracy in the presence of additional cores.

An O(K) application is explicitly structured to prefer K cores for a given input workload. O(K) code can't scale to take advantage of environments with more than K cores, and it can penalize execution environments with fewer than K cores by interfering with load balancing and causing some of the cores to be only partly utilized.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video