Channels ▼
RSS

C/C++

Eliminate False Sharing


False Sharing: What To Look For

Note that Example 1 shows only one common case where data can end up being close together in memory, namely the case of elements of contiguous arrays. But the same thing can occur when we have two independently used fields within the same popular object, or objects are close to each other on the heap, or other situations.

The general case to watch out for is when you have two objects or fields that are frequently accessed (either read or written) by different threads, at least one of the threads is doing writes, and the objects are so close in memory that they're on the same cache line because they are:

  • objects nearby in the same array, as in Example 1 above;
  • fields nearby in the same object, as in Example 4 of [3] where the head and tail pointers into the message queue had to be kept apart;
  • objects allocated close together in time (C++, Java) or by the same thread (C#, Java), as in Example 4 of [3] where the underlying list nodes had to be kept apart to eliminate contention when threads used adjacent or head/tail nodes;
  • static or global objects that the linker decided to lay out close together in memory;
  • objects that become close in memory dynamically, as when during compacting garbage collection two objects can become adjacent in memory because intervening objects became garbage and were collected; or
  • objects that for some other reason accidentally end up close together in memory.

What To Do

When two frequently-used objects are sources of false sharing because they're in the same far-too-popular cache line, there are two general ways to remove the false sharing.

First, we can reduce the number of writes to the cache line. For example, writer threads can write intermediate results to a scratch variable most of the time, then update the variable in the popular cache line only occasionally as needed. This is the approach we took in Example 2, where we changed the code to update a local variable frequently and write into the popular result array only once per worker to store its final count.

Second, we can separate the variables so that they aren't on the same cache line. Typically the easiest way to do this is to ensure an object has a cache line to itself that it doesn't share with any other data. To achieve that, you need to do two things:

  • Ensure that no other object can precede your data in the same cache line by aligning it o begin at the start of the cache line or adding sufficient padding bytes before the object.
  • Ensure that no other object can follow your data in the same cache line by adding sufficient padding bytes after the object to fill up the line.

Here's how you can write this as a reusable library component in C++:


// C++ (using C++0x alignment syntax)
template<typename T>
struct cache_line_storage {
   [[ align(CACHE_LINE_SIZE) ]] T data;
   char pad[ CACHE_LINE_SIZE > sizeof(T)
        ? CACHE_LINE_SIZE - sizeof(T)
        : 1 ];
};

To get an object of type MyType that is stored on its own cache line, we would write cache_line_storage<MyType>. Note that this code assumes you've defined CACHE_LINE_SIZE to a suitable value for your target processor, commonly a power of two from 16 to 512. It also uses the standardized C++0x alignment syntax; if you don't have that yet, you can use a compiler-specific extension like Gnu's __attribute__(( aligned(x) )) or Microsoft's __declspec( align(x) ).

If you're on .NET, you can write something similar but for value types only, which in their unboxed form are always laid out "inline" rather than as a separate heap object:


// C#: Note works for value types only
//
[StructLayout(LayoutKind.Explicit, Size=2*CACHE_LINE_SIZE)]
public struct CacheLineStorage<T>
   where T : struct
{
   [FieldOffset(CACHE_LINE_SIZE)] public T data;
}

It may seem strange that this code actually allocates enough space for two cache lines' worth of data instead of just one. That's because, on .NET, you can't specify the alignment of data beyond some inherent 4-byte and 8-byte alignment guarantees, which aren't big enough for our purposes. Even if you could specify a starting alignment, the compacting garbage collector is likely to move your object and thus change its alignment dynamically. Without alignment to guarantee the starting address of the data, the only way to deal with this is to allocate enough space both before and after data to ensure that no other objects can share the cache line.

For Java and .NET full-fledged objects (reference types), the solution is basically the same as for .NET value types, but more intrusive: You need to add the before-and-after padding internally inside the object itself because there is no portable way to add external padding directly adjacent to an object.

Applying this second approach to Example 1, we could change just the definition of the result array to space the array elements far enough apart. For example:


// Example 3: Simple parallel version (de-flawed using padding)
//
cache_line_storage<int> result[P];
//… etc. as before, just replacing result[p] with result[p].data …

Running performance tests confirms that this results in the same scalability curve as Example 2.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video