Channels ▼
RSS

Tools

Custom Parallel Partitioning With .NET 4


Stephen Toub is a member of the Parallel Computing Platform team at Microsoft.


Partitioning is the act of splitting a data set to be processed in parallel by multiple logical processors. It is a fundamental aspect of parallel computing, as without partitioning everything would execute serially. There are, of course, a multitude of ways in which a data set may be partitioned, and which approach is best for a particular situation depends on a variety of potentially complicated factors.

The .NET Framework 4 invests heavily in helping developers to more easily write parallelized applications. As part of this, the Framework includes Parallel LINQ (PLINQ) for doing data-parallel processing with a familiar set of query operations, as well as the System.Threading.Tasks.Parallel class and its ForEach method, which enables the implementation of data-parallel loops. Both of these constructs need to be able to partition the provided data set, and both utilize multiple built-in algorithms and heuristics for partitioning data sets to be processed in parallel. However no set of built-in components could adequately represent the full spectrum of possible partitioning approaches. As such, both PLINQ and Parallel.ForEach support custom partitioning, enabling developers to implement their own partitioning algorithms, which PLINQ and Parallel.ForEach can then consume.

Partitioner<TSource> and OrderablePartitioner<TSource>

Partitioning logic in the .NET Framework 4 is represented with the System.Collection.Concurrent.Partitioner<TSource> class and its derived System.Collections.Concurrent.OrderablePartitioner<TSource> class. These abstract types provide the core interfaces that are consumed by parallelization constructs like Parallel.ForEach. In fact, Parallel.ForEach is implemented entirely in terms of partitioners, even when it is provided with a simple IEnumerable<T> data source (Parallel.ForEach uses a built-in Partitioner<TSource> implementation that is also exposed for public usage).

Partitioner<TSource> presents a relatively simple interface:


public abstract class Partitioner<TSource>
{	
    protected Partitioner();

    public abstract IList<IEnumerator<TSource>> GetPartitions(
        int partitionCount);

    public virtual bool SupportsDynamicPartitions { get; }
    public virtual IEnumerable<TSource> GetDynamicPartitions();
}

The primary entry point into Partitioner<TSource> is the GetPartitions method. GetPartitions is called by a parallelization construct, such as PLINQ to retrieve a static (or fixed) number of partitions. Each partition is represented as an IEnumerator<TSource> that the parallelization construct will then iterate through serially on a single thread of execution. For example, on a quad-core system, PLINQ will request four partitions from GetPartitions and will then use four threads, each of which will walk one of the four enumerators.

Don't allow the "static" terminology employed here to confuse you. Just because a static number of partitions is utilized doesn't mean that "static partitioning" must be employed. The phrase "static partitioning" is often used to imply that no run-time influence is applied towards determining which partition processes which element. For example, a common form of static partitioning is range partitioning, where a data set of size N is partitioned into P partitions by giving the first N/P elements to the first partition, the next N/P elements to the second partition, and so on. Another common form of static partitioning is striping, or round-robin, where a partition is assigned all of the elements based on round-robin through the data structure's indices. As an example, with four partitions and a data set of size 12:

  • The first partition would be assigned elements 0,4,8
  • The second partition would be assigned 1,5,9
  • The third would be assigned 2,6,10
  • And the final partition would be assigned 3,7,11.

Such static partitioning mechanisms can be extremely efficient due to no need for synchronization between partitions in order to determine which partitions get which elements. However, they can also suffer serious flaws in the form of a lack of load-balancing: if the time required to process individual elements varies at all from element to element, one process may end up being given a greater load than others. For these cases, more dynamic forms of partitioning may be appropriate, where all partitions collude to determine which elements are processed by whom. As such, it's possible (and in fact common) to have a fixed number of threads engage in dynamic partitioning.

Partitioner<TSource>'s other main entrypoint is GetDynamicPartitions, which returns an IEnumerable<TSource>, whose GetEnumerator method is used to return a new partition represented as an IEnumerator<TSource> . GetEnumerator may be called any number of times, and each time it dynamically introduces another partition that will collude with all previously created partitions from that dynamic partitioner (the enumerable). While it's quite common for a Partitioner<TSource> to support a dynamic number of partitions, not all do, and thus the SupportsDynamicPartitions property is available for consumers to check whether they can take advantage of it.

The OrderablePartitioner<TSource> class extends Partitioner<TSource> with the notion of an ordering to the elements being partitioned:


public abstract class OrderablePartitioner<TSource> : Partitioner<TSource>
{
    protected OrderablePartitioner(
       bool keysOrderedInEachPartition, bool keysOrderedAcrossPartitions, 
       bool keysNormalized);

    public abstract IList<IEnumerator<KeyValuePair<long, TSource>>> 
        GetOrderablePartitions(int partitionCount);
    public virtual IEnumerable<KeyValuePair<long, TSource>> 
        GetOrderableDynamicPartitions();

    public bool KeysNormalized { get; }
    public bool KeysOrderedAcrossPartitions { get; }
    public bool KeysOrderedInEachPartition { get; }
}

This is necessary for parallel constructs that need to understand the order of the elements from the original data source. For example, overloads of Parallel.ForEach support passing into the body of the loop the index of the element being processed. As another example, PLINQ's AsOrdered operator is used by developers when they want to ensure that the output ordering of elements from a processed query match the input ordering; PLINQ can only achieve this if it knows the input ordering, which requires that the partitioner inform it of the ordering. In addition to the base type's GetPartitions and GetDynamicPartitions methods, OrderablePartitioner<TSource> adds GetOrderablePartitions and GetOrderableDynamicPartitions methods. The signatures of these methods are almost identical to the base implementation, but instead of returning streams of TSource instances, they return streams of KeyValuePair<long,TSource> instances; each pair contains the TSource data as well as its position in the original data sequence.

A mechanism like PLINQ's AsOrdered needs to deal with the fact that multiple partitions are being processed in parallel, which can perturb the order in which elements are completed. In the worst case, elements could emerge from processing in a completely random order, which would force the consuming construct to do an expensive sort operation over the entire result set. However, it's rare that this worst case will arise, and it's likely that the partitioner knows enough about its partitioning algorithm that it could make some guarantees to the consuming construct which will lighten the overhead. For example, if static range partitioning is employed to partition an input of size N into P pieces, all of the N/P pieces assigned to the first partition will by definition have positions prior to all of those assigned to the second partition. As such, while each individual partition's results might need to be sorted, the entire result set of the first partition can be placed prior to the entire result set of the second partition, eliminating the need for sorting across partitions. To assist with this, OrderablePartitioner<TSource> exposes three potential contracts, which a derived implementation can choose to make or not, and which a consumer can choose to use as an optimization if they're made. If KeysNormalized returns True, all of the elements provided by the partitioner will have unique positions in the range [0,numberOfElements). If KeysOrderedAcrossPartitions is True, all of the elements in partition N have a smaller position than all of the elements in partition N+1. And if KeysOrderedInEachPartition is True, elements will be yielded from an individual partition with increasing positional values.

In most cases, custom partitioners derive from OrderablePartitioner<TSource>. This is because many data sources have notions of ordering (either explicit or implicit), and it typically takes very little extra effort to track the positions of each value from the source. Additionally, OrderablePartitioner<TSource> provides default implementations of GetPartitions and GetDynamicPartitions based on the derived implementations of GetOrderablePartitions and GetOrderableDynamicPartitions, so developers still only have to focus on implementing two core methods.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video