Channels ▼
RSS

.NET

Parallel LINQ

Source Code Accompanies This Article. Download It Now.


Multicore processors are a standard part of computing these days. For instance, the laptop I'm writing this article on has a Core 2 Duo Intel 64-bit processor and 3 GBs of RAM. That's a lot of computing power for single-threaded code. Luckily, Parallel LINQ (PLINQ), which is part of the Parallel FX extensions for .NET, lets me use the basic LINQ keywords to tap into that extra power with little extra work on my part. The the Parallel FX Library is a managed concurrency library that includes PLINQ and the Task Parallel Library (TPL).

The basic use of PLINQ is to add a reference to the downloaded System.Threading.dll installed by default at C:\ Program Files\ Microsoft Parallel Extensions Jun08 CTP and call the IParallelEnumerable.AsParallel extension method on your collection. IParallelEnumerable<T> inherits from IEnumerable<T> and generally appears in a LINQ query at the end of the from range in collection clause. For instance, if the collection were an array of integers named numbers, then you would substitute collection with numbers.As Parallel().

Listing One uses the Sieve of Eratosthenes to determine if a number is a prime. A list of primes is built using the sieve technique, and a LINQ query runs through a bunch of numbers testing for primality. Listing One(a) is the sequential query and One(b) the parallel version.

Listing One


(a)

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;
using System.Threading;

namespace Primes
{
  class Program
  {
    static void Main(string[] args)
    {
      const long upper = 1000000;
      SieveOfEratosthenes.BuildPrimes(upper);
      int[] candidates = new int[upper-2];
      for (int i = 2; i < upper; i++)
        candidates[i-2] = i;
      Stopwatch watch = Stopwatch.StartNew();
      var primes = from p in candidates
                   where SieveOfEratosthenes.IsPrime(p)
                   select p;
      Console.WriteLine("Sequential Elapsed: {0}", watch.Elapsed);

(b)
      watch = Stopwatch.StartNew();
      primes = from p in candidates.AsParallel().AsOrdered()
               where SieveOfEratosthenes.IsPrime(p)
               select p;

      Console.WriteLine("Parallel Elapsed: {0}", watch.Elapsed);
      Array.ForEach(primes.Take(20).ToArray(), x => Console.WriteLine(x));
      Console.ReadLine();
    }
  }
  public class SieveOfEratosthenes
  {
    private static List<long> primes = new List<long>();
    public static List<long> Primes
    {
    	get {	return primes; }
    	set { primes = value;	}
    }
    public static void BuildPrimes(long max)
    {
      if (max < 3) return;
      for (long i = 2; i <= max; i++)
      {
        if (IsPrime(i))
          Primes.Add(i);
      }
    }
    /// <summary>
    /// Use the Sieve of Eratosthenes: no number is divisible 
    /// by a number greater than its square root
    /// </summary>
    /// <param name="v"></param>
    /// <returns></returns>
    public static bool IsPrime(long v)
    {
      if (v == 2) return true;
      for (int i = 0; i < Primes.Count; i++)
      {
        if (v % Primes[i] == 0) return false;
        if (Primes[i] >= Math.Sqrt(v)) return true;
      }
      return true;
    }
  }
}

If this is the first time you've used LINQ queries, then the queries are on the lines starting with var and followed by = from p in candidates...where...select.... The LINQ queries look somewhat like inverted SQL queries.

Again, the first LINQ query in Listing One is sequential and the second is parallel. The AsParallel extension method kicks off the background threads. At minimum, calling AsParallel is all you need to do to use PLINQ and multiple threads for your LINQ queries.

By default, the original sequence order is not maintained when you invoke a PLINQ query. If you want the order of the sequence maintained, then call AsOrdered. However, maintaining order incurs some performance penalties. On my Dell Precision 470 workstation, the parallel version of this ran consistently slower than the sequential version. Let's explore some reasons performance may actually degrade for parallel operations.

Understanding Exclusion Scenarios for PLINQ

PLINQ works with LINQ-for-Objects and LINQ-for-XML. PLINQ isn't intended for use with LINQ-to-SQL or LINQ-to-Entities. Why? Because the IQueryProvider implementation for SQL Server basically translates LINQ queries to SQL queries for LINQ-to-SQL and LINQ-to-Entities--queries that are processed by the SQL engine instead of in memory.

CPU-bound queries that query large amounts of data, perform intensive computations, or a combination of both will yield better results than queries that are I/O bound, such as queries against the filesystem or SQL server.

You might reasonably wonder why a parallel capability couldn't just automatically be added to LINQ behind the scenes. The answer is that programmer involvement is needed to handle data impurity, concurrency exceptions, thread affinity, ordering expectations, and poorer than expected performance problems in some scenarios. Adverse conditions for parallelism are referred to as 'parallelism blockers' and include:

  • Mutable Data Structures and Impurity. Side effects due to accessing data that is not threadsafe, such as mutable structures like List<T> or side effects due to file I/O, are not good candidates for PLINQ. For instance, copying the elements of one List to a second List using PLINQ may result in race conditions on the Add method that may only infrequently manifest.
  • Concurrency Exceptions. Running multiple threads may result in exceptions being thrown on several threads. Handling multiple concurrent exceptions requires some extra considerations. The way PLINQ handles concurrency exceptions is to put all of the exceptions on all threads into an AggregateException; each exception is accessible through the InnerExceptions collection.
  • Thread Affinity. Some libraries and technologies like COM interop and graphics packages require thread affinity. For example, Windows Forms and WPF require that code that interacts with the UI must run on the UI thread. If PLINQ were automatic, then errors would be introduced for technologies that require the single-threaded apartment model or for GUI elements in WinForms or WPF.
  • Ordering Expectations. PLINQ partitions data to run on multiple processors and cores. By default, this behavior does not guarantee that input ordering matches output ordering. You can elect to preserve ordering manually by invoking the AsOrdered extension method (after AsParallel), but doing so can reduce performance.
  • Decreased Performance. Obviously, the most desirable outcome of using multiple threads is to enhance performance. Sequential algorithms can be complex but are executed in a straightforward manner. Parallel algorithms impose extra overhead due to partitioning data and tasks, synchronization, cross-thread communication, and merging results. If the cost of this overhead reduces performance, then it is undesirable to use parallelism in that instance.

Finally, optimal parallelism is affected by Amdahl's law, which says roughly that performance speedup is limited by the amount of code that must be processed sequentially. That is, at some point after all of the code and data are partitioned, some of it must be processed sequentially, and the sequential processing that must occur and all of that synchronizing, partitioning, cross-thread communicating, and results-merging detract from performance speedup. The net effect is that having two or four processors do not necessarily mean that your code will run two times or four times faster, respectively.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Comments:

Video