Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

Channels ▼

Christopher Diggins

Dr. Dobb's Bloggers

Exploiting Multiple Cores in the Heron Interpreter with .NET 4.0

March 01, 2010

The Heron interpreter  now uses the task parallel library from .NET 4.0 for its concurrent list operations.

In my previous blog post I talked about rolling my own thread-pool class in order to try and leverage all of the cores in a processor as much as possible for my Heron interpreter. That thread-pool class had acceptable performance, but was quite naive: it didn't do any load balancing and couldn't handle nested parallel computations which was a severe shortcoming. 

Following several people's advice on StackOverflow.com I decided to try the new parallel computing features of .NET 4.0. It took me quite a while to actually get a performance boost using the library in a non-trivial scenario, but in the end the code performed very well and required very little changes to the code. 

The whole reason I need concurrency in my Heron implementation is to show the concurrent primitive list operations in action. Heron has four list primitive operations, three of which are allowed by the specification to be executed in parallel. They are: map, reduce, and select.

The operations are as follows:
  • map - Transforms a list into a new list by applying an expression to each item. For example "ys = map (x in xs) x * 2"
  • reduces - Combines all items in a list to a single value by applying a binary operation to two items in the list until none are left. "sums = reduce (a, b in xs) a + b"
  • select - Selects items of a list that satisfy a boolean condition. For example: "evens = select (x from xs) x % 2 == 0"
Despite the simplicity of the new parallel programming APIs in .NET 4.0, getting good performance for some of the parallel computing that I was doing turned out be a bit more challenging than I first expected. I encountered a slow down for some computations which are theoretically easily parallelized and which had no explicit synchronization. After some deeper investigation with the profiler, there were several things that I did wrong at first: 
  • Poor use of the cache - In structuring a parallel loop poorly, the cache can become invalidated by each thread in alternation. A blog post by Igor Ostrovsky entitled Gallery of Processor Cache Effects  is an excellent resource.
  • Garbage collector can cause a lot of implicit synchronization. I found that using "serverGC " in the app.config file, and using thread-local allocations where it made sense.
  • Not using partitioning. When a loop has a relatively inexpensive loop body, it makes sense to partition the input into subsequences.

After trying a number of the new approaches to implementing parallelism available in 4.0 (e.g. ThreadPool, Task, TaskManager, Parallel.Invoke, and Parallel.For) the most effective tool for my purposes was one of the overloads of the "Parallel.ForEach" functions. Unlike Parallel.For which can only iterate over a sequence of values, using Parallel.ForEach with a partitioner allowed me to iterate over large sequences in chunks, reducing the burden on the work distribution system. Like many of the Parallel methods it allows the creation of thread-local variables which are shared on all sub-tasks. 

This may sound rather complicated, and the actual call-site of Parallel.ForEach can get a bit hairy, but afterwards the framework takes care of all the synchronizations, thread management, load balancing, optimization, etc. It is definitely worth it!

For example here is where "Parallel.ForEach" is used in the implementation of the map operation in the current Heron interpreter: 

        /// <summary>
        /// Evaluates the map operation 
        /// </summary>
        /// <param name="vm">Current state of virtual machine</param>
        /// <returns>An ArrayValue containing new values</returns>
        public override HeronValue Eval(VM vm)
            // internal structure for indexing lists
            IInternalIndexable ii = vm.EvalInternalList(list);

            // Array of values used for output of map operations
            HeronValue[] output = new HeronValue[ii.InternalCount()];
            // Create a parallel options object to limit parallelism
            ParallelOptions po = new ParallelOptions();
            po.MaxDegreeOfParallelism = Config.maxThreads;
            // Create a partitioner
            var partitioner = Partitioner.Create(0, ii.InternalCount());
                // Breaks the for loop up into sub-ranges
                // Parellel options
                // Initialization of thread-local variables
                () => 
                    LoopParams lp = new LoopParams();
                    lp.op = new OptimizationParams();
                    lp.acc = lp.op.AddNewAccessor(name);
                    lp.vm = vm.Fork();
                    lp.expr = yield.Optimize(lp.op);
                    return lp;
                // Loop body
                (Tuple<int, int> range, ParallelLoopState state, LoopParams lp) =>
                    for (int i = range.Item1; i < range.Item2; ++i)
                        output[i] = lp.vm.Eval(lp.expr);
                    return lp;
                // Finalization function
                (LoopParams lp) => { }

            return new ArrayValue(output);

This is complex code, and I won't bother explaining it beyond saying that it executes the map operation by evaluating an expression named "yield" on each item in the collection "ii" and places the result in "output". However, the joy of it is that it encapsulates in relatively few lines some extremely sophisticated code for distributing work across multiple cores.

In some ad-hoc testing in the Heron interpreter I now achieve a significant speed increase (on average 1.45 to 1.65 times as fast and sometimes even faster) for heavily parallel tasks using map, reduce and select when enabling multi-threading in the interpreter on my dual-core machine. If the Heron interpreter wasn't implemented so simplistically I could surely get even better results.  

To replicate my tests using the latest version of Heron (version 1.0 Alpha 1) you have to set the maxthreads value in the config.xml file to 1 for single-threaded or -1 to let the .NET framework decide the best degree of parallelism. You can set "showtimings" to true to have a crude idea of the amount of wall time spent executing the file. The files "tests/TestMap", "tests/TestReduce" and "tests/TestSelect.heron" demonstrate the various concurrent primitives in Heron.

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.