Microsoft Parallel Computing Platform
David Callahan is a Microsoft Distinguished Engineer. Prior to Microsoft, David was at Tera Computer, which in 2000 acquired Cray Research and became Cray Inc., where he worked on High Performance Computing (HPC). At Microsoft, David is a member of the Parallel Computing Platform team.

Dr. Dobb's: What does the Parallel Computing Platform team do?
DC: Three years ago we formed this team inside Visual Studio. It had the responsibility of helping Microsoft get ready for the shift to multicore and many-core. We had broad responsibility around the company but we were centered in the Developer Division because we believed the impact of this fundamental shift in how programming is done was mostly going to be on software developers.
We have investments in both managed and native code. We built up some abstractions to support parallel programming. We built up user-mode runtime scheduling facilities. We negotiated some changes to the NT kernel in the way it deals with threads. We built a set of companion tools aimed at debugging and performance analysis. This will all ship as part of Visual Studio 2010, which is currently in beta.
Dr. Dobb's: What's the connection between the Parallel Computing Platform and the Parallel Patterns Library?
DC: PPL is the set of C++ native execution abstractions. It's part of our team effort. We worked a lot with Herb Sutter on the C++ team when we were setting up the basic structure and syntax of the abstractions we were putting out.
We brought two classes of expertise to the problem. My background is mostly in optimizing and parallelizing compilers, but I also have a lot of experience in parallel programming. So I sort of knew the patterns we wanted to encapsulate. We engaged Herb to understand the best way to expose those patterns to the C++ community, to understand how they would fit well with the STL, and to smooth the way to eventual standardization, if that's the path the industry chooses to follow.
We're also engaged with Intel to rationalize what we're doing with their product and to ensure, particularly in the C++ space, that we don't generate two similar but incompatible offerings.
Dr. Dobb's: So how much of this is in beta now?
DC: There are two things that are now in beta. There's .NET 4 including our technologies, and there's Visual Studio 2010, which is how the C++ assets are branded and shipped to market.
Parallelism is a cross-cutting concern so we made identical investments in the managed and the native environments.
Dr. Dobb's: Do the libraries look the same?
DC: No, they don't. We went to Herb to understand the right way to present this to a modern C++ developer. There are idioms, patterns, and language conventions that lead to a particular set of choices for C++ that are different from C#.
Dr. Dobb's: Foreach loops must be foreach loops in either environment. What's a good example of the differences?
DC: For certain basic, structured concepts, they look very similar. But the way the underlying machinery is exposed ... for example, in the C++ world, there are some optimizations that make sense when you have a structured parallel loop and you know that the containing function activation won't return until everything that loop references is done. You can stack-allocate in the parent function. That leads to a choice of introducing a few APIs to cater to that allocation and give you a little bit more performance.
That optimization is not directly available in a managed language.
Dr. Dobb's: The optimization may be there in the managed language, but it's controlled and invoked by the language, not by the programmer.
DC: That's right. That means there is a certain number of things we exposed in C++ that have no reasonable analog in the managed environment.
The other difference is that we have a lot of existing frameworks in the managed world. Microsoft Developer Division ships more managed source code than native source code. There were certain patterns in those managed frameworks we particularly wanted to tie into. This mostly had to do with the lower-level task mechanisms, the more primitive elements of execution, and how they were surfaced with respect to other asynchronous programming patterns in the .NET Framework. There were some conventions we wanted to adhere to that had not yet been broadly established in the C++ space.
Also, in the last release of .NET we put a lot of effort into LINQ, direct linguistic support in C# and VB for the standard SQL query operators. The cool thing about that operator set is that you are already talking about data in aggregate and doing data-parallel operations over it. There are natural parallel interpretations of many LINQ queries.
So on the .NET side we built out a parallel query evaluator for LINQ queries. We call it PLINQ ("pee-link"). This was an opportunity to say, "Here's an existing framework that is amenable to parallelism. Let's add some parallelism machinery to it."
Dr. Dobb's: Do you follow supercomputers still?
DC: We've finished this first wave of technology, and now one of the things we are thinking about for the next wave is how we take our desktop abstractions and reconsider opportunities to scale them out to other scenarios. Microsoft has an HPC effort. We'll be looking at extending our abstractions so that people can develop a skill set they can use in a variety of environments.
Dr. Dobb's: How scalable are our general commercial development multicore desktop approaches to parallelization, the low-level constructs and the higher-level task parallelism? At how many cores do we achieve diminishing returns?
DC: Nobody really knows. As the underlying architectures evolve to better suit parallel programming, who knows where it's going to end up? Memory bandwidth will increase. The bus will go away as a means of transferring memory values from DRAM to the processing cores. It's already disappearing in favor of point-to-point interconnects like HyperTransport, and it may disappear entirely as vendors shift to stacked DRAM instead of traditional printed-circuit-board mechanisms for connecting processors and memory devices.
The other part of the scaling question is, "How much complexity can we manage in the applications we build?" That may be a real problem. We're shaking the foundations of the software infrastructures we built by asking for change at the bottom level. We'll see major re-architecture in the next decade as people finally have to change how their code looks. You don't want to change a few hundred thousand lines of code until the business imperative meets up with it. We're not quite there yet.
Right now the market is dominated by dual-core systems. It might soon be dominated by four. You're not going to rewrite a few hundred thousand lines of code for less than a factor-of-two speedup.
Dr. Dobb's: Which you won't get with 2 cores, but with 4 or 8 you would.
DC: Right. So the pressure you're going to be under is that not only will we be getting more cores, but the cores themselves won't be that much faster than they used to be. Every time you add a new feature or double the size of a dataset, you have this performance tuning problem.
So for a few years people will go through the process of performance tuning and try to use parallelism only in a few spots where they believe they can get away with it, since it's a new technology, it has some dangers, and none of your staff is trained in it. But eventually people will realize they can't go through this performance-tuning effort every two years, and will step back and re-architect their applications for scalability.
You'll take some ideas from service-oriented architectures, designed for loose coupling and scalability, some ideas from data parallelism ... The industry will dive bit by bit into it, some earlier, some later.
Dr. Dobb's: I did just that in 2006-2008 when I architected and directed a server project that was parallelized and scaled on a pattern of simple one-idea discrete tasks connected by message queues.
DC: Anticipating a return to a pipe-and-filter architecture, we added a bunch of mechanisms for establishing message blocks and connecting components through queues. We call them "agents" to avoid conflicting with prior art in this area. We advise people that if they have applications that naturally decompose in a service-like way, they should consider building with these pieces. They naturally give you latency tolerance and scalability over time.

I think architecture by building asynchronous communicating agents is one of the key architectural paradigms we'll pursue.
There's a separate set of abstractions for architecting around data aggregates that have natural elemental decomposition.
Those two styles need to interoperate within a single problem; they have to share resources, for example. But the abstractions for each of these two styles look a little different, and it's good to have clarity in differentiating between the patterns you are teaching your programmers.