Of Quarks and Practical Parallel Programming
In this conversation with Timothy G. Mattson, Senior Research Scientist at Intel's Computational Software Lab, we find close agreement on the best approach to applying parallelism to business and general application programming. But first we discuss quantum physics.
From QM to Programming
DDJ: Tell us about your quantum physics work.
TGM: My PhD is based on my work in quantum reactive scattering. We picked chemical systems simple enough that we could treat every one with full quantum mechanics.
DDJ: You were right at the border where it emerges out of the quantum world to the world of classical physics?
TGM: I wouldn't quite say that. We were definitely far down in the quantum domain. We weren't in subatomic physics; we were working at the level of molecular physics. We could bridge into the real world using the rules of statistical physics, which tell you how to take the detailed energetics you derive from Schroedinger's Equation and propagate them up into the macroscopic domain via collision cross-sections.
Where it went into the real world was inside laser cavities. The research area was the production of chemical lasers, where you use a chemical reaction to produce the population inversion. The incredibly strong oscillating electric fields that build up inside a laser cavity make big changes to the fundamental quantum physics. A lot of the work I was doing was on how you build a theory you can apply to quantum reactions inside a laser cavity.
I did that work in the 'eighties when the computers we had were absolutely primitive. I had to do very sophisticated mathematics to get things into a form we could actually solve using very detailed programming. So by the time I was done I had to know a lot of physics, I had to know a lot of chemistry, huge amounts of mathematics pertaining to these differential equations, and I had to pick up the better part of a degree in computer science.
DDJ: Tell us about your work with Geoffrey Fox.
TGM: I did a post-doc at Caltech in a program Geoffrey Fox ran right after they built the Hypercube, the Cosmic Cube. It funded a lot of post-docs on campus trying to figure out how to apply the Cosmic Cube to different problems. I worked on quantum reactive scattering and electron-molecule scattering cross-sections.
DDJ: So like your predecessor Dijkstra you were "corrupted" from your true field and the computers became the object of study? :)
TGM: You're right ... What I discovered over the years is that I'm really good at computer science, at understanding the language of an application domain, and at understanding the language of hardware and software architects, so that I can be the bridge between them. I'm better at that than at doing the original physics and chemistry that I love so much. So I've evolved over the years toward what I do well. I consider myself a common footsoldier in the battle of science: I support the computing that the geniuses who do the physics and chemistry need to get their job done.
DDJ: I designed a parallel business application. The goal was to take a server application written as monolithic processes grinding away at a database, make it run correctly, in true parallel, on a dual-core machine, and make it truly distributable. I improvised and made it all message based, with little tasks that each did one thing, each getting its work from a message queue and putting its results on another message queue (see PQR - A Simple Design Pattern for Multicore Enterprise Applications).
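The queue-based task design described here can be sketched in a few lines. This is an illustrative outline only, not code from the PQR article, and all names are invented; it assumes Python's standard `queue.Queue` as the message channel:

```python
import threading
import queue

def worker(in_q, out_q):
    """A little task: take one unit of work from a queue,
    do one thing with it, and put the result on another queue."""
    while True:
        item = in_q.get()
        if item is None:          # sentinel message: shut down cleanly
            in_q.task_done()
            break
        out_q.put(item * item)    # the "one thing" this task does
        in_q.task_done()

in_q, out_q = queue.Queue(), queue.Queue()
t = threading.Thread(target=worker, args=(in_q, out_q))
t.start()

for n in range(5):
    in_q.put(n)                   # work arrives as messages
in_q.put(None)                    # ask the worker to stop
t.join()

results = [out_q.get() for _ in range(5)]
print(sorted(results))            # [0, 1, 4, 9, 16]
```

Because each task owns its inputs from the moment it dequeues them, such tasks can later be moved into separate processes or machines by swapping the in-memory queues for a distributed message broker, which is what makes the design distributable.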
TGM: You are touching on an issue I frequently get involved with here at Intel. Message passing is tremendously underappreciated. I'm one of the small number of people who have been doing parallel programming so long ... I've done extensive message-passing programming and extensive shared address space programming. Shared address space programming is by far the most common right now. I'm telling you, message-passing programming is so much easier, so much safer. It does require more work up front -- that's why it has a reputation for being so hard -- but you more than pay for that up-front work with how much easier it is to validate.
You don't have all this unintended sharing through shared address spaces. You don't have to write locks -- they are implied by the messaging. All your interaction is segregated into bundles of messages which you can easily inspect. Whereas if everyone is stomping over one address space, I can have all kinds of sharing I'm not aware of.
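The point that locks are implied by the messaging can be illustrated with a minimal sketch (assuming Python's standard `queue.Queue` as the channel; the two stage names are hypothetical). Neither thread touches the other's state, so no explicit lock ever appears:

```python
import threading
import queue

def producer(out_q):
    """Sends work as messages; shares no variables with the consumer."""
    for n in range(10):
        out_q.put(n)      # ownership of n passes along with the message
    out_q.put(None)       # sentinel: no more work

def consumer(in_q, totals):
    """Accumulates into purely private state; no other thread can touch it."""
    total = 0
    while True:
        n = in_q.get()    # the queue's hand-off is the only interaction
        if n is None:
            break
        total += n
    totals.append(total)

q = queue.Queue()
totals = []
threading.Thread(target=producer, args=(q,)).start()
c = threading.Thread(target=consumer, args=(q, totals))
c.start()
c.join()
print(totals[0])          # 45
```

All the interaction between the two threads is segregated into the messages flowing through `q`, which is exactly what makes it easy to inspect.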
I firmly believe that over time a sense of rationality will set in and we'll stop being so dogmatically committed to shared address spaces. Message passing will take off and become dominant.
DDJ: In application land. In middleware land, they have to write locks. But that sort of software is performing tasks which can be described precisely and briefly.
TGM: I'm talking specifically about the APIs, the programming languages, the notations that applications people use. If I were to state the common theme of my research going back to the late 1980s, it is this: what programming languages, what programming technologies, what application programming interfaces can we come up with that will make serial software rare, that will make it so a programmer routinely writes parallel code?
Now, what goes on underneath the covers in terms of the middleware, the current operating systems, etc., that's very important ... but I find it very boring. It's not as critical, because you really can take a small roomful of hardcore experts and they can just get it right. And you can support them because it is a small number.
It's to the masses of parallel application programmers that we can't provide that level of support. They have to be able to create the parallel software on their own. So we have to give them what they need to get that job done.
For me -- not based on theory or hunch, but based on writing dozens of message passing programs and dozens of shared address space programs -- I'm more productive with message passing programs.
Message Passing Processor Architectures
TGM: We are doing research on architectures that are truly message passing. If you look at the eighty-core Tera-scale chip we at Intel were looking at a year or two ago, that was strictly a message-passing environment. Message passing wasn't the primary focus of the chip, though: the circuits, the 2D mesh, and power management formed the primary research focus. The secondary focus was understanding how you would build a message-passing architecture on a many-core chip.
DDJ: This was not designed to boot Unix.
TGM: This was a research chip. It had no operating system. It was never to be programmed except by a tiny team of nuts, and I led that small software team of nuts. No compiler, no operating system ...
DDJ: I have something like that on my desk (with the advantage of a Forth cross-compiler): the SEAForth chip that Chuck Moore, who created Forth, is now designing. They are up to 40 cores, last I asked, with an N-S-E-W connection between each core, all designed using tools Chuck wrote himself.
TGM: I'm just thrilled when I look around at what's happening right now. There are more and more people asking the hard questions about the established dogma, the dogma of a modest number of powerful cores that share an address space ... I'm just not convinced that's the right way to go. Maybe people can find a better way to do things.
Solving the Parallel Programming Problem: It's Lovelier the Second Time Around
TGM: I feel that perhaps the spread of parallel could have started in the early 1990s but we all missed the chance. Now we've got a second chance.
DDJ: Something qualitative has changed since the early 1990s. Parallel programming was that much harder and the speed in the small, expensive boxes available for commercial servers wasn't that impressive relative to the cost and difficulty. Now powerful parallel processing environments are consumer items.
TGM: My G1 cellphone has four processors. My laptop has four. My Apple laptop has the NVidia card with how many processors? But we still haven't solved the parallel programming problem.
DDJ: I.e., how do you use this energy in garden-variety applications written by average programmers?
TGM: Right. We know how to have highly trained specialists write parallel software. We found that if you are willing to work at it hard enough, most any problem can be cast in a parallel form. But for getting the hundreds of thousands, or perhaps millions, of software engineers around the world to routinely write parallel code, we don't have the technology on the table right now.
DDJ: Microsoft is offering its own approach.
TGM: Microsoft has a very exciting approach. I'm excited, and to hear myself say that about the Windows environment -- I'm from a Linux background -- is hard on me. I can't talk about it right now, but I've seen what they are working on, and they understand, they really do.
DDJ: Are you planning an automated tool to execute your Pattern Language for Parallel Programming?
TGM: History has shown that automated tools for parallel programming have never worked. There's not a single highly parallel commercial program out there that was created through an automated tool.
Our strategy in the PLPP paper is exemplified by my ongoing work with the Berkeley folks. We understand the collection of design patterns that parallel programmers have found to work. We represent how they all fit together and work together to create an overall software engineering discipline of parallel programming. We do that by creating a design pattern language.
Then, once we have that design pattern language, that becomes the roadmap for the software frameworks for making explicit parallel programming easier. So it's a long-range research program where the first step is to create and validate the design pattern language. The second step is to create these frameworks.
Making a Framework that will Work
DDJ: And that's what you yourself are doing?
TGM: That's what I'm doing. It's a big job and I'm not doing it alone. My principal collaborators right now are Prof. Kurt Keutzer's group. We are actively trying to grow the collaboration. We are meeting with Dr. Ralph Johnson at UIUC, who is one of the Gang of Four. We are trying to create a community that cuts across the parallel programming world to create a consensus pattern language that describes the process. From that point of view we can describe the right framework.
I can't emphasize enough the order of these steps. Lots of frameworks have been created before, but they haven't worked. I think that's because they didn't step back first and ask, "What is the architecture? What is the roadmap? What's the big picture?" I'm not aware that anyone yet did that first before they started designing their framework.
We are very deliberately not getting caught up in building the actual framework. Get the design pattern language right and validated, and then we can have a real, inductive discussion about the framework.
This is all tightly integrated with ParLab and their "Berkeley View", as expressed in The Parallel Computing Laboratory at U.C. Berkeley: A Research Agenda Based on the Berkeley View. This has been an influential work. I'm lining up much of my research around this group and their work because the problem is so big, we're going to have to get a lot of smart brains working on it together.