Just Say No To SFAQL Parallelism
I know, I know, a lot of folks out there are big subscribers to the 'just-get-'er-done' school of software maintenance and development. The idea of sitting idle while a design group does its work is just plain torture. It feels like a waste of time and money. Somehow there's always a fire that demands we code now and capture the design later.
I'm sure the valiant souls who came up with agile software development had very good intentions, and they are probably supported by some pretty reasonable theory, but if they could only see how twisted and perverse agile development can become in practice, especially in software destined for parallel and multicore processor environments, they might have refrained. Ship now, patch soon to follow. We need it in production today! Send the customer an e-mail explaining a workaround for the defect. We can't wait two weeks; we've got to get it out on the web now. Just post the browser requirements. If they want access to our site, they'll upgrade!
The complexity of parallel programming doesn't seem to deter the 'just-get-'er-done' crowd one bit. In some instances, the parallel programming requirement becomes a simple budget justification for buying a license or two of some multicore software tool X from Vendor Y that is guaranteed to maximize the optimal concurrent capacity of the N-core servers and 'speed the development time of the entire team'. I'm reminded of the words of Sir Walter Scott (or was it Capt. Montgomery Scott?): 'Oh what a tangled web we weave when first we practice to deceive'. All we can say to our software development brethren is to just say no when asked to produce software in a process that shortcuts, leaves out, demeans, belittles, or otherwise stifles the proper design efforts necessary to produce high-quality, accurate, correct, reliable, and safe software. If multicore will make a good design run faster, it will make a poor or bad design crash faster still. Don't get us wrong. We do believe in using the right tool for the job. But we've learned (through experience) that the tool should not be used as a replacement or shortcut for the due diligence of design and software engineering. Quad-core is quickly becoming the norm. Soon it will be 8 cores, and 16 cores before you know it. There is already a mismatch between software design and the new multicore paradigm. As the number of cores increases, the mismatch becomes more apparent. There is the inevitable pressure to 'just-get-'er-done' and bring the software design magically in line with the new reality of multicore processors. The most seductive way to do this is to grab one of the new tool sets out there and just have at it. While this approach will actually get some traction in a couple of areas, in general those who fall into this very seductive trap are missing the bigger picture.
We're on the road to massive parallelism at the hardware level, but most of our commonly used design metaphors and development vocabularies are tied to decades of sequential programming techniques and, at best, first-generation parallel programming techniques. In general, we lack the design vocabulary to communicate in the context of massively parallel execution contexts.
Once it is determined that parallel programming or concurrency requirements are demanded by legacy or new software development efforts, then, if it is not already present, a new vocabulary, a new set of design concepts, a new set of testing concepts (in short, a whole new development culture that is 'concurrency-aware') has to be adopted and shared by the development team or group. Concurrency, parallelism, threading, etc., will impact design documentation, implementation documentation, group communication, testing techniques, debugging techniques, legacy library usage, third-party component integration, and so on. And once the team or group adopts this culture, it's a long-term proposition. Inertia will set in, and then no one will be interested in switching to some new concurrency paradigm. This is why teams and groups that have not yet taken the parallel programming plunge, and are in the process of evaluating their needs and performing the requisite system analysis, should proceed with caution. Many tool sets are accompanied by a canned philosophy that may or may not be what the doctor ordered. SFAQL (Shoot First, Ask Questions Later) Parallelism can introduce many dead ends and false starts for a team just entering the parallel programming fray. Multicore, and soon massively multicore, computers are here to stay. Design philosophies and tool sets should be thoughtfully and carefully chosen. The software development, software engineering, and computer science communities are undergoing a number of paradigm shifts at the moment. A forward-looking software development group will pause, consider the big picture, and then put together short-range, mid-range, and long-range plans that integrate the new computer architectures.
We are convinced that the paradigm shift necessary to truly take advantage of massive parallelism will not come in the form of some nifty vendor tool that hides or encapsulates parallelism from the application layer or the application developer. While compiler improvements that can transform some sequential code into parallel code are and will be welcome, these improvements will be negligible when compared with the complexity of the problem space and the solution set space. Of course, we're also grateful for the tools that attempt to hide interprocess communication, control, mutexes, semaphores, and locking. And for those of us implementing the various types of servers that can utilize those tools, hooray! But the real paradigm shift will have to come at a level higher than coding or coding tools; caveat emptor. If we zoom out and look at the software forest instead of the software trees, we will see that software is crossing a threshold of complexity and complication that is beyond our commonly used software maintenance and development paradigms. As this complexity clashes with the trend in multicore processors, it becomes painfully obvious that a significant paradigm shift is in order with respect to software design and maintenance metaphors. Tracey and I are putting all of our eggs into the basket of major breakthroughs at the software/system design and problem-solution modeling levels. The current prevailing design models are simply not sufficient to scale, even with the nifty vendor solutions that make concurrency transparent to the application. To give you a clearer understanding of where we are coming from (and consequently where we are going), consider our simple model of a successful system/software design shown in Figure 1.
A successful system/software design will be an intersection of the problem space, the solution set space, and a knowledge space. For convenience, we will call this intersection S. While this might be obvious to some, it may not be to others. S is where the action is, or shall we say, S is where the real innovation and paradigmatic breakthroughs will necessarily come from. To put it simply, the problem space represents the complete initial state of affairs in the problem domain. The problem space could be understood as a model that captures the world described by the problem domain scenario. The solution set space is the set of good and (for our purposes) bad solutions to the problem captured in the problem space. The knowledge space represents the knowledge that is available to bring to bear on the problem space and on navigating the solution set space. Please keep in mind that we are presenting an oversimplified notion of problem space, solution set space, and knowledge space. I can just see the e-mails now chastising us for what we've left out.
To give an example of what we mean by problem space, solution set space, and knowledge space, we'll resurrect the 19 e-mails problem that we had. In this problem, the problem space consisted of the fact that we had 19 e-mails received over several years and that, for reasons unknown, we had no time or date stamp information for the e-mails. In fact, we had no information of any kind that could put the 19 e-mails in the proper chronological order. Further, the problem space included the fact that we needed to reconstruct the chronological order in which the e-mails were sent, because the e-mails plus the proper ordering contained sensitive information that could only be understood once the proper chronological order of the e-mails was established. The problem space also included the fact that we were pressed for time in identifying the correct sequence of the e-mails. All of this together describes a simple model of the problem space.
For this e-mail problem we simplified the solution set space and described it as 19!, or factorial(19). This amounts to 121,645,100,408,832,000 possible arrangements of the 19 e-mails, and in the worst-case scenario all 121,645,100,408,832,000 of them might have to be considered in order to identify the correct chronological order of the 19 e-mails.
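That count is easy to sanity-check. Here is a quick Python sketch (our illustration; it was not part of the original problem's tooling):

```python
import math

# The solution set space for the 19 e-mails problem: every possible
# chronological ordering of 19 distinct items is a permutation, so the
# space has 19! members.
orderings = math.factorial(19)
print(f"{orderings:,}")  # 121,645,100,408,832,000
```

Even at a billion evaluations per second, a single sequential agent would need nearly four years to enumerate this space, which is what makes the design-level treatment of concurrency unavoidable.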
The knowledge space includes the deep natural language processing capabilities of a software agent, as well as the ability to generate permutations. The knowledge space in this case also consists of robust domain-specific ontologies (more on that later). More importantly, the knowledge space includes deep knowledge of the notions of parallel search and cooperative concurrent problem solving. Somewhere in the intersection of these three spaces (problem, solution set, knowledge) is a successful software/system design that we conveniently call S.
Concurrency and multicore architectural issues are actually resolved during the detailed expansion and examination of S. It is important to note that the concurrency design is thoroughly dealt with long before implementation tools or products are even considered. Long before we talk about debugging tools, watching multiple threads execute, and monitoring process pools using tool X, parallelism concerns must be dealt with at the design level. Make no mistake about it: if concurrency is not effectively dealt with at the design level (the S level), then whatever software is produced will be unreliable, impossible to maintain, and impossible to extend. Now the question is: what metaphors, paradigms, solution models, and design tools do we have that deal with concurrency and parallelism at the design level? Are these design tools, metaphors, and paradigms sufficient for the impending clash between oncoming software complexity and the trend in multicore execution environments?
Let's spin it around and look at it from another vantage point. If we consider any significant problem that we are trying to solve using software and software agents, then we can picture a line with two end points. At one end is the unsolved problem, and at the other end is the solved problem. Between the unsolved problem and the solved problem is something that we'll simply call 'work'. To drill down on this, we'll go back to our 19 e-mails problem. At one end point we have 19 e-mails in some potentially random order. At the other end point we have the 19 e-mails in the proper chronological order. At issue is how much work it will require to put the e-mails in chronological order, and what kind of work it will require. How long will the work take? Do we even know how to do the work in the first place? What is an efficient method of doing the work? What is an inefficient method of doing the work? How much effort is needed to do the work? Figure 2 gives us a picture of this and how it relates to our three spaces.
In our 19 e-mails problem, the nature of the work may include search, inference, natural language processing, probabilistic reasoning, and symbolic processing. We have classified the activity between the two end points in Figure 2 as work. How we characterize this work in the context of getting to the solution has everything to do with whether concurrency is necessary and, if it is, how it will be used. Beyond this, what problem-solving tools do we have at the design level? Keep in mind we are only talking about the design level here.
Our use of parallelism and concurrency is constrained by the tools, paradigms, and vocabulary we have at the design level. At the end of the day, if we cannot convincingly and effectively describe the concurrency required to reach our solution at the design level, then the software engineering effort is in jeopardy. Concurrency may introduce itself into the requirements in many ways. It may be an explicit statement in the requirements that some set of components must operate simultaneously and must communicate synchronously or asynchronously. That is, concurrency is a natural attribute and requirement of the problem space and solution set space. Concurrency may introduce itself as a result of the nature of the solution model. For example, I may use the blackboard model of problem solving to deal with the deep natural language processing requirements that are present in my 19 e-mails problem. The blackboard model just happens to be effective at solving certain kinds of natural language processing problems. The fact that it includes parallelism or promotes concurrency is secondary. Concurrency may introduce itself because the nature of the problem is sufficiently complex or complicated that concurrent divide-and-conquer solutions are the most natural fit. Concurrency may also introduce itself into the design because time-constraint requirements and the size of the search space suggest designs that attempt to find solutions on several fronts simultaneously.
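To see what a concurrent divide-and-conquer partitioning of a solution set space might look like at the design level, consider this toy Python sketch (entirely our illustration, not the authors' design): the permutation space is split by first element, so each partition could be handed to a separate agent or core. With 19 e-mails this would yield 19 partitions of 18! orderings each; we use 4 items to keep it runnable.

```python
from itertools import permutations

items = ['a', 'b', 'c', 'd']  # stand-ins for the e-mails

def partition(first):
    """One agent's share of the work: every ordering that begins with `first`."""
    rest = [x for x in items if x != first]
    return [(first,) + p for p in permutations(rest)]

# Four disjoint partitions of 3! = 6 orderings each, covering all 4! = 24.
shares = {first: partition(first) for first in items}
assert sum(len(s) for s in shares.values()) == 24
```

The point of the sketch is that the partitioning decision, how the space is split and how agents will coordinate on it, is made before any threading tool or library is chosen.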
What we are suggesting here is that the problem space will either require or insinuate concurrency in the solution model. The solution set space may be so partitioned that concurrency is the natural way to navigate it. In other words, concurrency/parallelism will be an artifact of the problem space, the solution set space, or both. So we manage and model concurrency at the design level. The system/software requirements will typically have a performance constraint that must be met by the hardware in order for the software to be considered successful. For example, in our 19 e-mails problem it's absolutely crucial that we get back the correct chronological order of the e-mails in less than 30 seconds. So in the worst-case scenario we have 100 quadrillion or so possible orderings of the e-mails, and we have less than 30 seconds to find the right one. If I know one agent can evaluate one ordering every 3,000 seconds, then I know one agent working by itself would take 3,000 times roughly 100 quadrillion seconds to evaluate every ordering. And we might have to evaluate every ordering, because the last arrangement could be the correct one. That works out to roughly 365 quintillion seconds of work. We know that the successful system has to find the correct chronological order in less than 30 seconds. So how do we take hundreds of quintillions of seconds of work and distribute it in such a way that it can be done by some number of agents operating in parallel in less than 30 seconds? What if each agent gets a dedicated core to work with? How many cores would it take to fit hundreds of quintillions of seconds of work into under 30 seconds? How many mutexes, semaphores, pipes, and queues are we talking about if the agents need to communicate and update some shared piece of information simultaneously?
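The back-of-the-envelope arithmetic can be made concrete in a few lines of Python (the 3,000-seconds-per-evaluation rate and the 30-second deadline are the article's illustrative figures, not measurements):

```python
import math

orderings = math.factorial(19)       # 19! possible chronological orders
seconds_per_ordering = 3000          # assumed cost for one agent to evaluate one order
budget = 30                          # the deadline, in seconds

# Worst case: every ordering must be evaluated.
total_work = orderings * seconds_per_ordering

# If the work parallelized perfectly with zero coordination overhead:
cores_needed = math.ceil(total_work / budget)
print(f"{total_work:.3e} s of work -> {cores_needed:.3e} cores")
```

The result is on the order of 10^19 cores even under the absurdly generous assumption of zero synchronization cost, which is exactly why brute-force enumeration is the wrong paradigm here, no matter what tool set is purchased.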
Should we just go out and buy the biggest box(es) available and the most recently announced parallelization tool(s) that will make parallelism transparent to the application and 'just-get-'er-done'? Keep in mind we're only talking about arranging 19 e-mails here, and I'm sorry to say that SFAQL Parallelism won't work here.
We will have a great deal to say about this problem because it's extremely representative of many problems we encounter in software engineering and computer science every day. In fact, because of the complexity inherent in the Internet and the many Clouds forming on the horizon, we will have software agents that will be faced with problems many orders of magnitude bigger than the problem of fitting hundreds of quintillions of seconds of work into 30 seconds. So how we model that work at the design level will be critical to our ability to successfully create a manageable, reliable, and safe system. Obviously, the paradigm that led us to hundreds of quintillions of seconds of work being forced into under 30 seconds is the wrong paradigm. If you do the math, you'll find that Holger Hoos from the University of British Columbia was onto something in his "Taming the Complexity Monster" talk. Tracey and I continue to hearken back to the ghosts of ICOT and the Fifth Generation project for a reason. We suspect that while facing the precipice of one set of problems, the ICOT researchers may have unknowingly uncovered an aperture that could shed light on our current dilemma of operating systems, parallel processing, and programming paradigms.
Today it is applications with hundreds to thousands of threads. What will we do when it's a hundred thousand or more threads? What happens to synchronization at that level? How will we even make simple programming changes when hundreds of mutexes, locks, and synchronization mechanisms are all simultaneously in play? What will the documentation look like? What will change management look like? What super-slick doodads will the vendors be selling then? One thing is certain: SFAQL Parallelism won't work, because it produces unpredictable, unreliable, unmaintainable, brittle systems that cannot scale and that are impossible to understand, even by the original team, once a little time has passed. Going forward, we need to place far more emphasis on devising design tools that will help us design correct, reliable, and safe systems. Tools that truly integrate with the culture of the development team or group. Design methodologies that will be around for the long haul.
Metaphysical Logical Positivist Post-Modernistic Parallel Philosophy Thought For Today
Perhaps large monolithic imperative-procedural-based system designs are reaching their practical limits. Maybe instead of larger and more complex, we should be thinking smaller and simpler. Maybe instead of thinking imperative-procedural, we should be thinking declarative-inferential. Maybe instead of thinking parallel and concurrent, we should be thinking in terms of induction, recurrence, and recursion. Hive and colony instead of cluster and network.