You and Those Multicores and CMPs: Were You Formally Introduced?
It's late Friday and it's been a long week. You've spent 12 hours a day coding, compiling, and crashing with a piece of multithreaded software for the last six days straight. You were mocked by nasty, hard-to-find freezes and hard-to-explain (no, impossible-to-explain) errors. But finally, it's working.
The debugging is over. You've run a few black box tests and a white box test, and your regression testing all looks good. You're confident it's going to pass the QA group next week. So, to score a few extra points with the client, you push it up to the client's server and drop them an e-mail letting them know that, with a little extra effort on your part, you got the module up and running ahead of schedule. All that's left now is to get a bite to eat, kick back with a little Red Dead Redemption, and look forward to a nice relaxing weekend. At least that was the plan. But little did you know, your counterpart at the client's office was also working late. He got your e-mail and decided that he would score a few extra points with the boss by installing the module tonight and having the whole kit and caboodle up and running Monday.
So, right in the middle of Red Dead Redemption and the sweetest gunfight you've ever been in, you get the text message, "It's Broke -- Error 5712 No Resource". Well, after bouncing the controller off the floor, you say to yourself, "Impossible. We tested everything!"
Some of the most subtle errors you'll ever encounter when doing parallel programming or multithreading are the result of undiscovered data races or deadly embraces (deadlocks) that only appear with certain data combinations. Meaning they don't really show up until the planets are in a certain configuration and you're processing data that the application rarely or almost never processes. But even worse than the once-in-a-thousand-years dataset are the hidden race conditions and deadly embraces that were masked by the particular hardware the software is running on. They only show up when memory chips, physical processors, or secondary storage are changed, or when the software is executed on a new, faster computer or an older, slower one. The new hardware introduces timing or latency differences that expose data races and deadly embraces you didn't realize your application even had. It's one thing to track down and fix every bug that you are aware of. It's a totally different story to fix a piece of software that appears to be working perfectly! It's not the case that your application is correct when it runs on computer A and computer B breaks it because of hardware differences. Your application is broken on computer A too; you just haven't discovered it yet. It took computer B and its faster RAM or additional set of cores to expose your application's weaknesses.
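To make that concrete, here is a minimal sketch in Python of a race that is always there but only bites when the timing lines up. Everything in it is made up for illustration: the `Account` class, `racy_deposit`, `safe_deposit`, `run`, and especially the `pause` hook, which is just a stand-in for whatever latency a different machine might introduce between a read and a write. It is not taken from any real application.

```python
import threading
import time


class Account:
    """Hypothetical shared state: one balance, plus a lock the safe version uses."""

    def __init__(self):
        self.balance = 0
        self.lock = threading.Lock()


def racy_deposit(account, pause):
    # Read-modify-write with no synchronization. The bug is here no matter
    # how fast or slow the machine is.
    current = account.balance      # read
    pause()                        # assumed stand-in for hardware timing
    account.balance = current + 1  # write based on a possibly stale read


def safe_deposit(account, pause):
    # Same steps, but the lock makes the read-modify-write a single unit,
    # so the result no longer depends on timing.
    with account.lock:
        current = account.balance
        pause()
        account.balance = current + 1


def run(deposit, pause, workers=2):
    """Start `workers` threads that each deposit once; return the final balance."""
    account = Account()
    threads = [
        threading.Thread(target=deposit, args=(account, pause))
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account.balance


if __name__ == "__main__":
    # "Computer B": a barrier forces both threads to finish reading before
    # either one writes, so one deposit is always lost.
    print("racy, forced overlap:", run(racy_deposit, threading.Barrier(2).wait))

    # The locked version gives the right answer even with a delay in the window.
    print("safe, with delay:", run(safe_deposit, lambda: time.sleep(0.01)))
```

With the barrier forcing the overlap, the racy version ends with a balance of 1 instead of 2 every time. Swap the barrier for a do-nothing `pause` (your "computer A") and the same racy code will usually report 2, which is exactly how a bug like this sails through testing. The locked version reports 2 either way.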
So the game is to design and implement correct parallelism and multithreading that is immune to data permutations and is machine independent. That's the only way you can ever be sure that your weekend belongs to you. The goal is to achieve data independence and machine independence while maintaining the correctness and accuracy of the application. This kind of independence is accomplished at the specification and design level, using formal modeling, model checking, formal methods, and formal languages. This is done long before any coding or implementation. Of course, these are tools for the computer scientists, software engineers, and software architects, right? Well, I'll just say first that they are not for the faint of heart, but they are necessary for anyone who is responsible for designing and implementing concurrency or parallelism in software. Before you can design and implement fault-tolerant, bulletproof concurrency and parallelism, you have to be formally introduced. Tracey and I have some formalisms, formal methods, formal languages, and paradigm shifts we're about to throw your way. Stay tuned!