Who Does the Care and Feeding of the Odd Animals?
Right off the bat we know there is a big difference between a server that uses parallelism to provide resources to multiple clients and a computer running an application that requires massive parallelism as part of its Work Breakdown Structure (WBS). These are very different animals. Here is one of the problems that we are running into more and more these days. The system group, the network group, the web group, whichever group is responsible for installing, upgrading, and patching an organization's computers, web servers, application servers, email servers, and so on, is simply unprepared to deal with parallelism, or to maintain an application's parallelism requirements, outside the context of the client-server, services, or three-tier architecture models.
We've been blogging about the contention scope of threads: whether you have threads that are in system-wide contention for the processor, or threads that contend only within their own process. Both are heavily impacted by the thread scheduling policy in place at the process level and at the operating system level. There are a great many systems people out there who simply don't touch the scheduling policy of a server they have the responsibility of providing care and feeding for. They leave it at the manufacturer's default, or once the vendor sets it, they leave it alone. In many cases they wouldn't know what to do with it anyway. This kind of works for a while, until application creep sets in. Vendors come and go. Companies change hands. Mergers happen. So an application that once had a dedicated "server" hand-tuned by some vendor starts, over time and for cost reasons, to share its box with more than one application (oops!). Who can remember what the scheduling and thread policies from the original vendor were, or why they were set that way? All we know now is that for some reason the performance of the applications on this "server" is less than stellar.
The group that does the care and feeding tends to look at every application it installs on its servers through the same lenses. So if my application is not a client-server or three-tier web app, but rather some hybrid multi-agent system that requires massive parallelism to solve some problem or perform some task, then my application is probably in jeopardy for a whole host of reasons. In most cases the "group" won't know how to deal with scheduling policies and priorities as they relate to my application. As an application developer I may be lucky enough to be able to require dedicated hardware, the operating system of my choice, and the settings of my choice as prerequisites for running my application. Not all of us are that lucky. Sometimes, at gunpoint, we're forced to deliver multithreaded, parallel-processing applications in an environment shared with other software, where the operating system scheduling policy has been left at its default and we have no control, or should I say no continued control. It is often possible to initially get the box set up the way your application needs it, but keeping it that way is an entirely different scenario. Once the OS is patched, or the enterprise DBMS is upgraded, or some other vendor muscles his way in, your preferences are often lost.
If an application requires serious or massive parallelism, it should be a no-brainer to run that application on its own supercomputer or cluster, with operating system scheduling policies tailored specifically to the needs of the application. The problem is that, as an application developer, you are probably not in charge of the daily care and feeding of the computer systems your application runs on. Who is? And what do they really know about scheduling policies, or about process contention scope versus system-wide contention scope? When they upgrade or patch the operating system on that computer or cluster, will they know what to do as it relates to your application? Maybe, if your documentation is complete enough.
If you've got some oddball application that requires serious parallelism but does not fit the typical server models that the systems "group" is familiar with, ask yourself: who's gonna do the care and feeding of your application, from an OS scheduling-policy point of view, once the application development group is gone or has moved on to other projects? Every time we think there is hope for a world where the woes of parallelism have been made transparent, we run into another practical reality. We've blogged many times about the fact that a paradigm shift is called for, but now we're starting to see that the paradigm shift is simply unavoidable. These large, parallel, multicore-based animals are just too odd, and in so many cases the systems person (or group) simply does not know how to provide the proper care and feeding. We'll dig a little deeper into the weeds of this thread scheduling stuff. Stay close ...