The third abstraction moves processing to another task, typically through a queue. Work queuing is used for many reasons; two possibilities that help simplify requirements for data sharing are:
- The data for the queued work can be isolated during the computation.
- The processing can be moved to an environment where data is only accessed locally and by only one task at a time.
These considerations apply to sharing of other types of resources, in addition to sharing of data.
This abstraction differs from the previous two in that it involves not only supporting library services, but also the frameworks in which applications run.
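As an illustration of the second possibility, the sketch below (all names are hypothetical) confines access to a shared table to a single owner task; clients only enqueue work, so the data itself needs no locking:

```python
import queue
import threading

work_q = queue.Queue()
counts = {}                          # touched only by the owner task

def owner_task():
    """Single task that owns `counts`; work arrives only via the queue."""
    while True:
        item = work_q.get()
        if item is None:             # sentinel: shut down
            break
        counts[item] = counts.get(item, 0) + 1

t = threading.Thread(target=owner_task)
t.start()
for word in ["a", "b", "a"]:
    work_q.put(word)                 # clients enqueue; they never
work_q.put(None)                     # touch `counts` directly
t.join()
print(counts)                        # {'a': 2, 'b': 1}
```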
Support for Tasking Frameworks
There are several scenarios in which work might be transferred from one task to another.
- Server tasks are spawned as needed to run in parallel with a main task, which then asks for status when results are needed. For fine grained work, this can introduce considerable overhead.
- A server task is event driven, and runs continuously or waits on a queue for work.
- The client itself can be event driven and wait for responses from one or more work requests that have been farmed out.
This can be further developed by having tasks that serve as both clients to some tasks and servers to others.
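A minimal sketch of the second and third scenarios, assuming a hypothetical pool of queue-driven server tasks and a client that farms out requests and then waits on the responses:

```python
import queue
import threading

requests = queue.Queue()
responses = queue.Queue()

def server():
    """Event-driven server: waits on a queue for work until told to stop."""
    while True:
        job = requests.get()
        if job is None:
            break
        responses.put((job, job * job))   # illustrative computation

pool = [threading.Thread(target=server) for _ in range(3)]
for t in pool:
    t.start()

for n in range(5):                        # client farms out work requests...
    requests.put(n)
answers = dict(responses.get() for _ in range(5))  # ...then waits on responses
for _ in pool:
    requests.put(None)                    # one sentinel per server
for t in pool:
    t.join()
print(answers)                            # all five request/response pairs
```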
Note that these services can imply a queue manager with such auxiliary functions as:
- Normalizing formats for work requests and responses. This can include marshalling and translation of request and response data elements to and from differing specific application formats and a common standard format.
- Supporting queues that can be bounded or extensible, and that are first-in, first-out or priority ordered.
- Managing a pool of server tasks and possibly the resources used by the pool members.
- Supporting multiple queues to aid in distributing work to server tasks. Allocation of work can be based on a combination of:
- Balancing resources to distribute work to different classes of server tasks
- Minimizing response time for short work elements
- Maintaining and reporting status and statistics.
- Finally, the queuing support may be part of a larger workflow framework, and the queue manager can support all or a portion of a workflow specification.
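Two of the queue policies listed above, bounded capacity and priority ordering, can be sketched with Python's standard queue types:

```python
import queue

# Bounded queue: put() blocks (or fails) once capacity is reached.
bounded_fifo = queue.Queue(maxsize=2)
bounded_fifo.put("first")
bounded_fifo.put("second")
print(bounded_fifo.full())                 # True

# Priority queue: work elements are dequeued lowest priority number first.
by_priority = queue.PriorityQueue()
by_priority.put((5, "routine report"))     # (priority, work element)
by_priority.put((1, "urgent request"))
print(by_priority.get()[1])                # urgent request
```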
The work queuing abstraction depends upon two more basic abstractions: send/receive services, and the wait/event dispatching mechanism that those services in turn require.
- Send/receive services can be policies that are primarily application specific.
- Wait/event services can be policies that are primarily platform and environment specific.
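As a minimal sketch of this layering, the `Mailbox` class below (a hypothetical name) implements an application-level send/receive policy on top of a platform wait/event primitive, here Python's `threading.Condition`:

```python
import threading

class Mailbox:
    def __init__(self):
        self._items = []
        self._cv = threading.Condition()  # platform wait/event mechanism

    def send(self, item):                 # application-level policy
        with self._cv:
            self._items.append(item)
            self._cv.notify()

    def receive(self):
        with self._cv:
            while not self._items:        # wait until something arrives
                self._cv.wait()
            return self._items.pop(0)

box = Mailbox()
threading.Thread(target=lambda: box.send("work")).start()
print(box.receive())                      # work
```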
Work queuing can occur with processes that are short or long, that differ in resource use, and that run on local or remote processors. A particular challenge is to find hardware and software solutions that efficiently allow queuing of the very many very short processes found in typical applications. A simple but powerful example would be to split the processing of typical loops, where the iterations are independent of each other, among multiple cores on a single chip, while bypassing stores into main memory.
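The core of that example, splitting independent loop iterations among parallel workers, can be sketched as follows; a thread pool stands in for per-core workers, and the store-bypassing optimization is a hardware concern not modeled here:

```python
from concurrent.futures import ThreadPoolExecutor

def body(i):
    """One independent loop iteration (hypothetical computation)."""
    return i * i

# Sequential form:  results = [body(i) for i in range(8)]
# Parallel form: the iterations are independent, so they can be farmed out.
# (For CPU-bound Python code a process pool would be used instead, since
# threads share one interpreter lock; the structure is identical.)
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(body, range(8)))
print(results)   # [0, 1, 4, 9, 16, 25, 36, 49]
```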
All cooperating tasks (and a queue manager) need to agree on some standardization of Work Elements.
This can be as sophisticated as SOAP-based protocols. A simple approach would provide the following minimal areas for a Work Element:
- Control: priority, status, and possible timeout parameters.
- Request: request type and reference to inputs.
- Response: completion status and reference to outputs.
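A hypothetical Work Element with these three minimal areas might look like:

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class WorkElement:
    # Control area: priority, status, and an optional timeout
    priority: int = 0
    status: str = "queued"            # queued | running | done | failed
    timeout_s: Optional[float] = None
    # Request area: request type and reference to inputs
    request_type: str = ""
    inputs: Any = None
    # Response area: completion status and reference to outputs
    completion: Optional[str] = None
    outputs: Any = None

we = WorkElement(priority=1, request_type="sum", inputs=[1, 2, 3])
we.outputs, we.completion, we.status = sum(we.inputs), "ok", "done"
print(we.completion)                  # ok
```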
Standardizing approaches to multi-tasking generally involves significant discussion about how uncaught exceptions are handled in support tasks and propagated to initiating tasks.
Here the following principles seem applicable:
- At the application level, it is useful to separate the raising of exception conditions from the handling of them without regard to layers of intermediate function calls.
- At the system level, fundamental services need, at least as an option, to return error codes and status to callers rather than raising exceptions. This provides efficiency and allows critical application-level services to make appropriate choices for error handling, including recovery, propagation, and reporting.
Of course, for some environments, this distinction between application and system levels may not be completely intuitive and needs to be considered carefully.
Based on this:
- Servers should be relatively simple and protect themselves from uncaught exceptions. Uncaught exceptions would then be an indication of overall system failure.
- Servers should catch any exceptions from the work elements they dispatch and return status to the clients.
- Work element status should include reporting of exception conditions.
- The client receiver of work responses in the requesting task can then decide to propagate an exception or simply return a status code. It can also provide immediate reporting of the condition.
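These principles can be sketched as follows: the server catches any exception raised by dispatched work and returns status instead, leaving the client receiver to decide whether to re-raise or merely report (all names are illustrative):

```python
import queue
import threading

requests = queue.Queue()
responses = queue.Queue()

def server():
    """Catches exceptions from dispatched work and returns status instead."""
    while True:
        job = requests.get()
        if job is None:
            break
        try:
            responses.put(("ok", 10 // job))   # illustrative work
        except Exception as exc:               # report rather than die
            responses.put(("error", repr(exc)))

t = threading.Thread(target=server)
t.start()
requests.put(5)
requests.put(0)        # will raise ZeroDivisionError inside the server
requests.put(None)
t.join()

ok = responses.get()
failed = responses.get()
print(ok)              # ('ok', 2)
print(failed[0])       # 'error': client may now re-raise or just report it
```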
The abstractions above are conceptually independent but in practice they can be used together:
- Work queuing sets up the environment and determines requirements for data sharing.
- Shared Use Pointers provide constrained access to synchronized inputs needed for Direct Synchronization.
- Exclusive Use Pointer implementation provides support that can be used for Direct Synchronization with an extended CAS.
- Direct Synchronization can include Use Pointers for initialization and installing of Exclusive Use Pointers in the atomic operations list.