What Challenges Does This Raise?
As discussed earlier, at its highest level of abstraction the OO paradigm does not make any assumptions about the target platform. While it is relatively straightforward to abstract operating system calls and even core counts, it is more difficult to abstract process topology. Efficiently re-mapping an application in this way typically involves considerable low-level rewrites. Why is this?
Process topology is implicitly assumed by the choice of each component's communication paradigm. If two objects find themselves co-located in the same process, then calling each other's methods directly, or using shared memory arbitration and reference is the simplest and most efficient means of exchanging information. If, however, objects find themselves in adjacent processes that do not share memory, then data needs to be moved between them using a message-passing paradigm; typically using TCP/IP sockets or some other equivalent.
To avoid becoming locked into a particular topology, objects must use the same API regardless of whether they are co-located in the same process or separated across a process boundary.
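As a sketch of this idea (not Blueprint's actual API, whose details are not shown here), the example below wraps both communication styles behind one `call` interface: a co-located callee is invoked directly, while a "remote" callee receives a serialized copy of its arguments, standing in for a socket transport. The class and method names are illustrative assumptions.

```python
import pickle
import queue

class Channel:
    """Uniform call interface: the caller uses the same API whether the
    callee is co-located (direct method dispatch) or remote (simulated
    here by pickling the payload through a queue, as a TCP/IP socket
    transport would)."""
    def __init__(self, target, colocated=True):
        self.target = target
        self.colocated = colocated
        self._wire = queue.Queue()  # stands in for a socket connection

    def call(self, method, payload):
        if self.colocated:
            # Shared address space: pass the reference straight through.
            return getattr(self.target, method)(payload)
        # Separate address spaces: serialize, "send", "receive", deserialize.
        self._wire.put(pickle.dumps(payload))
        data = pickle.loads(self._wire.get())
        return getattr(self.target, method)(data)

class Doubler:
    def double(self, xs):
        return [2 * x for x in xs]

local = Channel(Doubler(), colocated=True)
remote = Channel(Doubler(), colocated=False)
# The caller's code is identical in both cases.
assert local.call("double", [1, 2]) == [2, 4]
assert remote.call("double", [1, 2]) == [2, 4]
```

Because the caller never mentions the transport, a runtime is free to choose the cheapest path for each pair of objects at deployment time.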
There are many parallel APIs that assume shared memory, including:
- Intel's Threading Building Blocks
- Microsoft's Task Parallel Library
- OpenMP
There are also many parallel APIs that explicitly assume distributed memory and therefore move data, including:
- Parallel Virtual Machine (PVM)
- Message Passing Interface (MPI)
The fact that parallel application code usually assumes a particular memory architecture arguably limits component re-use and portability in an even more fundamental way than the choice of operating system. A compromise solution is to "always" use a message-passing API (even if the components are in the same memory space). This makes topology matter less, because the code works in all cases. Unfortunately it imposes a performance penalty: co-located objects needlessly haul large volumes of data across the bus, and this can outweigh any gains made from using multi-core technology.
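The cost of the "always message-pass" compromise can be made concrete: even within one process, a message-passing style forces a full copy of the data, whereas a shared-memory call passes a reference. The sketch below (illustrative only, using pickling to stand in for marshalling over a socket) shows the difference in copy semantics:

```python
import pickle

# Shared-memory path: the callee receives a reference; no copy is made.
def by_reference(data):
    data.append(99)          # the caller sees this mutation
    return data

# Message-passing path: serialization copies the data, even when both
# "ends" live in the same address space.
def by_message(data):
    copy = pickle.loads(pickle.dumps(data))  # serialize + deserialize
    copy.append(99)          # the caller does NOT see this mutation
    return copy

shared = list(range(5))
ref_result = by_reference(shared)
assert ref_result is shared and 99 in shared      # same object, shared

owned = list(range(5))
msg_result = by_message(owned)
assert msg_result is not owned and 99 not in owned  # a full copy moved
```

For large data sets, that copy (and the bus traffic behind it) is pure overhead whenever the two objects could simply have shared a reference.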
Another problem is that load balance can be difficult to calculate statically and is prone to change as functionality is modified or added. With core counts expected to double every 18 months for some time to come, different customers are likely to run different platforms, composed of machines with differing core counts.
In the worst case "turn-key" applications that assume a particular functional accretion may require different versions for different platforms. This problem is compounded by the fact that load-balance frequently depends on the data-set that is being processed and so cannot necessarily be discovered until runtime.
Perhaps the most difficult problem, however, is providing a means of "describing" the application's accretion(s), and doing so in a way that doesn't impact the application source, as outlined in Parts 2 and 3 of this article series. Specifically:
- Accretions must be independent of each other;
- Translation of the accretion description(s) along with the application source to produce a final executable for each identified process must be automatable;
- Accretion must be a "cheap" operation, so that it is feasible to "experiment";
- Accretion must not require any knowledge of the application other than its approximate CPU and bandwidth budgets.
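The requirements above suggest that an accretion description is just a small, external mapping from components to processes. The fragment below is a hypothetical illustration of that idea (the component names, process identifiers, and format are invented for this sketch; they are not Blueprint's actual syntax):

```python
# Hypothetical accretion description: a mapping from component names to
# process identifiers, held outside the application source so that
# re-partitioning never touches application code.
accretion = {
    "BeamFormer": "process0",
    "Tracker":    "process0",
    "Display":    "process1",
}

def components_for(process, description):
    """Return the components a given process must link and launch --
    the information a translator would need to emit one executable
    per identified process."""
    return sorted(name for name, proc in description.items() if proc == process)

assert components_for("process0", accretion) == ["BeamFormer", "Tracker"]
assert components_for("process1", accretion) == ["Display"]
```

Because the description is independent of the source, "experimenting" with a different partitioning is just a matter of editing the mapping and re-translating.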
How Are These Issues Addressed?
To provide an efficient solution that doesn't gratuitously move data, the Blueprint programmer is presented with a beefed-up reference view of the world. This is referred to as the "Single Virtual Process" (SVP). The developer sees all data by reference, but if two components actually find themselves located in adjacent processes at execution time, then the runtime will:
- Transparently move the referenced data
- Cache it locally
- Garbage-collect it when it is no longer referenced.
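Those three runtime behaviors can be sketched as a reference object that fetches on first dereference, caches the result, and reclaims the cache when the last reference is released. This is an illustrative model only (the class and its internals are assumptions, not Blueprint's actual runtime):

```python
class RemoteRef:
    """SVP-style reference sketch: dereferencing fetches the data once,
    caches it locally, and release() reclaims the cache when the
    reference count drops to zero."""
    def __init__(self, store, key):
        self._store, self._key = store, key   # "store" models a remote process
        self._cache = None
        self._refs = 1

    def get(self):
        if self._cache is None:               # transparent one-time move
            self._cache = self._store[self._key]
        return self._cache                    # subsequent reads hit the cache

    def release(self):
        self._refs -= 1
        if self._refs == 0:
            self._cache = None                # "garbage-collect" the local copy

remote_store = {"spectrum": [0.1, 0.5, 0.9]}  # stands in for another process
ref = RemoteRef(remote_store, "spectrum")
assert ref.get() == [0.1, 0.5, 0.9]   # data moved and cached
assert ref._cache is not None
ref.release()
assert ref._cache is None             # local copy reclaimed
```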
The developer also assumes an unbounded number of "logical" preemptive threads, but these are actually implemented using a minimal number of system-managed worker threads to minimize resource requirements.
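The many-logical-threads-on-few-workers idea is the familiar thread-pool pattern; Python's standard library demonstrates it directly. Here, 100 logical tasks are multiplexed onto at most 4 system worker threads:

```python
from concurrent.futures import ThreadPoolExecutor
import threading

seen_workers = set()
lock = threading.Lock()

def task(i):
    # Record which OS-level worker actually ran this logical task.
    with lock:
        seen_workers.add(threading.current_thread().name)
    return i * i

# 100 "logical" threads of work, but only 4 system worker threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(task, range(100)))

assert results[7] == 49
assert len(seen_workers) <= 4   # far fewer OS threads than logical tasks
```

The programmer reasons about 100 independent activities; the runtime pays for only 4 threads' worth of stacks and scheduler state.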
To the programmer, the SVP model looks like an SMP programming model operating at network scope, because data always appears to be accessible by reference. The runtime must keep track of the size of each referenced object so that it can transparently move data between disparate memory spaces and sustain this "illusion" of direct reference. This topic is beyond the scope of this introductory article, but it is achieved through the use of "records": data wrappers that allow for variable-sized data.
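The essence of such a record is that the data carries its own size, so the runtime knows exactly how many bytes to move between address spaces. A minimal sketch, assuming a simple length-prefixed layout (not Blueprint's actual record format):

```python
import struct

def pack_record(payload: bytes) -> bytes:
    """Prefix the payload with a 4-byte big-endian length so a runtime
    knows how much data to move between address spaces."""
    return struct.pack("!I", len(payload)) + payload

def unpack_record(buf: bytes):
    """Return (payload, remaining_bytes); supports back-to-back records."""
    (size,) = struct.unpack("!I", buf[:4])
    return buf[4:4 + size], buf[4 + size:]

# Two variable-sized records on one "wire" buffer.
wire = pack_record(b"ping") + pack_record(b"a longer payload")
first, rest = unpack_record(wire)
second, rest = unpack_record(rest)
assert first == b"ping"
assert second == b"a longer payload"
assert rest == b""
```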
At the top level, the application needs the concept of a set of asynchronously executing autonomous components. The asynchronous nature of these components is crucial because unless they are all co-located in the same process, they will need to execute asynchronously in the true sense (probably on different machines). For these components to execute asynchronously, global (interprocess) synchronization is required, but it has to be provided at a high enough level of abstraction to avoid making any assumptions about process and/or thread proximity.
Blueprint addresses the first issue through the concept of "circuits" (concurrently executing classes), and the second through its use of high level "event operators" (collectors, distributors, multiplexers, etc); these mechanisms are described in earlier parts of this article.
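One plausible reading of these event operators (sketched here with in-process queues; the real operators work across process boundaries and their exact semantics are defined in the earlier parts of this series) is that a distributor fans one event out to every consumer, while a collector blocks until one event has arrived on each of its inputs, forming a rendezvous-style synchronization point:

```python
import queue

class Distributor:
    """Fan one event out to every connected consumer queue."""
    def __init__(self, outputs):
        self.outputs = outputs
    def fire(self, event):
        for q in self.outputs:
            q.put(event)

class Collector:
    """Wait until one event has arrived on every input queue, then
    deliver them together -- a high-level synchronization point that
    makes no assumption about where its producers execute."""
    def __init__(self, inputs):
        self.inputs = inputs
    def collect(self):
        return [q.get() for q in self.inputs]

# A distributor triggers two concurrent stages with the same event...
a, b = queue.Queue(), queue.Queue()
Distributor([a, b]).fire("sample")
assert a.get() == "sample" and b.get() == "sample"

# ...and a collector synchronizes on both of their results.
r1, r2 = queue.Queue(), queue.Queue()
r1.put("beam-0")
r2.put("beam-1")
assert Collector([r1, r2]).collect() == ["beam-0", "beam-1"]
```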
The diagram above shows a typical "circuit" definition. In this example, collectors and distributors are used to coordinate and synchronize concurrent method execution within the circuit, and the public objects (exposed by their consumer and provider pins) provide synchronization between adjacent circuit instances.
The circuit above shows the top level of a simplified military sonar system. Each "sub-circuit" executes asynchronously and synchronizes with each other "sub-circuit" through its public event operators. Objects ("connectors") must be able to locate and connect-to other objects (their "connectees") and this is achieved using the runtime's registration service. For the translator to be able to generate the necessary code to do this, objects (and their exposed "pins") need to be uniquely named so that the connectivity information provided by the circuitry can be used.
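The registration service can be pictured as a name-to-object lookup table in which every pin's unique name is the key. The sketch below is an assumption about its shape (names, methods, and the dotted naming convention are all invented for illustration):

```python
class Registry:
    """Minimal sketch of a runtime registration service: providers
    register under a unique name, and connectors look their
    connectees up by that name."""
    def __init__(self):
        self._pins = {}

    def register(self, name, obj):
        if name in self._pins:
            raise ValueError(f"pin name {name!r} is not unique")
        self._pins[name] = obj

    def connect(self, name):
        return self._pins[name]   # KeyError if the connectee is absent

registry = Registry()
beamformer_out = object()                       # stands in for a provider pin
registry.register("Sonar.BeamFormer.out", beamformer_out)

# A connector locates its connectee purely by name, so the generated
# code never needs to know which process the provider landed in.
assert registry.connect("Sonar.BeamFormer.out") is beamformer_out
```

Unique naming is what lets the translator turn the circuit's static connectivity into these lookups automatically.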
The days of the physical SMP architecture are numbered due to memory access becoming a major bottleneck as core counts rise. Heterogeneous architectures demand different methods of communication between objects depending on whether they share a common address space (where data can be referenced), or are in separate address spaces (where data must be transferred between them).
To avoid becoming locked into a particular topology it is essential that the same API is used for all communication between objects. This allows the underlying runtime to select the best location to execute the object and the most efficient means of communication to use.
This also means that because the application code makes no assumptions about topology, functionality can be partitioned between processes easily using a lightweight accretion operation. The application code doesn't change and the accretion description simply specifies which objects map to a particular process.
For More Information
- Multi-Core OO: Part 1
- Multi-Core OO: Part 2
- Multi-Core OO: Part 3
- Multi-Core OO: Part 4
- Multi-Core OO: Part 5