Multi-Core OO: Part 1

Problems introduced by concurrency


January 28, 2009
URL:http://www.drdobbs.com/parallel/multi-core-oo-part-1/212903146

John Gross is chief engineer and Jeremy Orme chief architect at Connective Logic Systems. They can be contacted at [email protected] and [email protected], respectively.


This five-part article introduces Blueprint, a technology developed by Connective Logic Systems (where we work) that provides an alternative approach to multi-core development. While Blueprint is interoperable with technologies such as Microsoft's Task Parallel Library (TPL) and Intel's Threading Building Blocks (TBB), it addresses some different issues.

At its highest level of abstraction, the traditional OO model is based on autonomously executing objects (actors) that communicate by invoking each other's methods. This would appear to be a highly intuitive model for multi-core development: it makes no assumptions about the target platform, its core count, network topology, or memory distribution.

The aim of Blueprint technology is to present concurrency and parallelism to developers at this high level of abstraction, to allow familiar top-down decomposition, and then to use translators to generate the necessary harness and low-level synchronization logic. The translators output C++ or C# source, but this generated code is not visible to the developer, who can write algorithmic/business logic in either language.

To date, the technology has been used in a number of military and commercial projects, including Australia's Air Warfare Destroyer program and the UK's Surface Ship Torpedo Defense (SSTD) system. In the latter case, about 40% of the deliverable code could be generated from the visual descriptions, with the remaining 60% coming from sequential C++ source.

The era of multi-core programming has arrived. Many influential commentators have pointed out that concurrent programming requires developers to learn new techniques, different from those adopted for standard sequential programs. In "Software and the Concurrency Revolution," Herb Sutter and James Larus describe it this way:

Humans are quickly overwhelmed by concurrency and find it much more difficult to reason about concurrent than sequential code.

However, when humans perform everyday tasks like playing sports or driving cars, they demonstrate an innate ability to deal with complex concurrent situations. Even more ironically, standard object-oriented (OO) models, fundamental to modern software practice, provide an inherently concurrent view of the world that is divorced from its eventual mapping to processes, and hence CPU cores.

Conceptually at least, object-oriented models let asynchronously executing objects communicate with adjacent objects by sending messages to their member functions; and so at its highest level of abstraction object behavior is intrinsically parallel. So why isn't multi-core seen as an enabling technology that facilitates the physical implementation of such a familiar and intuitive logical abstraction?

The answer draws attention to another even more fundamental problem -- the discontinuity that exists between highly abstracted software designs (such as those provided by UML) and the software implementations that manifest programmers' interpretations of these designs. When modeling systems (formally or informally), designers are able to think at a highly intuitive level of abstraction. At this level, most developers are able to visualize concurrency with very little effort. However, implementing these models is another issue altogether.

Before the arrival of multi-core and many-core processors, OO thrived in a single-core, single-machine environment. In this special case, synchronous function calling can replace the asynchronous invocation idealized in the original OO model, and the "stack" can take care of data lifetimes in a simple and intuitive manner; see Figure 1.

Figure 1

In the sequential world, mapping an OO model (e.g., parts of a UML design) to a single-threaded executable is a largely automatable task and therefore an ideal candidate for code generation. However, mapping a model to a multi-threaded implementation requires additional synchronization and scheduling logic that is assumed, but not prescribed, by typical models. Mapping to multiple process implementations, which involves the generation of complex messaging logic, only adds to the problem.

The end result is that programmers are required to interpret concurrent OO models by hand, make assumptions about the designer's intentions, and then implement significant amounts of difficult code, versions of which may or may not stay in step with the corresponding design versions.

The industry's response to this has been to provide engineers with new languages, and/or language extensions/libraries that aim to supply the highest levels of concurrency abstraction possible. Not surprisingly, this works fairly well for the special case of "regular" concurrency (e.g., the parallelization of for loops), but fares less well in the more general irregular cases that appear in day-to-day programming.

The reason that most "concurrency specialists" would agree with the sentiments expressed by Herb Sutter and James Larus becomes apparent with a first foray into the implementation of a physically concurrent object model (required to exploit multi-core).

The principal goal of the Blueprint development environment is to present the developer with OO's highly intuitive view of concurrently executing objects, to let them express synchronization and scheduling logic explicitly at a similarly high level of abstraction, and to do so in a manner that makes no assumptions about the target platform's architecture or memory topology. Optimally accreting the application's functionality to one or more multi-core machines is a separate (and orthogonal) activity, which means that unless the application is intrinsically platform-locked, the "same" application code will execute on any platform without modification.

The Divide-and-Conquer Approach

As they say, there is "more than one way to skin a cat." Developing new languages that explicitly address concurrency is one way forward, but it is anathema to developers who have a significant investment in legacy (C++, C#, Java) code. Extending existing languages and/or providing libraries are alternative approaches, but since they are essentially retrospective, they are likely to involve compromise somewhere along the line.

Most would probably agree that algorithmic logic is normally best considered as a set of sequential steps involving conditional logic (e.g., if-then-else). On the other hand, scheduling logic is inherently parallel and is more naturally considered in terms of branching and merging metaphors.

Scheduling and processing are clearly different and separable concerns, so an alternative way forward is to let developers describe their application's concurrency specifically in terms of its connectivity and dependency, while leaving algorithmic and business logic in its current sequential form. This means that existing applications can be largely unaffected by migration to multi-core, and most developers can continue to work in a familiar sequential environment using familiar tools and languages.

Equally importantly, concurrency needs to be expressed in a manner that makes no assumptions about the target platform: number of cores, number of machines, memory distribution, and so on. This means that programmers need to be presented with a simple and intuitive "idealized platform." Mapping functionality to target hardware therefore needs to be a separate stage that does not involve or concern application developers.

Visual Concurrency

Blueprint uses a specialized visual programming paradigm to deal with concurrency aspects. This allows descriptions to include branching and merging information in a way that textual equivalents alone do not readily support. In the concurrency domain, statement "order" is replaced by "connectivity," but the algorithmic/business domain remains sequential and is decoupled from its scheduling; see Figure 2.

Figure 2

Conventional text programs derive much of their "meaning" through precise statement ordering, whereas an electronic circuit diagram derives equivalent meaning through its connectivity; this means that the eye can scan circuits in many different orders and still derive exactly the same meaning.

The obvious point here is that connectivity can branch and merge and is therefore an ideal medium for describing concurrency. It is no coincidence therefore that electronic circuitry is usually presented visually, whilst ASIC algorithmic programming (implicitly parallel) is more likely to involve textual descriptions (e.g., VHDL). So arguably, it is the nature of the logic, rather than the nature of the physical hardware, that determines the most intuitive programming approach; and the arrival of multi-core should not be allowed to drastically change the way that developers think.

It is necessary to find a way to map the traditional OO model to code; and to do this it is also necessary to abstract the platform, capture the application's scheduling constraints, and use a new generation of translators to perform the "heavy lifting" required to take OO's high-level concepts and generate the low-level synchronization code that implements them.

The Blueprint Toolchain

The first step to providing developers with OO's intuitive and widely accepted concurrent programming abstraction is to create an "idealized" environment for concurrent applications to execute within. The second step is to provide a series of independent (orthogonal) descriptions that take the high-level, platform-independent OO abstraction all the way through to the deployment of an arbitrarily runtime-scalable set of executables. This series of articles will describe each of these steps and, where relevant, reference early-adopter projects such as the UK's Surface Ship Torpedo Defense (SSTD) system as proof of concept.

As Figure 3 illustrates, Blueprint separates the mapping of high-level program logic to physical executables into four independent stages:

Figure 3

  1. The first stage is to develop an application for the idealized Single Virtual Process platform. In most cases it is possible to develop and debug this as a single process on a standard laptop or desktop (specialized I/O devices can be modeled using Blueprint devices). This involves two distinct components -- a textual algorithmic/business logic description, and a visual concurrency constraint description.

  2. The second (independent) stage is to use the accretion editor to map program logic to one or more distinct "processes."

  3. The third stage is to use the colony editor to identify those processes that are to be "slaved". The translator can then build each required process type.

  4. Finally, the task manager is used to allocate instances of each process type to appropriate machines in the available network.

The latter three stages are relatively lightweight and do not involve modifying the application itself. There is no limit to the number of accretions, colonies, or network configurations that can be applied to a given logical Blueprint application. If the application itself is correctly written (no undetected race conditions), then each mapping will usually execute repeatably (albeit at different speeds), allowing most debugging to be undertaken with a simple single-process (and often single-threaded) build.

Next Time

In the next installment, we examine the issues involved in separating an application's scheduling logic from its algorithmic/business logic, illustrated with examples.
