Steve Apiki is a senior developer at Appropriate Solutions, a consulting firm that builds server-based software solutions.
As multicore becomes commonplace, IT organizations are faced with the task of getting the most bang for their buck. Particularly for applications developed in-house, the decision to reprogram for parallel execution goes well beyond using some new programming techniques. Going parallel requires revising tried-and-true processes and rethinking how project teams are built.
For those involved in staffing and assembling project teams, managing multicore development may involve finding and allocating people with thread programming expertise, and such people are scarce. Along with that, deploying highly threaded applications introduces systems concerns. All of these problems can be handled, but because the solutions may require crossing departmental lines, they'll likely require high-level management involvement.
The most fundamental choice in creating multicore-aware applications involves parallel abstraction. Parallelism may come from a set of loosely coupled services, or it might come from threads in a single-process system. In between these extremes, other parallel models may apply. Although the choice of parallel abstraction is a software design decision, it becomes a strategic business consideration because everything from staffing to server configuration rides on where the line is drawn.
Low-level threading is hard -- really hard -- to get right, which is why it makes sense to consider higher-level abstractions. Moreover, some parallel models already have proved successful in mainstream applications -- Web services communicating through SOAP or REST, cooperating apps such as application and database servers running on multiple virtual machines, multiple independent processes running on a single machine and communicating through pipes.
While each of these approaches sidesteps the complexity of threading, the trade-off is scalability. The most scalable designs are those that apply threading to data parallelism. Task-parallel applications scale less well, and there's more overhead in applications composed of processes than in those built from lightweight threads. In general, the finer-grained the abstraction, the more scalable the application, and the trickier the coding.
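To make the idea of data parallelism concrete, here is a minimal Java sketch in which the same operation (a sum) is applied to disjoint slices of one array, one thread per slice. The class name, method name, and chunking scheme are illustrative assumptions, not taken from any product mentioned in this article.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: data parallelism by splitting one array into slices.
class DataParallelSum {

    // Sums `data` using `workers` threads, each handling one contiguous slice.
    static long parallelSum(int[] data, int workers) {
        if (workers < 1) {
            throw new IllegalArgumentException("workers must be >= 1");
        }
        long[] partial = new long[workers];                 // one slot per thread, so no locking
        int chunk = (data.length + workers - 1) / workers;  // ceiling division
        List<Thread> threads = new ArrayList<>();

        for (int w = 0; w < workers; w++) {
            int id = w;
            int from = Math.min(data.length, w * chunk);
            int to = Math.min(data.length, from + chunk);
            Thread t = new Thread(() -> {
                long sum = 0;
                for (int i = from; i < to; i++) {
                    sum += data[i];
                }
                partial[id] = sum;                          // each thread writes only its own slot
            });
            threads.add(t);
            t.start();
        }

        long total = 0;
        try {
            for (int w = 0; w < workers; w++) {
                threads.get(w).join();                      // after join, the slot is safe to read
                total += partial[w];
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return total;
    }

    public static void main(String[] args) {
        int[] data = new int[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(parallelSum(data, 4));           // prints 500500
    }
}
```

Because each thread touches only its own slice and its own result slot, the threads never contend with one another. That independence is what lets this style of design scale as cores are added.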
Large projects probably will include several modes of parallelism. In service-based applications, for example, selected services may be internally threaded. These may be services with high compute loads, or services that include some natural data parallelism.
Managing Threading Expertise
Building a team to do threaded software development poses a unique challenge for development managers. First, threading expertise is hard to come by. Second, the people with domain or legacy application knowledge are unlikely to also have deep threading experience. Building a successful team means finding or cultivating the right mix of expertise.
Teams should include someone strong in parallel algorithm design. Parallelism should be thought of as a structural aspect of application design, so it's important to have strong parallel development skills in an architectural role.
Along with the architect, you need application experts who can learn threading techniques. Testers also should be trained for multithreading. In general, the team should be structured with strong parallel skills in a lead role, but with application and domain experience counting for more among other team members. A reasonable goal is a team with 80% of its expertise in the application domain and 20% in threading and parallelism.
In migrating legacy code, a developer familiar with the project -- ideally, the developer who originally wrote the code -- should make the changes to introduce threading. Threading experts can provide guidance, but code familiarity is more important when determining how a module should be restructured.
All teams, especially those in which threading experience is light, should consider threading libraries for development. Threading libraries and APIs, such as Intel's Threading Building Blocks or the OpenMP compiler directives, can ease the transition by delivering thread-level scalability and performance while providing developers with a more abstract programming model. And libraries aren't just a transitional tool; by insulating developers from thread scheduling considerations and the details of working with individual cores, they also provide a way to move up to processors with more cores without changes to the application.
Similarly, teams can take advantage of thread pool mechanisms built into development environments. The .NET CLR and Java 5 both offer thread pool abstractions, which reuse worker threads across tasks instead of creating a new thread for each one and can provide significant performance improvement.
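As a rough illustration of the thread pool style, the following Java sketch submits a batch of independent tasks to a fixed-size pool from java.util.concurrent and collects the results in order. The class PoolExample and the wordCount task are hypothetical stand-ins for real per-request work. The lambda syntax assumes Java 8 or later, while the executor classes themselves date from Java 5.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration: running independent tasks on a shared thread pool.
class PoolExample {

    // Stand-in for real work: counts whitespace-separated words in one line.
    static int wordCount(String line) {
        String trimmed = line.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Counts words in every line, using a pool of `poolSize` reusable threads.
    static List<Integer> countAll(List<String> lines, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (String line : lines) {
                tasks.add(() -> wordCount(line));
            }
            List<Integer> results = new ArrayList<>();
            // invokeAll returns the futures in the same order as the tasks.
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            pool.shutdown();    // release the worker threads
        }
    }
}
```

The application code never creates or schedules a thread directly; it only describes tasks. A long-lived service would normally keep one pool for its lifetime rather than creating one per call as this short sketch does.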
Changes To Development Cycle
Moving to multicore means adjustments to every phase of the development cycle, from project planning to application tuning. Intel, for instance, suggests a four-step general process for parallel development: discovery, expression, confidence, and optimization.
In the discovery step, parallelism specialists try to find the natural parallelism in the application, including determining the right level of abstraction. Other tasks in the discovery step include considering refactoring to exploit parallelism and considering tools that may be used for development.
Next, expression involves the parallelism architect and app designers developing a design that uses the parallel modes discovered in the prior step. It's important to work through thread coordination and conflicts during design. This phase includes actual implementation.
The confidence step includes testing the application. Errors mean going back to discovery or expression. Multithreaded development introduces testing requirements for correctness beyond the requirements of single-threaded implementations. For legacy conversions, the multithreaded version also must be tested for consistency with the single-threaded version.
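One way to picture the consistency check for a legacy conversion is a harness that feeds identical inputs to both versions and compares the outputs. The Java sketch below is only an assumption about how such a check might look: ConsistencyCheck, step, and the input generator are hypothetical, and a parallel stream (Java 8 or later) stands in for whatever threading approach the converted code actually uses.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration: checking a parallel version against the original.
class ConsistencyCheck {

    // Stand-in for legacy per-element logic.
    static int step(int x) {
        return x * 31 + 7;
    }

    // The original single-threaded version, treated as the reference.
    static int[] sequential(int[] in) {
        int[] out = new int[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = step(in[i]);
        }
        return out;
    }

    // The converted version; toArray preserves the input order.
    static int[] parallel(int[] in) {
        return Arrays.stream(in).parallel().map(ConsistencyCheck::step).toArray();
    }

    // Returns true if both versions agree on `rounds` pseudo-random inputs
    // of length 0..maxLen. A fixed seed keeps any failure reproducible.
    static boolean versionsAgree(long seed, int rounds, int maxLen) {
        Random rnd = new Random(seed);
        for (int r = 0; r < rounds; r++) {
            int[] in = rnd.ints(rnd.nextInt(maxLen + 1)).toArray();
            if (!Arrays.equals(sequential(in), parallel(in))) {
                return false;
            }
        }
        return true;
    }
}
```

Because threading bugs often depend on timing, a check like this is usually repeated many times and under load; a single passing run says little about correctness.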
Optimization occurs after the application has met functionality requirements. Here, significant gains can be had by tuning locking, cache interactions, and the like. Low-level thread expertise is critical here.
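A small example of the kind of lock tuning meant here, sketched in Java under stated assumptions: the LockTuning class and its methods are hypothetical, and the sketch makes no claim about measured speedups. The untuned method takes a shared lock for every element, while the tuned method accumulates privately and takes the lock once per slice. Both produce the same total.

```java
import java.util.Arrays;

// Hypothetical illustration: reducing how often threads take a shared lock.
class LockTuning {
    private final Object lock = new Object();
    private long total;

    // Untuned: every element acquires the shared lock.
    void addEach(int[] slice) {
        for (int v : slice) {
            synchronized (lock) {
                total += v;
            }
        }
    }

    // Tuned: accumulate in a local variable, then lock once per slice.
    void addBatched(int[] slice) {
        long local = 0;
        for (int v : slice) {
            local += v;
        }
        synchronized (lock) {
            total += local;
        }
    }

    long total() {
        synchronized (lock) {
            return total;
        }
    }

    // Runs `threads` workers, each adding `perThread` ones to a shared total.
    static long run(int threads, int perThread, boolean batched) {
        LockTuning acc = new LockTuning();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            int[] slice = new int[perThread];
            Arrays.fill(slice, 1);
            workers[t] = new Thread(() -> {
                if (batched) {
                    acc.addBatched(slice);
                } else {
                    acc.addEach(slice);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return acc.total();
    }
}
```

Whether the batched form actually helps depends on how contended the lock is, so changes like this should be confirmed with a profiler on the target hardware rather than assumed.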
The development cycle for parallel applications is iterative, and teams may need to return more than once to earlier steps until they hit correctness and performance targets. Performance limitations found during optimization may require returning to the expression step to revise the implementation.
The takeaway here is that parallelism must be considered in each stage of the development cycle. Adding parallelism too late in the process means a bolted-on threading implementation that usually fails to exploit all of the parallelism in the application. On the other hand, a design left untuned leaves performance on the table.
Deploying several heavily threaded applications on a single server introduces its own set of problems. As more applications get threaded, thread scheduling becomes increasingly important to overall system performance. Max Domeika, a senior staff software engineer at Intel, says thread scheduling should be considered outside of the scope of most application development projects. He advises developers to focus on parallelism within the application, relying on the operating system or libraries to handle thread scheduling. For a large system with several processes and threads, the system architect and performance engineer should understand the impact of the cooperative processes on the shared hardware. "They will need to verify that their performance constraints can be met," says Domeika.
Multicore processors promise reduced power consumption in the data center as well as scalable performance for parallel applications. Multicore servers can lower power and cooling costs relative to single-core servers by sharing supporting hardware among cores and by increasing the amount of computing power available at a given clock speed. But here, too, there's a system design consideration that must reconcile the requirements of the data center manager with those of development.
Maximizing performance per watt is a data center goal; minimizing system response time is a development goal. Negotiating between the two means balancing system throughput for optimal power performance and system latency for shortest response time. High-level management must have a hand in managing these goals.
For management, the move from single core to multicore is a deep technological change that requires building effective development teams. Another challenge lies in reaching across IT groups to develop an integrated approach to parallel software development and deployment. This requires coordinating and involving development, systems, and facilities teams in appropriate roles.