Managing Multi-Core Projects, Part 4: The Enterprise Development Cycle Meets Multi-Core

How does multi-core change the development cycle for enterprise software?


December 16, 2008
URL:http://www.drdobbs.com/managing-multi-core-projects-part-4-the/212500793


Every successful company, maybe even every successful team, has a development process that works--what team members think of as "their" process. Making some adjustments to the process to accommodate the development of parallel software can have a big impact, keeping the process working as you transition to multi-core with the same high level of productivity you enjoy today.

Intel's Developer Products Division (DPD) has a process, too, one tuned for the development of parallel software and honed by over ten years of experience working with customers, primarily in HPC domains. The DPD process works for multi-core development just as it has worked for other parallel processing environments. It may be helpful to look at its approach, not as a model, but as a sample, something to crib from as you consider how multi-core may change your own development practices.

Intel DPD breaks its process down into four steps: discovery, expression, confidence, and optimization. These four steps are intended to be applied continuously, version over version, improving parallelism with each revision.

These four steps are general, and can be related to any development methodology. Within an agile development framework, as an example, the process can be followed through in a single development iteration, restricted only to the sections of the software actively under development, and completed within a release cycle. More generally, if you have a refactoring process, think about adding this threading process as a special case.

Parallel in Every Phase

Developing for multi-core changes the structure of the process. It also means some new considerations for each step along the way. What follows are some pointers to follow (and pitfalls to avoid) in directing a multi-core development project. Let's consider the project step-by-step.

In the discovery step, consider tools early. Look for the threading and messaging tools that best suit your application. Tools are a key element of success in developing parallel applications, and they aren't all alike, so put in the effort up front. Find the right analysis and debugging tools as well as the right threading or messaging libraries. Test components and libraries for thread safety, relying as little as possible on vendor claims.
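One way to avoid relying on vendor claims is a simple concurrency stress harness. The sketch below is hypothetical (the article doesn't describe specific test code): it hammers a function from many threads at once and collects any exceptions. A clean run cannot prove thread safety, but failures quickly disprove it.

```python
import threading

def stress_test(fn, args, threads=8, iterations=200):
    """Call fn concurrently from many threads and collect any exceptions.
    A clean run cannot prove thread safety, but failures disprove it."""
    errors = []
    start = threading.Barrier(threads)  # release all workers at once

    def worker():
        start.wait()  # maximize overlap between threads
        for _ in range(iterations):
            try:
                fn(*args)
            except Exception as exc:  # collect every failure for inspection
                errors.append(exc)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return errors
```

Run this against each library entry point you plan to call from threaded code, ideally with arguments that exercise shared state.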

Threading is a design consideration, not an optimization. Include discussions of threading and thread coordination as you work through the project design. This will help to minimize threading conflicts as you get to implementation. If you can use it, data decomposition should be preferred over strictly functional threading, as data decomposition will scale better to more cores.
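Data decomposition can be sketched in a few lines of Python (hypothetical names; real projects in this domain would typically use native threading libraries): the input is split into chunks, each chunk is processed by a pooled worker, and the partial results are merged. The pool is sized to the core count so the design scales as cores are added.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk):
    # Stand-in for real per-element work (e.g., filtering one audio block).
    return [x * x for x in chunk]

def parallel_map(data, workers=None):
    """Data decomposition: split the input evenly across a worker pool."""
    workers = workers or os.cpu_count() or 2
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(process_chunk, chunks)
    merged = []
    for part in results:
        merged.extend(part)
    return merged
```

Contrast this with functional threading, where each thread runs a different stage of the pipeline: the data-decomposed version gains additional workers, not a redesign, when core counts grow.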

Train application experts in threading and parallel techniques. These are the developers who will do most of the implementation in the expression step. If you're going back to thread an existing program, have the original developer of a module thread his or her own code rather than handing it to a parallel-programming expert.

The confidence step in a threaded project introduces thread-specific testing requirements. Beyond covering new thread-interaction scenarios, test the threaded version of the application for consistency with a single-threaded implementation, if one exists. This becomes more important as more users deploy on multi-core systems.
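A consistency check of this kind can be automated by running both implementations over the same randomized inputs and comparing results. This is a minimal sketch under assumed names; integer data keeps the comparison exact (floating-point reassociation across threads could otherwise produce legitimate small differences).

```python
import random
from concurrent.futures import ThreadPoolExecutor

def reference_sum_of_squares(data):
    """Single-threaded reference implementation."""
    return sum(x * x for x in data)

def threaded_sum_of_squares(data, workers=4):
    """Threaded version: each chunk runs the same kernel, results are merged."""
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(reference_sum_of_squares, chunks))

def check_consistency(trials=50):
    """Compare threaded output against the serial reference on random inputs."""
    for _ in range(trials):
        data = [random.randint(-100, 100) for _ in range(random.randint(0, 300))]
        assert threaded_sum_of_squares(data) == reference_sum_of_squares(data)
```

Keeping the single-threaded path alive as a test oracle is cheap insurance while the threaded implementation matures.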

Optimization is critical for best parallel performance. Don't expect that just implementing threads will create a dramatic performance improvement. Spend the time on the back end to get the most out of the design by tuning locking, shared memory, cache interactions, and other performance parameters. Automated tools such as Intel's VTune and Thread Profiler can help a great deal in this part of the process.
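Lock tuning is one of the most common back-end wins. The hypothetical sketch below shows the pattern: instead of taking a shared lock per element, each worker accumulates into thread-private state and takes the lock exactly once to merge. (In CPython the GIL limits true parallelism, but the locking pattern itself carries over directly to native threads.)

```python
import threading

def tuned_sum(data, workers=4):
    """Coarser locking: each worker accumulates privately, then merges once."""
    total = [0]
    lock = threading.Lock()
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]

    def worker(chunk):
        local = 0            # thread-private; no lock needed per element
        for x in chunk:
            local += x
        with lock:           # one lock acquisition per thread, not per item
            total[0] += local

    pool = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return total[0]
```

Profilers like those named above are what tell you which locks are actually contended; tune the hot ones first rather than restructuring everything.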

Creating a fast threaded application or threading existing code takes attention to parallelism in every phase. If you skip the front end of the process, you'll end up with threading added as an optimization, less comprehensive and more likely to introduce bugs. If you skip the back end of the process, you end up with an under-tested, under-performing implementation. View each round of parallel performance enhancement as a continuous task that runs through a release cycle, rather than a feature that can be introduced late or back-burnered under schedule pressure.

In the Field: Commercial Software

As we did with HPC earlier in this series, let's relate some of these points to the development of two actual business software products. One project, a revision to a voice communication and collaboration application, is now complete; the other, the threading of a large desktop productivity application, is ongoing. Intel engineers played, or are playing, a consulting role on these development efforts, and they provided us with these brief case studies.

The voice application had an unusual goal. Instead of using multiple cores to boost performance, developers sought to distribute the workload across cores as evenly as possible. With more even core utilization, the CPU clock frequency could be dropped, conserving power.

The project team included two company engineers and one Intel engineer. At the start of the project, the application was functionally threaded only, and core utilization was uneven. The discovery step began with a review of the entire system, from the driver level up, to find opportunities for parallelism. Engineers settled on applying data decomposition techniques to the software's audio codec as the best approach. In the design, pooled threads both encode and decode audio data. The number of threads in the pool is proportional to the number of cores, for even loading. The main thread pulls threads from the pool and uses them in the codec as needed, returning them to the pool when they complete their task. Each thread runs at the same priority, again for the most even distribution of active threads among cores.
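The pool design described above can be roughed out as follows. This is an illustrative sketch, not the project's code: `encode_frame` is a stand-in for real codec work, and Python doesn't expose the thread priorities the actual design used. Workers proportional to the core count pull codec jobs from a queue fed by the main thread.

```python
import os
import queue
import threading

def encode_frame(frame):
    # Stand-in for real codec work (here: invert every byte).
    return bytes(b ^ 0xFF for b in frame)

def run_codec_pool(frames):
    """One worker per core pulls jobs from a queue fed by the main thread."""
    jobs = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = jobs.get()
            if item is None:  # sentinel: no more work
                return
            index, frame = item
            encoded = encode_frame(frame)
            with lock:
                results[index] = encoded

    n_workers = os.cpu_count() or 2  # pool size proportional to core count
    pool = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in pool:
        t.start()
    for index, frame in enumerate(frames):
        jobs.put((index, frame))
    for _ in pool:
        jobs.put(None)  # one sentinel per worker
    for t in pool:
        t.join()
    return [results[i] for i in range(len(frames))]
```

Because every worker runs the same loop at the same priority, the operating system is free to spread active threads evenly across cores, which was the project's real goal.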

As the expression step began and developers started to thread the codec, they found that a third-party library they had been using was not thread-safe. Rather than replace the entire library, which is used throughout the application, developers chose to build thread-safe replacements for only those library functions used in the threaded sections. With that complete, developers found only a few synchronization bugs as they began to test for confidence.

Although the voice project was focused on core utilization and not application performance, developers proceeded to an optimization step after verifying that the threaded codec was working properly. They focused particularly on serial optimization within the threaded section, since that would reduce maximum core utilization. The team used Intel's VTune and Thread Profiler tools in tuning.

The second project -- at a very different stage -- is the threading of a large desktop productivity application. In this project, still in the discovery step, engineers are unraveling an inefficient threaded implementation before they can begin to look for better, more natural opportunities for both functional and data parallelism. This amounts to forensic work on the application, using VTune and a debugger to find dependencies and to map out the work done by the main thread.

The desktop application has close to 40 threads, but the main thread is doing over 95% of the work. Engineers need to first address some basic performance issues, such as reducing the time spent in system calls, before moving on to discover new opportunities for threading. The initial threading plan is to decompose the main thread, almost starting from the same point as one might with a single-threaded application.

It's hard to say how the desktop application project will proceed, but the rough schedule is for the three engineers working on the project to spend three months in further discovery (while simultaneously working on basic performance) and then the following three months on the threaded implementation.

The Development Manager's Role

In their 1937 Papers on the Science of Administration, Gulick and Urwick described a manager's role in terms of seven activities. Theirs is a seminal work, one that helped to define the discipline of management. In this series, we've focused quite a bit on how multi-core changes development process and development practice. To wrap up this installment, let's take a look at multi-core from another perspective, focusing instead on how the development of parallel software changes the manager's job. We'll break it down according to the seven functions provided by Gulick and Urwick.

Planning. In planning a project, consider how parallelism will change your development process. Take an integrated approach to parallel programming that runs through every phase of development.

Multi-core is the new mainstream for more than business customers—it's time to start putting those cores to good use in consumer software, too. In our next installment, we'll look at the management issues around the development of multi-core games and other consumer software projects.


Steve Apiki is senior developer at Appropriate Solutions, Inc., a Peterborough, NH consulting firm that builds server-based software solutions for a wide variety of platforms using an equally wide variety of tools. Steve has been writing about software and technology for over 15 years.
