Introducing the Bulk Multicore Architecture

Reducing complexity in manycore systems

University of Illinois computer science professor Josep Torrellas has demonstrated that easing a programmer's burden in parallel computing need not compromise system performance or increase the complexity of hardware implementation. In "The Bulk Multicore Architecture for Improved Programmability," published in Communications of the ACM, Torrellas details his Bulk Multicore Architecture and calls for a change in the way multicore architectures are designed.

"While the computer science and engineering community has frequently focused on advancing the technology for parallel processing, this time around the stakes are truly high," says Torrellas. "There is no other obvious route to higher computing performance than through parallelism."

Torrellas calls for breakthroughs in all layers of the computing stack, including languages, programming models, compilation and runtime software, programming and debugging tools, and hardware architectures.

Torrellas designed his Bulk Multicore Architecture system specifically to address the complexity of parallel programming. He proposes using the hardware architecture to relieve programmers (and runtime systems) of the burden of managing data sharing in parallel environments, as well as providing new hardware-supported mechanisms to minimize programming errors.

The system eliminates one of the traditional tenets of processor architecture: the need to commit instructions in order so that the architectural state of the processor is available after each instruction.

In the Bulk Multicore Architecture, the default execution mode of a processor is to commit chunks of instructions at a time. Torrellas explains, "Such a chunked mode of execution and commit is a hardware-only mechanism, invisible to the software running on the processor. Moreover, its purpose is not to parallelize a thread, but to improve programmability and performance." Because the mechanism is invisible to software, it places no restrictions on the programmer's choice of programming model, language, or runtime system.
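To make the idea of committing chunks concrete, here is a toy software model in Python. It is only an illustrative sketch, not the hardware mechanism the article describes: the names `Chunk` and `commit`, the dictionary used as shared memory, and the conflict rule (a committing chunk squashes any in-flight chunk that has read or written an address it writes) are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical model of a group of instructions committed as one unit."""
    core_id: int
    read_set: set = field(default_factory=set)
    write_buffer: dict = field(default_factory=dict)

    def load(self, memory, addr):
        # A chunk sees its own buffered writes first, then shared memory.
        if addr in self.write_buffer:
            return self.write_buffer[addr]
        self.read_set.add(addr)
        return memory.get(addr, 0)

    def store(self, addr, value):
        # Writes stay private to the chunk until the whole chunk commits.
        self.write_buffer[addr] = value


def commit(memory, chunk, in_flight):
    """Publish all of a chunk's writes at once.

    Returns the other in-flight chunks that overlap with the committed
    writes and would have to be squashed and re-executed (assumed rule).
    """
    memory.update(chunk.write_buffer)
    written = set(chunk.write_buffer)
    return [
        other for other in in_flight
        if other is not chunk
        and (other.read_set & written or set(other.write_buffer) & written)
    ]


# Two cores increment the same counter inside overlapping chunks.
memory = {"counter": 0}
a, b = Chunk(core_id=0), Chunk(core_id=1)
a.store("counter", a.load(memory, "counter") + 1)
b.store("counter", b.load(memory, "counter") + 1)

squashed = commit(memory, a, [a, b])   # b read a stale counter, so it is squashed
retry = Chunk(core_id=1)               # b's work is re-executed as a fresh chunk
retry.store("counter", retry.load(memory, "counter") + 1)
commit(memory, retry, [retry])
```

In this model the software running inside a chunk never sees the buffering or the retry, which mirrors the article's point that chunked commit is invisible to the program.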

Importantly, Torrellas is able to demonstrate that these programmability advantages do not come at the expense of performance. He also explains that Bulk Multicore not only reduces the complexity of parallel programming but also reduces hardware complexity in multiprocessor environments. In fact, the system requires simpler processor hardware than current machines.

The idea of making parallel computing simple is at the core of the Illinois Universal Parallel Computing Research Center's research agenda. UPCRC Illinois is a joint research effort of the Illinois department of computer science and the Coordinated Science Laboratory, with funding from corporate partners Microsoft and Intel. Torrellas and his team plan to expand their work on Bulk Multicore in several ways. The team will be examining the scalability of the chunk commit model, as well as how the model can enable efficient support for new program-development and debugging tools, aggressive autotuners and compilers, and even novel programming models.

On a related note, Torrellas, along with Brian Greskamp and Ulya R. Karpuzcu, recently won the Best Paper Award at the International Symposium on Microarchitecture (MICRO) for their work entitled "The BubbleWrap Many-Core: Popping Cores for Sequential Acceleration," which discussed promising new methods for pushing back the power wall for multicore computing architectures.

In their paper, the team proposes to push back the many-core power wall with a new scheme called Dynamic Voltage Scaling for Aging Management (DVSAM). The team's system manages processor aging to attain higher performance or lower power consumption.

To make use of this new scheme, the team developed BubbleWrap, a novel many-core architecture that identifies the most power-efficient set of cores in a variation-affected chip, that is, the largest set that can be simultaneously powered on. BubbleWrap then designates those cores as Throughput cores dedicated to parallel-section execution. The rest of the cores are designated as Expendable and are dedicated to accelerating sequential sections. BubbleWrap attains maximum sequential acceleration by sacrificing Expendable cores one at a time, running each at an elevated supply voltage for a significantly shorter service life, until it wears out completely and is discarded.
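The split into Throughput and Expendable cores described above can be illustrated with a short Python sketch. This is not the team's actual selection procedure. It assumes, purely for illustration, that each core has a single known power draw and that the chip has a fixed power budget; the function name `partition_cores` and both parameters are hypothetical. Under those assumptions, adding cores in order of increasing power draw yields the largest set that fits within the budget.

```python
def partition_cores(core_power, budget):
    """Split core indices into (throughput, expendable) lists.

    core_power: assumed per-core power draw, which varies from core to
                core because of process variation (hypothetical input).
    budget:     assumed total power available to the chip, same units.
    """
    # Consider the most power-efficient cores first.
    order = sorted(range(len(core_power)), key=lambda i: core_power[i])

    throughput, used = [], 0.0
    for i in order:
        if used + core_power[i] > budget:
            break  # every remaining core draws at least as much power
        throughput.append(i)
        used += core_power[i]

    # Cores that do not fit in the budget are left for sequential
    # acceleration in this sketch.
    chosen = set(throughput)
    expendable = [i for i in range(len(core_power)) if i not in chosen]
    return throughput, expendable


throughput_cores, expendable_cores = partition_cores(
    core_power=[3.0, 1.0, 2.0, 4.0], budget=6.0
)
```

With the sample numbers, cores 1, 2, and 0 fit within the budget and core 3 is left over as Expendable.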

The team was also able to demonstrate significant performance increases. In simulated 32-core chips, BubbleWrap provides substantial gains over a plain chip with the same power envelope. On average, the most aggressive design runs fully sequential applications at a 16% higher frequency and fully parallel ones with a 30% higher throughput.

This Week's Multicore Reading List
Advanced Computer Science and Ruby and Rails
Performance Optimization for the Atom Architecture
The focus of multi-core processor tuning is on the effective use of parallelism
ParBenCCh 1.0 Parallel C++ Benchmarking Suite
Parallel Patterns in Seismic Duck: Part 3

Third International Workshop on Parallel Programming Models and Systems Software for High-End Computing

Parallel Architectures and Compilation Techniques
  • September 11-15, 2010
    The International Conference on Parallel Architectures and Compilation Techniques (PACT) is a premier international forum for the presentation of research results in parallel computing. A multi-disciplinary conference, PACT brings together researchers and practitioners from the hardware and software areas to present ground-breaking research on parallel systems, ranging across instruction-level parallelism, thread-level parallelism, multiprocessor parallelism, and large-scale systems.


IDF2010
  • September 13-15, 2010
    The Intel Developer Forum 2010 is your opportunity to collaborate with thousands of key industry players. Hear from more than 150 leading technology companies from around the world. Ask questions, get answers, experience live demonstrations, and more. Between the highly informative Keynotes, Technology and Industry Insights, Intel Fellows Live & Uncensored and Technical Sessions (including lectures, interactive panels, hands-on labs and Hot Topic Q&As), this year's IDF has everything you need to stay on top of the latest technology trends.
PPoPP
  • February 12-16, 2011
    The Symposium on Principles and Practice of Parallel Programming is a forum for leading work on all aspects of parallel programming, including foundational and theoretical aspects, techniques, tools, and practical experiences. In the context of the symposium, "parallel programming" encompasses work on concurrent and parallel systems (multicore, multithreaded, heterogeneous, clustered systems, distributed systems, and large scale machines). Given the rise of parallel architectures into the consumer market (desktops, laptops, and mobile devices), PPoPP is particularly interested in work that addresses new parallel workloads, techniques and tools that attempt to improve the productivity of parallel programming, and work towards improved synergy with such emerging architectures.