Supercomputer Debugging: More Than Just Breakpoints
There's no question you could set a boatload of breakpoints on a 20-petaflop supercomputer, but you probably need more than that when developing parallel software for it. You may also need fast conditional watchpoints, compiled expressions, asynchronous thread control, and full post-mortem debugging capabilities, and that's just for starters.
Those were just a few of the features scientists at Lawrence Livermore National Laboratory (LLNL) were looking for to support scalable development efforts on the Advanced Simulation and Computing (ASC) Sequoia system. At 20 petaflops, Sequoia will be 34 times as powerful as LLNL's current Blue Gene/L, giving scientists a lot more computing cycles for simulations and basic science research.
"Sequoia represents a major challenge to code developers as the multi-core era demands that we effectively absorb more cores and threads per MPI task," says LLNL's Mark Seager.
The Sequoia effort comprises two generations of IBM Blue Gene supercomputers, which will serve as the next generation of advanced systems developed under the ASC program. ASC is a cornerstone of the National Nuclear Security Administration's (NNSA) program to ensure the safety, security, and reliability of the nation's nuclear deterrent without underground testing. The two Blue Gene systems are Dawn, a 500-teraflop system that LLNL accepted in March 2009, and Sequoia, a 20-petaflop system based on future Blue Gene technology, slated for delivery in 2011.
To support development on these systems, LLNL turned to TotalView Technologies, a developer of interactive analysis and debugging tools for serial and large-scale parallel software. TotalView is a source code analysis and memory error detection tool designed to simplify debugging of parallel, data-intensive, multi-process, multi-threaded, or network-distributed applications. Its features let it scale to thousands of processes or threads, with applications distributed over multiple machines or processors.
However, the company isn't focused solely on advanced supercomputers like Sequoia; it also provides tools like MemoryScape 3.0 that support platforms like Apple's Mac OS X Snow Leopard. MemoryScape 3.0 introduces support on Snow Leopard for malloc zones, a mechanism for controlling multiple pools of memory on Mac OS X systems. Both the allocator and owner of all heap allocations can be tracked, displayed and used for filtering. MemoryScape also provides the capability of detecting and controlling low available memory conditions in the heap.
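To make the idea of tracking who allocated heap memory more concrete, here is a minimal sketch using Python's standard `tracemalloc` module. It is only an analogy: it does not use MemoryScape or malloc zones, and the `build_mesh` function is a hypothetical stand-in for application code that owns a block of allocations. The sketch records allocations, filters out the tracing module's own bookkeeping, and groups what remains by the source line that made each allocation.

```python
import tracemalloc


def build_mesh(n):
    # Hypothetical "owner": allocates n buffers of about 1 KB each.
    return [bytes(1000) for _ in range(n)]


tracemalloc.start()
mesh = build_mesh(1000)
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# Filter out allocations made by the tracing module itself.
snapshot = snapshot.filter_traces(
    [tracemalloc.Filter(False, tracemalloc.__file__)]
)

# Group the remaining allocations by the source line that made them.
stats = snapshot.statistics("lineno")
total_bytes = sum(stat.size for stat in stats)

# Show the three largest allocation sites.
for stat in stats[:3]:
    print(stat)
```

A dedicated memory debugger works at the level of the C heap and offers far more than this, but the pattern of recording allocation sites and then filtering by them is the same.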
Snow Leopard completes the Mac's transition to 64-bit, with all key system applications rewritten as 64-bit, enabling the Mac to address massive amounts of memory. Its Grand Central Dispatch handles threads for multicore processing at the operating system level, automatically distributing work across cores to improve performance.
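The programming model behind Grand Central Dispatch, where the application submits units of work and leaves thread management to a lower layer, can be sketched with a thread pool from Python's standard library. This is an analogy rather than the Grand Central Dispatch API, and `simulate_cell` and `run_all` are hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate_cell(cell_id):
    # Hypothetical unit of work; a real application would do something heavier.
    return cell_id * cell_id


def run_all(cell_ids):
    # With no max_workers argument, the pool picks a worker count itself,
    # so the caller never creates or manages threads directly.
    with ThreadPoolExecutor() as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(simulate_cell, cell_ids))


results = run_all(range(8))
print(results)
```

Note that in CPython, threads mainly help with I/O-bound work; the point of the sketch is the submit-and-forget structure, not the speedup.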
December 15, 2009
How to Use Intel® Parallel Studio to Streamline Code Development in a Multicore Environment
Speaker: Matt Dunbar, Director for Performance Technology, SIMULIA
Matt Dunbar is the director for performance technology at SIMULIA. Since joining the company in 1993, he has worked on parallelization of the Abaqus suite of products, initially for shared memory architectures and more recently for distributed memory architectures. Dunbar has also been intimately involved in selecting both the hardware and software tools used in the development of the Abaqus product line.
Abstract:
Resolve elusive, costly multithreading errors quickly and efficiently with Intel® Parallel Studio. While many coding problems that lead to bugs in software applications are typically straightforward logic errors, errors in managing memory and in multithreading code can sometimes take weeks to months to diagnose and fix. Matt Dunbar explores how and why taking advantage of multicore processors through multithreaded code is critical for compute-intensive applications. While spotlighting his work on SIMULIA's Abaqus finite element solver, Dunbar addresses the need for multicore execution and shares his experiences using Intel Parallel Studio to streamline code development in a multicore environment.


