Supercomputer Debugging: More Than Just Breakpoints
There's no question you could set a boatload of breakpoints on a 20-petaflop supercomputer, but you probably need more than that when developing parallelized software for it. You might also need fast conditional watchpoints, compiled expressions, asynchronous thread control, and full post-mortem debugging capabilities, and that's just for starters.
Those were just a few of the features scientists at the Lawrence Livermore National Lab (LLNL) were looking for when supporting scalable development efforts on the Advanced Simulation and Computing (ASC) Sequoia. At 20 petaflops, Sequoia will be 34 times as powerful as LLNL's current Blue Gene/L, giving scientists a lot more computing cycles for simulations and basic science research.
"Sequoia represents a major challenge to code developers as the multi-core era demands that we effectively absorb more cores and threads per MPI task," says LLNL's Mark Seager.
The Sequoia effort includes two generations of IBM Blue Gene supercomputers that will deliver the next generation of advanced systems being developed under the ASC program. ASC is a cornerstone of the National Nuclear Security Administration's (NNSA) program to ensure the safety, security, and reliability of the nation's nuclear deterrent without underground testing. These two Blue Gene systems are Dawn, a 500-teraflop system that was accepted by LLNL in March of 2009, and Sequoia, a 20-petaflop system based on future Blue Gene technology, slated for delivery in 2011.
To meet those needs, LLNL turned to TotalView Technologies, a developer of interactive analysis and debugging tools for serial and large-scale parallel software. TotalView is a source code analysis and memory error detection tool designed to simplify the process of debugging parallel, data-intensive, multi-process, multi-threaded, or network-distributed applications. In short, TotalView offers a number of features that let it scale to thousands of processes or threads, with applications distributed over multiple machines or processors.
However, the company isn't focused solely on advanced supercomputers like Sequoia; it also provides tools like MemoryScape 3.0 that support platforms like Apple's Mac OS X Snow Leopard. MemoryScape 3.0 introduces support on Snow Leopard for malloc zones, a mechanism for controlling multiple pools of memory on Mac OS X systems. Both the allocator and owner of all heap allocations can be tracked, displayed and used for filtering. MemoryScape also provides the capability of detecting and controlling low available memory conditions in the heap.
Snow Leopard completes the Mac's transition to 64-bit, with all key system applications rewritten as 64-bit, enabling the Mac to address massive amounts of memory. Its Grand Central Dispatch handles threads for multicore processing at the operating system level, automatically distributing work across the available cores.
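The programming model behind that kind of automatic work distribution can be sketched in a few lines. The example below is only an analogy written with Python's standard `concurrent.futures` module; it is not Grand Central Dispatch's API, and the `simulate_cell` and `run_all` names are hypothetical stand-ins invented for illustration. The point it tries to show is that the caller describes independent tasks and leaves thread creation and scheduling to a shared pool.

```python
from concurrent.futures import ThreadPoolExecutor
import os


def simulate_cell(cell_id: int) -> int:
    """Hypothetical unit of work: a stand-in for one independent task."""
    return sum(i * i for i in range(cell_id * 1000))


def run_all(cell_ids):
    # The caller never creates or joins threads; it only describes the tasks.
    # With max_workers=None the executor picks a pool size from the CPU count.
    with ThreadPoolExecutor(max_workers=None) as pool:
        # map() returns results in input order even if tasks finish out of order.
        return list(pool.map(simulate_cell, cell_ids))


if __name__ == "__main__":
    results = run_all(range(1, 9))
    print(f"{len(results)} tasks finished on a machine with {os.cpu_count()} CPUs")
```

The sketch illustrates the division of responsibility only. How well such a pool uses multiple cores depends on the runtime and the workload, so it should not be read as a statement about Grand Central Dispatch's performance.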



