In September 2011, the Texas Advanced Computing Center (TACC) announced Stampede, a new 10-petaflops-capable (10¹⁶, or 10,000 trillion, floating-point operations per sec.) supercomputer based on the Intel MIC (Many-Integrated Core) architecture. The Stampede announcement demonstrates a substantial, long-term commitment by Intel to deliver massively parallel many-core hardware to the high-performance computing (HPC) market by January 2013. The heart of the Stampede system will be the 50+ core Knights Corner (KNC) processor chips packaged in a PCIe form factor (the same form factor used by GPU computing co-processors). More than 8 of the 10 petaflop/sec. of peak floating-point performance will be provided by the Knights Corner PCIe co-processors.
The entrance of an x86-based many-core design into the leadership-class HPC marketplace raises a key question: Will the Intel Knights Corner chips compete as co-processors that accelerate application performance like GPUs do, or will they provide a "compile and run" alternative where the MIC device behaves like a stand-alone many-core Linux system?
The fact that Intel has now made a substantial commitment to teraflops-capable, massively parallel hardware devices comes as no surprise. Many in the computer industry, including me, have observed that CPUs and GPUs are following convergent evolutionary paths. As I note in my Scientific Computing article, "HPC's Future", the failure of Dennard's scaling laws forced chip manufacturers to switch to parallelism to increase processor performance. Because of power and heat constraints, many-core processors have become a necessity: it is no longer possible to significantly increase the performance of a single processing core.
This new era of multi- and many-core computing has been disruptive to the software industry because it requires that existing applications be redesigned to exploit parallelism (rather than clock speed) to achieve high performance on the new parallel hardware. During this transition to massively parallel programming, the owners of legacy code bases face some difficult choices, because there are no generic "recompile and run" solutions. As I noted in my Scientific Computing article, "Redefining What is Possible":
Legacy applications and research efforts that do not invest in multi-threaded software will not benefit from modern multi-core processors, because single-threaded and poorly scaling software will not be able to utilize extra processor cores. As a result, computational performance will plateau at or near current levels, placing the projects that depend on these legacy applications at risk of both stagnation and loss of competitiveness.
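The plateau described in the quote is what Amdahl's Law predicts. With a parallel fraction p of the runtime spread across n cores, the ideal speedup is 1 / ((1 - p) + p / n), so no core count can push it past 1 / (1 - p). The short Python sketch below is only an illustration of that formula; the function name `amdahl_speedup` and the sample values of p and n are hypothetical choices (50 simply echoes the "50+ core" Knights Corner figure), not measurements of any real application or chip.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal Amdahl's Law speedup: 1 / ((1 - p) + p / n).

    Hypothetical helper for illustration only; it ignores memory
    bandwidth, communication, and every other real-world overhead.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Compare illustrative parallel fractions at several core counts.
    core_counts = (1, 8, 50, 1000)
    for p in (0.0, 0.5, 0.9, 0.99):
        speedups = [amdahl_speedup(p, n) for n in core_counts]
        print(f"p={p:.2f}: " + ", ".join(f"{s:7.2f}" for s in speedups))
```

A purely single-threaded code (p = 0) stays at a speedup of 1 however many cores are added, and even a code that is 90% parallel can never run more than 10 times faster. This is why poorly scaling legacy software cannot make use of the extra cores on a many-core device.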
Vendors of parallel-processing hardware are making significant investments to ease the cost of transitioning legacy software to massively parallel computing. The challenge for legacy software owners lies in understanding how well these vendor efforts translate to production application performance.
Although both chips are still at an early, pre-hardware-release stage, it is possible to draw some preliminary conclusions from an analysis of the MIC and GPU architectures and the currently available information about the NVIDIA Kepler and Intel Knights Corner chips. In this article, I compare these two processors using established high-level measures such as memory capacity, balance ratios, and Amdahl's Law in the context of four programming paradigms:


