Summary
Technical innovation is rapidly turning massively parallel devices into ever more capable computational tools. While both NVIDIA GPUs and Intel MIC devices support multiple programming models, their current PCIe-based packaging imposes limits on memory capacity, data locality and bus bandwidth that strongly favor using these devices as external co-processors. At this time, the two device families are converging in their evolutionary characteristics, so performance, rather than a head-on collision of technical approaches, is the prime selection criterion.
As in any market, price versus expected benefit will dominate procurement decisions and market success.
From a performance point of view, the KNC chip looks competitive with GPUs as a teraflops-capable co-processor. Pricing information is not yet available for NVIDIA Kepler or Intel KNC products, so a price versus performance comparison is not possible at this time. The TACC announcement shows that Intel is clearly targeting high-performance computing. Meanwhile, NVIDIA has established a strong market presence and a massive base of CUDA developers, with products starting in the $150 to $180 price range and extending to HPC products priced in the thousands of dollars.
Clearly, benchmark results will be a hot topic once Kepler and KNC chips become available. Benchmarks will certainly be devised to exploit the architectural differences between the two products, accelerating some applications more than others. This is healthy: feedback from the forthcoming benchmark battles will doubtless spur technical innovations that improve the performance of future GPU and MIC product generations. As this article notes, it does not really matter whether a software effort is charged as "software porting work" or "application performance tuning", because both time and money will be required to use MIC and GPU devices effectively.
The intention of all programming models is to abstract the hardware interface so as to preserve performance and reduce or eliminate porting costs. Eventually, software will mature to the point that performance decisions will focus more on the hardware than on the software. Programmers should always choose what works best for them both now and in the future. For this reason, languages such as CUDA and OpenCL are attractive because they combine a strong scaling execution model with asynchronous queues. In other words, applications written now will be able to choreograph numerous tasks across one or more devices and scale to whatever number of concurrent threads of execution the hardware vendors can provide.
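As an illustration of the "asynchronous queues" idea, the following is a minimal CUDA sketch, not code from this article. It assumes a CUDA-capable GPU and uses CUDA streams as the queues: each stream receives an in-order sequence of copy-in, kernel and copy-out operations, and work in different streams may overlap. The kernel uses a grid-stride loop so that its results do not depend on how many threads are launched. The names `scale` and `run_pipeline`, the four streams and the chunk size are hypothetical choices made for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Grid-stride loop: each thread handles elements i, i + stride, ...,
// so the kernel gives correct results for any grid size the launch uses.
__global__ void scale(float *x, float a, int n) {
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x) {
        x[i] *= a;
    }
}

// Hypothetical helper: returns the number of wrong results,
// or -1 if a CUDA call failed.
int run_pipeline() {
    const int kStreams = 4;         // number of queues (assumption)
    const int chunk = 1 << 20;      // elements per queue (assumption)
    const int n = kStreams * chunk;

    float *h = nullptr, *d = nullptr;
    // Pinned host memory is required for copies to be truly asynchronous.
    if (cudaMallocHost(&h, n * sizeof(float)) != cudaSuccess) return -1;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) {
        cudaFreeHost(h);
        return -1;
    }
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    cudaStream_t streams[kStreams];
    for (int s = 0; s < kStreams; ++s) cudaStreamCreate(&streams[s]);

    // Enqueue copy-in, compute and copy-out on each stream; the host
    // returns immediately and the device schedules the queued work.
    for (int s = 0; s < kStreams; ++s) {
        float *hs = h + s * chunk;
        float *ds = d + s * chunk;
        cudaMemcpyAsync(ds, hs, chunk * sizeof(float),
                        cudaMemcpyHostToDevice, streams[s]);
        scale<<<256, 256, 0, streams[s]>>>(ds, 2.0f, chunk);
        cudaMemcpyAsync(hs, ds, chunk * sizeof(float),
                        cudaMemcpyDeviceToHost, streams[s]);
    }

    // Wait for every queue to drain before reading results on the host.
    int errors = -1;
    if (cudaDeviceSynchronize() == cudaSuccess) {
        errors = 0;
        for (int i = 0; i < n; ++i)
            if (h[i] != 2.0f) ++errors;
    }

    for (int s = 0; s < kStreams; ++s) cudaStreamDestroy(streams[s]);
    cudaFree(d);
    cudaFreeHost(h);
    return errors;
}

int main() {
    int errors = run_pipeline();
    printf("errors = %d\n", errors);
    return errors == 0 ? 0 : 1;
}
```

Splitting the work across several streams lets transfers in one queue overlap with computation in another, and the grid-stride kernel means the launch configuration can be tuned for a given device without changing the code.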
Rob Farber is an analyst who writes frequently on High-Performance Computing hardware topics.


