This week, Intel unveiled its new Xeon Phi coprocessor, which puts an astonishing 50 x86 cores onto a single PCIe card. The term "coprocessor" should be understood in context: every one of the Phi's cores can boot Linux and run any x86 software, but the card itself must plug into a system with an independent host CPU, which oversees the Phi's operations. Hence the "coprocessor" appellation. The first model, due in Q1 of next year, will have 50 cores; a follow-up slated for mid-2013 will have 60. Each core supports four hardware threads, giving the initial Phi 200 threads. The cores run at 1.05 GHz, and each has its own 512-KB L2 cache. Collectively, they share 8 GB of GDDR5 memory.
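The headline figures follow from simple arithmetic on the per-core numbers. The sketch below multiplies them out. One assumption to flag: only the initial model's 512-KB per-core L2 is given here, so the follow-up's cache total assumes the same per-core size. Because each core's L2 is private, the "aggregate" is summed capacity, not one shared cache.

```python
# Per-core figures quoted for the initial Xeon Phi model.
CORES_INITIAL = 50
CORES_FOLLOW_UP = 60      # mid-2013 follow-up
THREADS_PER_CORE = 4
L2_KB_PER_CORE = 512      # ASSUMPTION: unchanged on the 60-core follow-up


def hardware_threads(cores: int, threads_per_core: int = THREADS_PER_CORE) -> int:
    """Total hardware threads exposed by the card."""
    return cores * threads_per_core


def aggregate_l2_mb(cores: int, l2_kb: int = L2_KB_PER_CORE) -> float:
    """Sum of the private per-core L2 caches, in MB (1 MB = 1024 KB)."""
    return cores * l2_kb / 1024


for name, cores in [("initial", CORES_INITIAL), ("follow-up", CORES_FOLLOW_UP)]:
    print(f"{name}: {hardware_threads(cores)} threads, "
          f"{aggregate_l2_mb(cores):.0f} MB of L2 in total")
```

Run as-is, this reports 200 threads and 25 MB of L2 for the initial model, and 240 threads and 30 MB for the follow-up under the stated cache assumption.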
The initial aim of these processors is to attack highly threadable tasks, so the Phi competes most directly with GPUs, especially those from Nvidia. Even though it offers fewer threads than a GPU does, it delivers compelling programming advantages. If you've used CUDA or OpenCL, you know that programming GPUs is a descent into a netherworld of peculiar and rigid limitations. You're always acutely aware that you're doing something the processor was not built to do. On Nvidia chips, for example, there are multiple kinds of memory, and only certain operations can be performed on each type. Moreover, data has to be laid out for computation very carefully; otherwise, the GPU's performance advantage disappears entirely. All of these problems go away with the Phi. It's the pure x86 programming model everyone is used to, so existing code can be reused rather than rewritten. This simplicity will be extremely appealing to the many users who have spent long nights hacking code to get GPUs to deliver. (The OpenACC initiative, which we've covered several times recently, is an industry effort to deal with this complexity.)
The Phi can be programmed using all the typical parallel approaches: OpenMP, MPI, and Intel's own TBB and Cilk+. Intel has added some extensions to OpenMP for offloading data from the CPU to the Phi, and the company expects these directives to be included in the upcoming OpenMP 4.0 spec.
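To make the OpenMP route a little more concrete: a loop marked `#pragma omp parallel for schedule(static)` hands each thread a contiguous block of iterations, and that same loop body is what you would reuse on the Phi. The exact static split is implementation-defined, so the Python sketch below models just one common scheme (near-equal blocks, with the first `n % t` threads taking one extra iteration) spread across the initial Phi's 200 hardware threads. The function `static_chunks` is hypothetical and is not part of any OpenMP API.

```python
from typing import List, Tuple


def static_chunks(n_iters: int, n_threads: int) -> List[Tuple[int, int]]:
    """Return one half-open [start, end) iteration range per thread.

    Models one common static-schedule split: every thread gets
    n_iters // n_threads iterations, and the first n_iters % n_threads
    threads get one more. Real OpenMP runtimes may divide work differently.
    """
    if n_threads <= 0:
        raise ValueError("n_threads must be positive")
    base, extra = divmod(n_iters, n_threads)
    ranges = []
    start = 0
    for tid in range(n_threads):
        size = base + (1 if tid < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


# One million loop iterations over the initial Phi's 200 hardware threads.
chunks = static_chunks(1_000_000, 200)
print(chunks[0], chunks[-1])  # (0, 5000) (995000, 1000000)
```

Contiguous blocks keep each thread streaming through its own slice of memory, which is the kind of data layout a familiar x86 loop already tends to have.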
The coprocessor consumes around 225 W, a surprisingly low figure given the number of cores, and the heat it generates is low enough that the card can be passively cooled. As mentioned, the Phi comes as a PCIe 2.0 card. As with GPU computing devices, the PCIe link makes data transfer from the CPU to the Phi a bottleneck: at full tilt, it can move a maximum of 16 GB/sec. (By comparison, the Phi's cores access the 8 GB of on-card memory at 320 GB/sec.)
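The bandwidth gap is easier to appreciate as time. The sketch below uses the two peak figures quoted above (16 GB/sec over PCIe, 320 GB/sec to on-card memory). Real transfers rarely reach peak, and the break-even test is a deliberately simplified, hypothetical model that ignores any overlap of transfer and computation; all names are illustrative.

```python
PCIE_GB_PER_S = 16.0      # peak host-to-card figure quoted above
ONCARD_GB_PER_S = 320.0   # peak Phi-core-to-GDDR5 figure quoted above


def transfer_seconds(gigabytes: float, gb_per_s: float) -> float:
    """Time to move a data set at a given (peak) bandwidth."""
    return gigabytes / gb_per_s


def offload_pays_off(in_gb: float, out_gb: float,
                     host_seconds: float, phi_seconds: float) -> bool:
    """Simplified break-even model (an assumption, not a vendor formula):
    offloading wins only if shipping inputs to the card, computing there,
    and shipping results back beats running on the host CPU alone."""
    pcie_time = transfer_seconds(in_gb + out_gb, PCIE_GB_PER_S)
    return phi_seconds + pcie_time < host_seconds


# Filling the card's 8 GB of memory:
print(transfer_seconds(8, PCIE_GB_PER_S))    # 0.5 s over PCIe
print(transfer_seconds(8, ONCARD_GB_PER_S))  # about 0.025 s on the card
```

With 8 GB going in and 8 GB coming back, a full second of PCIe traffic is spent before the Phi's speed matters at all, which is why the workloads that benefit most are those that do a lot of computation per byte moved.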
Suggested retail pricing for the initial model is $2,649, with subsequent models expected to cost less than $2,000. At this price, and with the ability to run x86 code without rewriting, the Phi most directly disrupts Nvidia's CUDA project and AMD's OpenCL work. Both Nvidia and AMD currently enjoy a price advantage with their GPU coprocessors, but it's not clear that the advantage is large enough to keep sites preferring those solutions once they weigh the cost of rewriting code to run on GPUs. Intel is leveraging its massive x86 installed base.
I expect Phis to show up first exactly where GPUs are mostly used for computation today: in servers used by academia, research, and high-volume data transformation. Eventually, though, I expect the coprocessors to move down to workstations and then to high-end desktops.
An oft-asserted but dubious contention in the popular press is that today's desktops are so powerful that they are effectively supercomputers. In the abstract, this might be true if you compare them with their forebears of some years ago on raw computing power alone. However, for well over a decade supercomputers have been primarily highly parallel designs, so the metaphor lacks a key element it strives to express. With the advent of Intel's Phi coprocessor, this gap closes, and we can expect true supercomputing power on servers and desktops soon, at a price everyone can afford. As such, the Phi heralds a new era in computing.