Intel's 50-Core Xeon Phi: The New Era of Inexpensive Supercomputing


This week, Intel unveiled its new Xeon Phi coprocessor, which puts an astonishing 50 x86 cores onto a single PCIe-connected card. The term "coprocessor" should be understood in context. Every one of the Phi's cores can boot Linux and run any x86 software. However, the card itself needs to plug into a system that has an independent CPU, which basically oversees the Phi's operations. Hence, the coprocessor appellation. The first model, to be released in Q1 of next year, will have 50 cores, and the follow-up coprocessor slated for release in mid-2013 will have 60 cores. Each core supports four threads, making for 200 threads on the initial Phi. The cores run at 1.05 GHz and each sports a 512-KB L2 cache. They collectively share 8 GB of GDDR5 memory.

The initial aim of these processors is to attack highly threadable tasks. The Phis compete most directly with GPU processors, especially those from Nvidia. Even though they offer fewer threads than GPUs do, they deliver compelling programming advantages. If you've used CUDA or OpenCL, you know that programming GPUs is a descent into a netherworld of peculiar and rigid limitations. You're always acutely aware that you're doing something that the processor was not built to do. For example, on Nvidia chips, there are multiple kinds of memory, and only certain things can be done with each type of memory. Moreover, data has to be presented for calculation very carefully; otherwise, the processing lift of the GPU will disappear entirely.

All of these problems go away with the Phi. It's a pure x86 programming model that everyone is used to. It's a question of reusing, rather than rewriting, code. This greater simplicity will be extremely appealing to many users who have spent long nights hacking code to get GPUs to deliver properly. (The OpenACC initiative that we've covered several times recently is an industry effort to deal with this complexity.) The Phi can be programmed using all the typical parallel approaches: OpenMP, MPI, and Intel's own TBB and Cilk Plus. Intel has added some extensions to OpenMP to handle the data offloading from the CPU to the Phi, but the company expects that the directives will be included in the upcoming OpenMP 4.0 spec.
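To make the offload idea concrete, here is a rough sketch of what such code could look like. It uses Intel's offload pragma together with ordinary OpenMP inside the offloaded region; the array size and the function name scale_array are made up for illustration, and the exact directive spellings may differ from what Intel finally ships.

/* Minimal sketch: the host marks data and a region of work for the
   coprocessor, and familiar OpenMP parallelism runs inside that region.
   Compiled by an ordinary compiler, the offload pragma is simply ignored
   and the loop runs on the host instead. */
#include <omp.h>
#include <stdio.h>

#define N 1000000
static float data[N];

void scale_array(void)
{
    /* Ship `data` to the card, run the loop there, copy the result back. */
    #pragma offload target(mic) inout(data)
    {
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            data[i] *= 2.0f;
    }
}

int main(void)
{
    for (int i = 0; i < N; i++)
        data[i] = (float)i;
    scale_array();
    printf("data[10] = %f\n", data[10]);
    return 0;
}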

The coprocessor consumes around 225 W of power, which is a surprisingly low number given the number of cores. The heat generated when the Phi is running is low enough that the device can be passively cooled. As I mentioned, the Phi comes as a PCIe 2.0 card. The PCIe connection means that data transfer from the CPU to the Phi is a limitation (as it is on GPU computing devices) because, at full tilt, it can transfer a maximum of 16 GB/sec. (By comparison, the Phi cores access the 8 GB of internal memory at 320 GB/sec.)
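A quick back-of-envelope calculation (mine, using the peak numbers above) shows the scale of that gap: filling the card's entire 8 GB over PCIe takes roughly 8/16 = 0.5 seconds, while the cores can sweep the same 8 GB from GDDR5 in about 8/320 = 0.025 seconds. That 20x difference is why, on the Phi as on GPUs, you want data to stay resident on the card across as many computations as possible.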

Suggested retail pricing for the initial model is $2649, with subsequent models expected to cost less than $2000. At this pricing level, and with the ability to run x86 code without rewriting, the Phi most directly disrupts Nvidia's CUDA project and AMD's OpenCL work. At the moment, both Nvidia and AMD enjoy a price advantage with their GPU coprocessors, but it's not clear that the advantage is substantial enough that sites will continue preferring those solutions in light of the cost of rewriting code to run on their GPUs. Intel is leveraging its massive x86 installed base.

I expect Phis to show up initially exactly where the GPUs are mostly used today for computation: in servers used by academia, research, and high-volume data transformation. Eventually, though, I expect the coprocessors to move down to workstations and subsequently to high-end desktops.

An oft-asserted but dubious contention made in the popular press is that desktops today are so powerful that they are effectively supercomputers. Abstractly, this might be true if you compare them with their forebears of some years ago on computing power alone. However, supercomputers have (for well over a decade) been primarily highly parallel designs. Thus, the metaphor lacks a key element of what it strives to express. With the advent of Intel's Phi coprocessor, however, this gap closes, and we can indeed expect to have true supercomputing power on servers and desktops soon, at a price everyone can afford. As such, the Phi heralds a new era in computing.

— Andrew Binstock
Editor in Chief
alb@drdobbs.com
Twitter: platypusguy



Comments:

ubm_techweb_disqus_sso_-e0fe03e57a70040ecc0219e491a462ba
2012-11-16T17:59:51

You can easily configure a Linux kernel to run in a lot less memory than that. Think single-task. Lots of HPC apps need only 50M of RAM.


Permalink
ubm_techweb_disqus_sso_-b168f3ea9a2d48402cd00dd3aca28ecb
2012-11-16T00:12:59

Something like this perhaps?

http://www.returninfinity.com/...


Permalink
ubm_techweb_disqus_sso_-03f7cfc46738ae7a778b5405aec93d12
2012-11-15T21:12:15

A quote from Wiki - http://en.wikipedia.org/wiki/X...
"This is not an issue in 64-bit programs, as all AMD64 processors have SSE and SSE2, so using SSE and SSE2 instructions instead of x87 instructions does not reduce the set of machines on which x86-64 programs can be run"

Now, Xeon Phi is a 64-bit x86 processor, but it supports neither MMX nor SSE/SSE2. I'm just saying that even though the Phi is x86, it won't run your existing x86 binaries; it requires a recompile.

Those who need Xeon Phi have already adapted their code for GPGPU. Just read http://software.intel.com/en-u... - if you want to take advantage of the Phi, you have to adopt three-quarters of the GPU mindset: parallel foreach, tiling...


Permalink
AndrewBinstock
2012-11-15T20:57:48

@JSawyer: You need to recompile only code that relies on MMX, SSE, and AVX. Even then, recompiling is hugely easier than what GPUs demand, which is a complete rewrite.


Permalink
ubm_techweb_disqus_sso_-03f7cfc46738ae7a778b5405aec93d12
2012-11-15T20:48:47
An oft-asserted but dubious contention made in the popular press is that desktops today are so powerful that they are effectively supercomputers. ... However, supercomputers have (for well over a decade) been primarily highly parallel designs. Thus, the metaphor lacks a key elements it strives to express.


I'm not sure I agree with this. They compare the GPU to the supercomputer, not the CPU... The Cell processor in the PlayStation 3 was considered supercomputer grade for its SPEs, not for its one PPE. There were some interesting supercomputing applications of the PlayStation 3 in its early years.

Xeon Phi (aka Larrabee) is a failed competitor to GeForce. Now Intel sells it only as a competitor to Tesla. It lacks support for MMX, SSE and AVX, so it won't run your standard x86 code. To take advantage of this chip, you'd have to recompile your code for special 512-bit IMCI extensions (similar to AVX). It does however support x87.

Intel's QuickPath Interconnect would have been a better fit for the Phi than PCIe. It's easier to work with standard NUMA than with the GPU model, if a standard x86 programming model is your main selling point. But I guess, with technologies such as C++ AMP, the x86 Phi doesn't have an advantage, while ordinary GPUs have the advantage of a huge installed base.


Permalink
ubm_techweb_disqus_sso_-b3b1d8eb25a487aa3e1e70a29d1a99cd
2012-11-14T10:04:39

I wonder what the bus specs are though. Add in a good bit more RAM and you can maybe get some viable OS cores and a new server farm. Realistically though, I think the many-core server farm is still ARM's domain, but I am surprised Intel aren't thinking about it.


Permalink
ubm_techweb_disqus_sso_-b3b1d8eb25a487aa3e1e70a29d1a99cd
2012-11-14T10:01:16

I read somewhere that the Phi might actually cost $400. Have we a confirmed retail price yet or is that your best information so far? To make this accessible to mainstream programmers (like me) the price would need to be a lot lower.

As we hit the frequency wall, concurrency is the obvious answer, but common-or-garden programmers need an entry level for experience. GPU programming looks like a fight against the machine to me, rather than good experience.

I have invested in a Parallella to start my learning. I'm surprised you didn't cover it: http://www.kickstarter.com/pro...


Permalink
ubm_techweb_disqus_sso_-d48a3cf4aa43c6be37df43796094d876
2012-11-14T06:31:16

> Moreover, data has to be presented for calculation very carefully; otherwise, the processing lift of the GPU will disappear entirely. All of these problems go away with the Phi. It's a pure x86 programming model that everyone is used to. It's a question of reusing, rather than rewriting, code.

Since the vector unit is extended to 512 bits and serial execution is simplified, the scalar-vs.-vector performance ratio appears to be significantly lower than on regular Xeon cores. Thus, vectorization becomes essential for great performance, not only parallelisation. This really calls for presenting the calculations carefully, much like for GPUs. There is no free lunch, only slightly cheaper ones!?


Permalink
ubm_techweb_disqus_sso_-74bc4e844ead6bc20f2b51f74f58c6e6
2012-11-13T22:42:25

I wouldn't expect that anyone would want to run Linux on those cores - 60 cores in 3120A, that leaves 102 MB of memory for each Linux kernel. You wouldn't get much done. Rather, I would expect that these cores would run pure code, no OS.


Permalink
AndrewBinstock
2012-11-13T22:22:44

Whoops! Fixed. Thanks for catching that!


Permalink
ubm_techweb_disqus_sso_-723e827eaa82313c06276229d16feac1
2012-11-13T22:10:09

I'm guessing you mean 512-KB L2 per core and not 512MB - that would be quite impressive!


Permalink
