DrDobbs Portal Blog: STI Cell: More Than a Game
EDITOR'S EYE

The World of Software Development.

by Jon Erickson
July 31, 2006

STI Cell: More Than a Game

Some people might call it a hack. But to me the STI Cell is just dessert. Originally designed by Sony, Toshiba, and IBM (the "STI" in "STI Cell") as the processor for Sony's PlayStation 3 game console, the STI Cell is suddenly on track to be a building block for next-generation high-performance systems used in computational science.

To this end, computer scientists at Berkeley Lab are benchmarking the processor on several scientific-application kernels, then comparing the results against those of other processor architectures.

"Overall results demonstrate the tremendous potential of the Cell architecture for scientific computations in terms of both raw performance and power efficiency," report researchers Samuel Williams, Leonid Oliker, Parry Husbands, Shoaib Kamil, Katherine Yelick, and John Shalf. "We also conclude that Cell's heterogeneous multicore implementation is inherently better suited to the [high-performance computing] environment than homogeneous commodity multicore processors."

Cell is a high-performance implementation of a software-controlled memory hierarchy in conjunction with the considerable floating-point resources required for demanding numerical algorithms. Cell is different from conventional multiprocessor or multicore architectures. Instead of using identical cooperating processors, it uses a conventional high-performance PowerPC core that controls eight single-instruction, multiple-data cores called "synergistic processing elements" (SPEs), each of which contains a synergistic processing unit, a local memory, and a memory-flow controller.

In addition to its departure from mainstream general-purpose processor designs, Cell is interesting because the intended game market means it will be produced at high volume, making it cost-competitive with commodity central processing units. Moreover, the growth in commodity microprocessor clock rates is slowing as chip power demands increase, and these worrisome trends have motivated the community of computational scientists to consider alternatives like STI Cell.

Berkeley Lab researchers examined the use of the STI Cell processor as a building block for future high-end parallel systems by investigating performance across several key scientific computing kernels: dense matrix multiplication, sparse matrix vector multiplication, stencil computations on regular grids, and one-dimensional and two-dimensional fast Fourier transforms. According to the research team, the current implementation of Cell is noted for its extremely high-performance, single-precision (32-bit) floating-point resources. The majority of scientific applications require double precision (64 bits), however. Although Cell's peak double-precision performance is still impressive compared to its commodity peers (eight SPEs running at 3.2 gigahertz mean 14.6 billion floating-point operations per second), the group showed how a design with modest hardware changes, which they named Cell+, could improve double-precision performance.
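For readers who want to see where a figure like 14.6 comes from, here is a back-of-the-envelope sketch. The clock rate and SPE count are from the text above; the per-instruction numbers are assumptions not stated in this post: that each SPE can issue one two-wide double-precision fused multiply-add (four floating-point operations) only once every seven cycles, and one four-wide single-precision fused multiply-add (eight operations) every cycle. The function name is made up for illustration.

```python
CLOCK_GHZ = 3.2   # SPE clock rate, from the article
NUM_SPES = 8      # number of SPEs, from the article


def peak_gflops(flops_per_instr, cycles_per_instr):
    """Peak rate in Gflop/s for all SPEs, given an assumed issue pattern."""
    return NUM_SPES * CLOCK_GHZ * flops_per_instr / cycles_per_instr


# Assumption: one 2-wide double-precision FMA (4 flops) every 7 cycles.
double_precision = peak_gflops(flops_per_instr=4, cycles_per_instr=7)

# Assumption: one 4-wide single-precision FMA (8 flops) every cycle.
single_precision = peak_gflops(flops_per_instr=8, cycles_per_instr=1)

print(f"double precision: {double_precision:.1f} Gflop/s")
print(f"single precision: {single_precision:.1f} Gflop/s")
print(f"ratio: {single_precision / double_precision:.1f}x")
```

Under those assumptions the double-precision peak rounds to the 14.6 quoted by the researchers, and the gap to single precision comes out at about a factor of fourteen.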

They developed a performance model for Cell and used it to compare Cell directly against the AMD Opteron, Intel Itanium 2, and Cray X1 architectures. The model then guided the development of implementations that were run on IBM's Full System Simulator to provide even more accurate performance estimates.

The researchers argue that Cell's three-level memory architecture, which decouples main memory accesses from computation and is explicitly managed by the software, provides several advantages over mainstream cache-based architectures. First, performance is more predictable, because the load time from an SPE's local store is constant. Second, long block transfers from off-chip DRAM (dynamic random access memory) can achieve a much higher percentage of memory bandwidth than individual cache-line loads. Finally, for predictable memory-access patterns, communication and computation can effectively be overlapped by careful scheduling in software.
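The overlap of communication and computation is usually achieved with double buffering: while the kernel works on one block, the next block is already in flight. What follows is only a structural sketch in Python, not Cell code and not the researchers' implementation. The names `dma_get`, `compute`, and `double_buffered_sum`, the block size, and the use of a worker thread to stand in for the memory-flow controller are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

BLOCK = 4  # elements per transfer; hypothetical stand-in for a local-store buffer


def dma_get(main_memory, start, size):
    """Stand-in for a block transfer from main memory into a local buffer."""
    return main_memory[start:start + size]


def compute(block):
    """Stand-in kernel: sum of squares over one block."""
    return sum(x * x for x in block)


def double_buffered_sum(main_memory, block=BLOCK):
    """Process main_memory block by block, fetching block i+1 during compute on block i."""
    total = 0
    with ThreadPoolExecutor(max_workers=1) as dma:  # plays the memory-flow controller
        pending = None  # transfer whose data has not been consumed yet
        for start in range(0, len(main_memory), block):
            # Start the next transfer before touching the previous block.
            next_transfer = dma.submit(dma_get, main_memory, start, block)
            if pending is not None:
                total += compute(pending.result())
            pending = next_transfer
        if pending is not None:  # drain the last block
            total += compute(pending.result())
    return total


print(double_buffered_sum(list(range(10))))
```

The point is the ordering inside the loop: the transfer for the next block is issued first, and only then does the code compute on the block that has already arrived. Because the access pattern is known in advance, the software can schedule this explicitly instead of hoping a cache guesses right.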

On average, Cell is eight times faster and at least eight times more power-efficient than current Opteron and Itanium processors, even though Cell's peak double-precision performance is only about one-fourteenth of its peak single-precision performance. If Cell were to include at least one fully usable pipelined double-precision floating-point unit, as proposed in the Cell+ implementation, these performance advantages would easily double.

Games or HPC. Everyone is having fun with this processor.

Posted by Jon Erickson at 10:18 AM




