Parallel

The Future of Computing

By Max Fomitchev, July 17, 2006

Worries about runaway power consumption may replace concerns about speed on the next generation of CPUs

Maintaining Performance

There are many ways to maintain performance. The first one -- exploitation of instruction-level parallelism -- resulted in the creation of the superscalar processors that we see today. Theoretically, any modern CPU, whether from Intel, AMD, IBM, or Sun, can process and retire multiple instructions per cycle thanks to multiple parallel internal execution units. Funnily enough, instruction-level parallelism does not yet allow sustained performance of substantially more than 1 instruction per cycle (IPC) on general benchmarks, because memory latency and branch misprediction penalties stall even the fastest CPUs more than half the time (source: Intel). Only highly optimized tests or special-purpose code can reach the 3x to 5x performance boost that multiple execution units make possible. Practical gains from architectural improvements in cache coherency or branch prediction amount to a mere 5 percent in general. Long multi-stage execution pipelines, which were developed to achieve higher clock speeds, combined with inadequate memory performance to create a situation in which the CPU can process data faster than the data can be supplied. So the trend toward higher clock speeds has already reversed in favor of shorter pipelines and better memory throughput. The best example of pipeline shortening is the UltraSparc T1 processor with its six-stage pipeline, as opposed to the 31-stage Pentium 4 models (the Athlon XP has a 10-stage pipeline and Intel's new "Woodcrest" server chip has only 14 stages). Extrapolating the trend, it is reasonable to expect CPU frequency to remain roughly the same while CPU performance increases due to pipeline shortening and an emphasis on memory subsystem improvements.
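To see why stalls rather than execution width dominate, consider a rough back-of-the-envelope model. The sketch below uses the common textbook approximation that cycles per instruction (CPI) is the ideal CPI plus stall cycles from branch mispredictions and cache misses. The function name, the workload numbers, and the simplification that a misprediction costs about as many cycles as the pipeline has stages are all assumptions made for illustration; they are not measurements of any real chip.

```python
def effective_ipc(peak_ipc, branch_fraction, mispredict_rate,
                  pipeline_stages, miss_rate=0.0, miss_latency=0):
    """Estimate sustained instructions per cycle with a simple CPI model.

    Assumption: each mispredicted branch flushes the whole pipeline, so its
    penalty is taken to be roughly pipeline_stages cycles.
    """
    base_cpi = 1.0 / peak_ipc                                # ideal case, no stalls
    branch_cpi = branch_fraction * mispredict_rate * pipeline_stages
    memory_cpi = miss_rate * miss_latency                    # stalls waiting on memory
    return 1.0 / (base_cpi + branch_cpi + memory_cpi)


# Hypothetical workload: 20% branches, 5% of them mispredicted,
# 1% of instructions miss the cache and wait 100 cycles.
for stages in (31, 14, 6):
    ipc = effective_ipc(peak_ipc=3, branch_fraction=0.2, mispredict_rate=0.05,
                        pipeline_stages=stages, miss_rate=0.01, miss_latency=100)
    print(f"{stages:2d}-stage pipeline: about {ipc:.2f} IPC")
```

Under these made-up inputs the core stays well below its peak of 3 IPC at every pipeline depth, and the shorter pipeline loses less to each misprediction, which is the direction of the trend described above.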

Still, there is a hard limit to instruction-level parallelism, which makes it difficult in practice to keep the individual execution units inside a CPU busy. Thus, to improve CPU efficiency, two alternative approaches are currently being pursued. One approach is super-threading (or Hyper-Threading, to use Intel's term), which allows a CPU to process several threads simultaneously, switching from one thread to another when a stall occurs. The UltraSparc T1 takes this approach to the extreme by executing four threads on each core (32 threads on an 8-core chip), switching threads in round-robin fashion and whenever a stall occurs. While super-threading certainly boosts the performance of multi-threaded applications, speculative threading is being pursued to improve the performance of critical single-threaded applications. Intel is heavily involved in speculative threading research and offers its Mitosis technology, which, with the help of compilers, designates the threads most suitable for speculative execution. AMD is developing similar technology, although the company is more tight-lipped about it. Still, many rumors are circulating about AMD's clandestine "inverse hyper-threading" technology, allegedly capable of uniting two individual CPU cores into a single super-core that would crunch single-threaded applications with a considerable performance boost. Yet the only piece of evidence of AMD's involvement with speculative threading that has surfaced so far is the infamous U.S. patent #6,574,725, which looks like hardware support for speculative threading in the vein of Intel's Mitosis. So with clock-speed increases effectively curbed by power consumption concerns, the most likely performance gains on upcoming CPUs will come from super-threading (server chips) and speculative threading (desktop chips).
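The benefit of switching threads on a stall can be shown with a toy simulation. The sketch below models a single-issue core that, on every cycle, picks the next ready thread in round-robin order and skips threads that are waiting on a stall. The function name and all parameters (how often a thread stalls and for how long) are hypothetical and chosen only to make the effect visible; this is not a model of the actual UltraSparc T1 scheduler.

```python
def simulate_core(num_threads, cycles, stall_every, stall_cycles):
    """Count instructions issued by one single-issue core.

    Each thread stalls for stall_cycles after every stall_every-th
    instruction it issues (a stand-in for a cache miss).
    """
    ready_at = [0] * num_threads    # first cycle each thread may issue again
    issued_by = [0] * num_threads   # instructions issued per thread
    issued = 0
    next_thread = 0                 # where the round-robin scan starts

    for cycle in range(cycles):
        for offset in range(num_threads):
            t = (next_thread + offset) % num_threads
            if ready_at[t] <= cycle:            # thread is not stalled
                issued += 1
                issued_by[t] += 1
                if issued_by[t] % stall_every == 0:
                    ready_at[t] = cycle + 1 + stall_cycles
                next_thread = (t + 1) % num_threads
                break
        # If no thread was ready, the cycle is simply wasted.
    return issued


# Every instruction is followed by a 3-cycle stall (deliberately harsh).
for n in (1, 2, 4):
    done = simulate_core(num_threads=n, cycles=100, stall_every=1, stall_cycles=3)
    print(f"{n} thread(s): {done} instructions in 100 cycles")
```

With one thread the core sits idle for three cycles out of every four; with four threads each stall is hidden behind the other three threads and the core issues an instruction on every cycle. Throughput rises, but no individual thread runs any faster, which is why this technique suits multi-threaded server workloads.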

There is another approach to boosting instruction-level parallelism, which has been pursued on and off by various commercial and government entities: very long instruction word (VLIW) or explicitly parallel instruction computing (EPIC). The first successful application of the VLIW concept can be traced back to the early 1980s, when a group of Russian engineers led by Boris Babayan (who is now an Intel fellow) developed a series of Elbrus supercomputers that were produced as part of the anti-ballistic missile defense system deployed around Moscow. The massive performance gains afforded by proper application of the VLIW concept allowed the Elbrus machines to overcome manufacturing and technological limitations and serve their purpose beautifully. Remember, though, that these were special-purpose computers running hand-optimized code.
