Massive Parallelism Has a Name ... Extreme Scale Computing
Parallel computing is not a new concept. It's been around for decades. Now the reality is here. Serial computing is dead? Well, that's what was stated in an article in IEEE's Computer magazine. I don't know if that's the case. With every new language, new operating system, or new architecture, we always say that whatever came before is dead. Cameron and I don't subscribe to killing off anything that has to do with application development. Use whatever approach is needed to solve the problem.
New technologies just give us more tools to make solutions possible and/or more efficient. But as far as hardware goes, it pretty much is the case. Will there be CPU manufacturers making single-core CPUs? That's dead. Hardware development marches on, no looking back.
Now we're talking about millions of cores and peta-scale (10^15) to exa-scale (10^18) operations per second. Massive parallelism has a name -- Extreme Scale Computing (ESC). Just as multicore had to solve the problems of power consumption and data transfer (which led to improvements in data bus technology, for example), Extreme Scale Computing has many challenges to overcome in the next decade: energy and power consumption, enabling concurrency and locality, and fault resilience. Table 1 shows some of the challenges of ESC as expressed in "Architectures for Extreme-Scale Computing," an article by Josep Torrellas of the University of Illinois that appeared in IEEE's Computer magazine.
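To get a feel for the scale of the energy problem, here's a back-of-envelope calculation of my own, using the 20 pJ-per-operation goal from Table 1. Even at that aggressive target, an exa-scale machine would draw

$$
10^{18}\ \text{ops/s} \times 20\ \text{pJ/op} = 2 \times 10^{7}\ \text{W} = 20\ \text{MW}.
$$

(Equivalently, 1 W divided by 20 pJ per operation gives the 50 × 10^9 operations per watt listed in the table.) Twenty megawatts for a single machine, and that's the optimistic case where the efficiency goal is actually met.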
ESC with a million cores will require perhaps hundreds of millions of memory chips -- and how much secondary storage? Can the system continue to function if there are massive hardware failures? How much fault tolerance will these systems need?
And what will ESC demand from the software developer to take advantage of this hardware? As far as taking advantage goes, software developers did more than that; there was a dependency. Many developers relied on the rising speed of chips to improve the performance of their applications. With Moore's Law stating that the number of transistors on a chip doubles every 18 months to 2 years, many developers counted on that speedup. Within a few years we are looking at the end of Moore's Law. Now, as far as performance goes, we are looking at the demand for high levels of parallelism. On one hand we are hearing, "Don't worry about it, just keep writing software the way you have been; it will be taken care of." On the other hand there are those saying it will require "heroic efforts." According to Torrellas:
> Programmers of extreme scale machines must be able to express a high-degree of parallelism in a way that does not preclude careful locality management and communication minimization. An appealing approach is to program the machine using a high-level programming model, and then rely on intelligent static and dynamic compilation layers to efficiently map the code to the hardware.
What does that mean? Well, developers will have to write software that has a "high-degree of parallelism" using a high-level programming model, steering away from communication and synchronization between processes and threads -- not a lot of data sharing. Less dependency between threads and processes is what creates that high degree of parallelism. Then lower-level software, the compilers, will be responsible for determining how this code is actually mapped to the hardware, taking different localities, like clusters of cores, into account. Wow, this is ideal! That's good if it's possible, but we are supposed to be producing software that solves problems for people, and the ideal may not be possible, especially if that approach prevents us from actually solving the problem. Parallelism may not solve the problem. Yeah, I said it! What are we trying to do, keep all those cores busy? Is that our job?
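To make that concrete, here's a minimal sketch of the style being described, assuming C++17 (the function name and the scaling operation are my own illustration, not anything from the article). The code states that an element-wise computation may run in parallel and leaves the mapping to cores entirely to the compiler and runtime:

```cpp
#include <algorithm>
#include <execution>
#include <vector>

// Scale every sample independently. There is no communication, no
// synchronization, and no shared mutable state between iterations --
// the kind of dependency-free parallelism the quote calls for.
// (scale_samples is a hypothetical example, not code from the article.)
std::vector<double> scale_samples(std::vector<double> samples, double factor)
{
    std::transform(std::execution::par_unseq,
                   samples.begin(), samples.end(),
                   samples.begin(),
                   [factor](double s) { return s * factor; });
    return samples;
}
```

Notice what the code does not say: how many threads to use, which cores they run on, or how the data is tiled across caches. That's the division of labor Torrellas describes: the programmer expresses the parallelism, and the "intelligent static and dynamic compilation layers" (plus the runtime) decide the mapping.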
Table 1: Some Challenges of Extreme Scale Computing

| CHALLENGE | GOAL | APPROACHES |
| --- | --- | --- |
| Energy and power consumption | 50 × 10^9 operations per watt, at 20 pJ per operation | Design circuits for energy and power efficiency; near-threshold voltage operation; nonsilicon memory; photonic interconnects |
| Enabling concurrency and locality | Support more threads and high degrees of locality of reference (spatial and temporal data locality) | Efficient point-to-point synchronization between cores; lower overhead for creation and migration of threads; clustering of cores; compute engines in cache for memory-intensive computations |
| Fault resilience | Keep the system functioning despite massive hardware failures | Reasonable resilience at different levels of the computing stack (hardware, OS, compiler, application) |
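Since "locality of reference" in the second row can sound abstract, here's a tiny illustration of spatial locality (my own sketch, not from the article). Both functions sum the same matrix stored row-major in a flat vector, but they traverse it in different orders:

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t N = 1024;  // hypothetical matrix dimension

// Row-major traversal: consecutive iterations touch consecutive
// addresses, so most accesses hit data already pulled into the
// cache line -- good spatial locality.
double sum_row_major(const std::vector<double>& m)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j)
            sum += m[i * N + j];
    return sum;
}

// Column-major traversal of the same data: each access strides N
// doubles past the last, so it misses the cache far more often --
// poor spatial locality.
double sum_column_major(const std::vector<double>& m)
{
    double sum = 0.0;
    for (std::size_t j = 0; j < N; ++j)
        for (std::size_t i = 0; i < N; ++i)
            sum += m[i * N + j];
    return sum;
}
```

At extreme scale this matters more, not less: moving a word across the chip or out to memory costs far more energy than the arithmetic performed on it, which is why the approaches in the table keep circling back to keeping data close to the cores that use it.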