Summary and Conclusions
Mainstream hardware is becoming permanently parallel, heterogeneous, and distributed. These changes are here to stay, and so will permanently affect the way we have to write performance-intensive code on mainstream architectures.
The good news is that Moore's "local scale-in" transistor mine isn't empty yet. The transistor bonanza appears likely to continue for about another decade, give or take half a decade, which should be long enough to exploit the lower-cost side of the Law and get us to parity between desktops and pocket tablets. The bad news is that the diminishing returns are clearly visible: the transistors are less exploitable with each new generation of processors, software developers have to work harder to use them, and the chips get more difficult to power. And with each new crank of the diminishing-returns wheel, there's less time for hardware and software designers to come up with ways to overcome the next hurdle; the motherlode free lunch lasted 30 years, but the homogeneous multicore era lasted only about six years, and we are already overlapping the next two eras of hetero-core and cloud-core.
But all is well: when your mine is getting empty, you don't panic; you open a new mine at a new motherlode, operate both mines for a while, then continue to profit from the new mine long after the first one finally shuts down and gets converted into a museum. As usual, the end of one dominant wave overlaps with the beginning of the next, and we are now early in that period of overlap, standing with a foot in each wave and a crew in each mine, Moore's and the cloud's. Perhaps the best news of all is that the cloud wave is already scaling enormously quickly, faster than the Moore's Law wave that it complements and will eventually outlive and replace.
If you haven't done so already, now is the time to take a hard look at the design of your applications, determine what existing features (or, better still, what potential and currently unimaginable demanding new features) are CPU-sensitive now or are likely to become so soon, and identify how those places could benefit from local and distributed parallelism. Now is also the time for you and your team to grok the requirements, pitfalls, styles, and idioms of hetero-parallel programming (e.g., GPGPU) and cloud programming (e.g., Amazon Web Services, Microsoft Azure, Google App Engine).
To continue enjoying the free lunch of shipping an application that runs well on today's hardware and will naturally run faster or better on tomorrow's hardware, you need to write an app with lots of latent parallelism, expressed in a form that can be spread across a machine with a variable number of cores of different kinds: local and distributed cores, and big, small, and specialized cores. The throughput gains now come at a price: extra development effort, extra code complexity, and extra testing effort. The good news is that for many classes of applications the extra effort will be worthwhile, because concurrency will let them fully exploit the exponential gains in compute throughput that will continue to grow strong and fast long after Moore's Law has gone into its sunny retirement, as we continue to mine the cloud for the rest of our careers.
I would like to particularly thank Jeffrey Barr, David Callahan, Olivier Giroux, Yossi Levanoni, Henry Moreton, and James Reinders, who graciously made themselves available to answer questions and provide background information, and who shared their feedback on appropriately mapping their companies' products onto the processor/memory chart.
Herb Sutter is a bestselling author and consultant on software development topics, and a software architect at Microsoft. A version of this article is posted on his website.