Energy Efficient Programming
In my most recent job I was looking at energy-efficient computing, which is a big deal in mobile computing. Everyone wants their tablet or phone or notebook to run as long as possible on a single battery charge. Efficient use of energy is also becoming a really big deal in cloud computing, with all those processors sitting in a data center (supporting mobile computing users), and in technical computing centers, where larger and larger clusters are assembled on the road to Exascale computations. One notable example of alternative energy use is the supercomputer in the Advania Thor Data Center in Iceland, which runs on relatively cheap hydroelectric and geothermal energy. (I wonder if they simply open a window to cool down the data center machines.)
As with processor speeds in the run-up to multicore processors for the masses, the energy envelope of hardware is being improved (i.e., lowered) with each new generation. Recently I was made privy to the plans of a large international chip producer and was impressed by how far it intends to lower the power usage of its flagship processors over the next few years. From all this data and the focus on improving energy utilization in hardware, it looks like software developers are still in the "free lunch" phase with regard to energy consumption. That is, I can simply wait until the next processor generation is released, recompile my application, and get the benefit of equal or better execution performance while the power consumption of my application improves automatically.
In those halcyon days when I could spend a year at the beach instead of coding and still speed up my code by running my application on the faster, next-generation processor, it was still possible to improve the application itself so that it ran faster on the currently available hardware. Now, for those of us who might be more proactive about energy efficiency (or don't have a beach close enough), I wonder if there is anything a programmer can do that would directly affect the power consumption of a given piece of code. In more colloquial terms, I could pose the question: "Which is more energy efficient: a for loop or a while loop?"
I'm sure that I'm showing my ignorance of hardware and computer architecture when I say, "Yes, a programmer can affect the energy consumption of an application, but not as directly as we might hope." Allow me to qualify that before you jump all over my naiveté.
In the arena of serial programming (more on the effects of parallel execution in a few paragraphs), there might not be much programmer influence on energy usage. I am assuming the execution of one instruction requires the same amount of energy as any other instruction. From what I remember about modern computer architecture and the execution of instructions, there is a pipeline to perform all the steps required, and there is a fixed number of stages in the pipeline. This suggests that it doesn't matter if the instruction is a floating-point or integer operation; the energy needed to traverse the pipeline is the same. I do recall that there are more overall steps involved in handling floating-point numbers versus integers, but that just might mean some stages in the pipeline are skipped for integers. Are there any appreciable energy savings to be had when bypassing the exponent normalization? Even if my assumption about the energy consumption of two different operations is wrong, there aren't many applications that I know of where you can replace floating-point calculations with integer operations and still get correct results.
Looking in a positive direction, one thing that I am sure can be done by the programmer would be to reduce the total number of instructions. If you can accomplish in 100 instructions what can also be done in 200 instructions, the former execution will use less overall energy. There are certainly large differences in the number of instructions used by one algorithm over another; e.g., Quicksort over Bubblesort. Unrolling loops will perform the same number of computational steps, but reduces the number of testing and looping instructions that are executed. There might be some difference in the energy consumption between a for loop and a while loop if the number of instructions involved in incrementing loop index variables and testing termination conditions differs significantly between the two variations. If there is such a difference and such a change is feasible, are those energy savings worth all the time and effort it would take to recode from one loop format to the other?
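To make the loop-unrolling point concrete, here is a minimal sketch in C. The function names `sum_simple` and `sum_unrolled` are made up for illustration, and the unroll factor of four is an arbitrary choice. Both functions perform the same additions; the unrolled version simply executes roughly a quarter as many loop tests and index increments. Whether that translates into measurable energy savings is an assumption I have not measured, and many optimizing compilers will unroll such a loop on their own.

```c
#include <stddef.h>

/* Straightforward loop: one index increment and one termination
   test for every element that is added. */
long sum_simple(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Manually unrolled by four: the same additions are performed,
   but the increment and test run only once per four elements. */
long sum_unrolled(const int *a, size_t n)
{
    long total = 0;
    size_t i = 0;

    for (; i + 3 < n; i += 4)
        total += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];

    /* Handle the zero to three elements left over. */
    for (; i < n; i++)
        total += a[i];

    return total;
}
```

Calling either function on the same array returns the same sum, so the only thing traded away is some readability.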
Vector operations are another place where a programmer can reduce the consumption of energy by an application. Does it cost any more energy to execute a vector instruction with four operands loaded into the vector register than it does with only a single operand? I would think both situations cost the same, but the former gets four times the results and is able to compute the desired answers in one-fourth of the time with only one-fourth of the energy consumed. Compilers can detect many instances where vector computations could be used even if the programmer didn't explicitly code for vector operations. If the compiler can't make that call due to conservative assumptions, intervention by the programmer via compiler directives or pragmas will inform the compiler that such vectorizations are safe. Adding these compiler hints will be much less time-consuming than changing loop structures. In cases where vectorization might not be viable, but there are still independent computations that can be executed concurrently, there is still a possibility for energy conservation on multicore processors.
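As one possible illustration of the compiler hints mentioned above, the following sketch uses two standard mechanisms: the C99 `restrict` qualifier, which promises the compiler that the two arrays do not overlap, and the OpenMP `simd` directive, which asks it to vectorize the loop. The function name `saxpy` and its parameters are my own choices for the example. Note that compilers such as GCC and Clang only honor the pragma when OpenMP support is switched on (for example with `-fopenmp-simd`); otherwise it is ignored and the code still compiles and runs as an ordinary scalar loop.

```c
#include <stddef.h>

/* Computes y[i] = a * x[i] + y[i] for i in [0, n).
   'restrict' tells the compiler that x and y never alias, removing
   one of the conservative assumptions that can block vectorization. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    /* Hint that the iterations are independent and safe to vectorize. */
    #pragma omp simd
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

The loop body is unchanged from what you would write without the hints, which is why this route is so much cheaper than restructuring the code.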
Today's processors, as I mentioned earlier, are more energy aware. When a core is not actively executing, it is put into a lower-power state while it sits idle. When some work is ready to proceed, the frequency and power are amped up and the computation is executed. For the sake of example, let me assume I have a quad-core processor in which each core draws 5 W (watts) when running at full speed and 1 W when idle. Running a serial computation that takes 4 hours will consume 32 Wh (watt-hours) of energy: 20 Wh for the active core (1 core x 5 W x 4 hours) plus 12 Wh for the three idle cores (3 cores x 1 W x 4 hours). If I can parallelize that same computation across all four cores, and it scales perfectly, the execution time will be only one hour and the total energy used will be only 20 Wh (4 cores x 5 W x 1 hour). Not only is the answer computed quicker, but the parallel execution yields a tangible energy savings.
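The arithmetic in this example can be captured in a few lines of C. This is only a back-of-the-envelope model: the function name `energy_wh` is hypothetical, the per-core figures are treated as power in watts, the result is energy in watt-hours, and the parallel case assumes a perfect fourfold speedup with no overhead, which real programs rarely achieve.

```c
/* Energy in watt-hours used by a multicore chip over a run.
   total_cores  : number of cores on the chip
   active_cores : cores kept busy for the whole run
   active_watts : power drawn by one busy core (assumed figure)
   idle_watts   : power drawn by one idle core (assumed figure)
   hours        : wall-clock duration of the run */
double energy_wh(int total_cores, int active_cores,
                 double active_watts, double idle_watts, double hours)
{
    int idle_cores = total_cores - active_cores;
    return (active_cores * active_watts + idle_cores * idle_watts) * hours;
}
```

With the numbers from the example, `energy_wh(4, 1, 5.0, 1.0, 4.0)` gives 32 Wh for the serial run and `energy_wh(4, 4, 5.0, 1.0, 1.0)` gives 20 Wh for the parallel run. Plugging in a less-than-perfect speedup (say, 1.5 hours on four cores) shows how quickly the advantage shrinks.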
I was once told that a prominent GPU producer had measured the amount of energy that every operation (computation, access to memory, moving data in from off-card, etc.) took on its products. If an addition, a multiplication, and a register shift each take a different amount of energy, how willing would you be to find just the right mix of equivalent instructions to carry out some desired computation? Then, how much documentation would you need to provide to the programmer who has to maintain your code to explain why the version you implemented, rather than the standard algorithm, is much better for everyone? There are easier ways to conserve power (outlined above) and do your little part to save the polar ice caps and the planet, which will help keep the oceanfront beaches where they are instead of outside your third-story window.