*Anthony is founder and managing director of Just Software Solutions. He is also maintainer of the Boost thread library and a member of the BSI C++ Standards Panel. He can be contacted at anthony@justsoftwaresolutions.co.uk.*

Recently, I was working on an application that helps helicopter maintenance engineers adjust their helicopters to fly better. This application analyzes data collected during flight, and provides engineers with information to help them adjust the helicopter. This is a very computationally expensive operation, with extensive use of trigonometric and exponential functions, so though the initial prototypes ran very fast on a PC, they were very slow on the target platform—a handheld device running Windows CE on an ARM processor without hardware floating-point support. After several rounds of optimization focused on improving the algorithm, reducing the use of expensive operations, adjusting the memory-allocation patterns to avoid allocating, freeing memory in a loop, and so forth, the application was running considerably faster than the initial prototype. Unfortunately, it was still too slow, so it was time to pull out the "big guns" and use fixed-point math. This final push doubled the speed of the application, finally bringing the performance within acceptable limits. But it wasn't easy. In this article, I explain the fixed-point techniques we used to achieve this performance gain.

### Fixed-Point Math

I had put off using fixed-point math because the application made extensive use of sines, cosines, and exponential functions that are nontrivial to implement. However, profiling (see sidebar) showed that the mathematical operations were consuming a very high percentage of CPU time, and I had exhausted all other avenues. This was relatively unsurprising, given that the target CPU didn't support floating-point operations in hardware, so all mathematical operations were performed by an emulation library.

Before proceeding, it was important to check the range of possible values—the range for a *double* is far larger than can be represented in a simple fixed-point type. For this application, 64-bit fixed-point values (35 bits to the left of the point, 28 bits after, and one sign bit) were sufficient as a replacement.

Initially, I hunted the Internet for a free, open-source implementation of fixed-point math. The only implementation I found that provided all the required mathematical operations did speed up the application, as simple operations such as addition and subtraction were considerably improved. Unfortunately, the overall gains weren't as good as had been hoped for. However, seeing the naïve implementations of the more complex functions (calculating sines and cosines from the Taylor series, for example) gave me hope—by improving the implementation of these functions, there was potential for much greater improvements, so I pressed on with renewed vigor. Eventually, the whole fixed-point library was rewritten from scratch, but the final performance gain was enough to justify the time spent in doing so, and it was easier than anticipated. (The complete source code for the fixed-point library and profiling code is available online at www.ddj.com/code/.)

### Don't Forget the Constants!

Having changed over to fixed point, the next easy gain came from replacing floating-point and integer constants involved in fixed-point calculations with fixed-point ones. It's easy just to use M_PI or 0 or 1 or even 0.5 in a calculation, and this is not so easy to spot when changing every use of "double" to "fixed." The profiler showed that these constants were responsible for quite a few calls to the converting constructor, so replacing them with fixed-point constants shaved another couple of percents off the runtime.