## Elementary functions for embedded systems

*
Michael is a senior electrical engineer at Basler Electric and part-time lecturer in the Department of Electrical and Computer Engineering at Southern Illinois University at Edwardsville. He can be contacted at [email protected] or [email protected] ee.siue.edu.*

Many times in designing software for microcontroller-based systems, you need to make calculations that involve elementary functions such as *sin(x)*, *cos(x)*, or *log*_{10}(*x*). For example, many temperature sensors are logarithmic in nature; that is, the sensor output voltage may increase by *x* volts each time the temperature doubles. In this case converting the sensor voltage to a linear temperature scale requires the calculation of 2^{x}.

Calculation of an elementary function is often times done by using a look-up table. While look-up tables are by far the fastest way to make the computation, the precision of the result is directly related to the size of the look-up table. High-precision look-up tables require large amounts of nonvolatile memory to store the table. If the table size is reduced to save memory, precision is also reduced.

Power series may also be used to calculate these same functions without using look-up tables, however, these calculations have the disadvantage of being slow to converge to a desired precision. In effect, the look-up table size is being traded at the expense of computation time.

CORDIC methods of computation represent a compromise between these two methods. The CORDIC technique uses a one-bit-at-a-time approach to make computations to an arbitrary precision. In the process, relatively small look-up tables are used for constants necessary for the algorithm. Typically, these tables require only one to two entries per bit of precision. CORDIC algorithms also use only right shifts and additions, minimizing the computation time.

### Fundamentals of CORDIC Algorithms

All CORDIC algorithms are based on the fact that any number may be represented by an appropriate alternating series. For example, an approximate value for *e* can be represented as: *e*3-0.3+0.02-0.002+ 0.0003=2.7183. In this case, each digit gives an additional power of 10 resolution to the approximation of the value for *e*. Also, if the series is truncated to a certain number of terms, the resulting value will be the same as the value obtained by rounding the true value of *e* to that number of digits. In general, the series obtained for a value by this method does not always alternate regularly. The series for p, for example, is *p*3+0.1+0.04+0.002- 0.0004-0.00001=3.14159. The series for *e* is also irregular if the expansion is continued for a few additional terms.

The CORDIC technique uses a similar method of computation. A value to be computed, such as *sin(x) *or *log*_{10}(*x*), is considered to be a truncated series in the format in Figure 1. In this case, the values for *a _{i}* are either 0 or 1 and represent bits in the binary representation of

*z*. The value for

*z*is determined 1 bit at a time by looking at the previously calculated value for

*z*, which is correct to

*i*-1 bits. If this estimate of

*z*is too low, you correct the current estimate by adding a correction factor, obtained from a look-up table, to the current value of

*z*. If the current estimate of

*z*is too high, you subtract a correction factor, also from the look-up table. Depending on whether you add or subtract from the current value of

*z*, the

*i*

^{th}bit is set to the correct value of 0 or 1. The less significant bits from

*i*+1 to

*B*may change during this process because the estimate for

*z*is only accurate to

*i*bits.

Because of the trigonometric relationship between the *sin(x) *and *cos(x) *functions, it is often possible to calculate both of these values simultaneously. If the *cos(x) *is considered as a projection onto the *x*-axis and *sin(x) *as a projection onto the *y*-axis, the iteration process amounts to the rotation of an initial vector. It is from this vector rotation that the CORDIC algorithm derives its name -- COordinate Rotation DIgital Computer.

### Algorithms for Multiplication and Division

A CORDIC algorithm for multiplication can be derived using a series representation for *x*, as in Figure 2. From this, *z* is composed of shifted versions of *y*. The unknown value for *z*, may be found by driving *x* to zero 1 bit at a time. If the *i*^{th} bit of *x* is nonzero, *y _{i}* is right shifted by

*i*bits and added to the current value of

*z*. The

*i*

^{th}bit is then removed from

*x*by subtracting 2

^{-i}from

*x*. If

*x*is negative, the

*i*

^{th}bit in the twos complement format would be removed by adding 2

^{-i}. In either case, when

*x*has been driven to zero all bits have been examined and

*z*contains the signed product of

*x*and

*y*correct to

*B*bits.

This algorithm is similar to the standard shift and add multiplication algorithm except for two important features.

- Arithmetic right shifts are used instead of left shifts, allowing signed numbers to be used.
- Computing the product to
*B*bits with the CORDIC algorithm is equivalent to rounding the result of the standard algorithm to the most significant*B*bits.

Listing One is the final multiplication algorithm. This calculation assumes that both *x* and *y* are fractional within the domain-1 to 1. The algorithm is valid for other domains as long as the decimal point is allowed to float. With a few extensions, this algorithm would work well with floating-point data.

A CORDIC division algorithm is based on rewriting the equation *z=x/y* into the form *x-y*z=*0. If *z* is expanded into its series representation, the second version of the equation takes the form in Figure 3(a), which, after some manipulation, yields Figure 3(b). This final form of the equation shows that the quotient *z* may be estimated 1 bit at a time by driving *x* to zero using right-shifted versions of *y*. If the current residual is positive, the *i*^{th} bit in *z* is set. Likewise, if the residual is negative the *i*^{th} bit in *z* is cleared; see Listing Two.

The convergence of this division algorithm is trickier than the multiplication algorithm. While *x* may be either positive or negative, the value for *y* is assumed to be positive. As a result, the division algorithm is only valid in two quadrants. Also, if the initial value for *y* is less than the initial value for *x* it will be impossible to drive the residual to zero. This means that the initial *y* value must always be greater than *x*, resulting in a range of 0*<z<*1. The algorithm may be modified as in Listing Three for four quadrant division with *-*1*<z<*1.

As with all division algorithms, the case where *y* is zero should be trapped as an exception. Once again, a few extensions would allow this algorithm to work well with floating-point data.

### Algorithms for *Log*_{10}*(x)* and 10^{x}

To calculate the base 10 logarithm of a value *x*, it is convenient to use the identity in Figure 4(a). If the *b _{i}* are chosen such that

*x*b*1

**b*2

**b*3

*...*bB=*1, the left side reduces to

*log*

_{10}(1), which is 0. With these choices for

*bi*, you are left with the equation in Figure 4(b) for

*log*

_{10}

*(x)*. Since quantities for

*log*

_{10}

*(b*may be stored in a look-up table, the base 10 logarithm of

_{i})*x*may be calculated by summing selected entries from the table.

The trick now is to choose the correct *b _{i}* such that you drive the product of

*x*and all of the

*b*to 1. This may be accomplished by examining the current product. If the current product is less than 1, you choose coefficient

_{i}*b*such that

_{i}*b*is greater than 1. On the other hand, if the current product is greater than 1 the coefficient should be chosen such that its value is less than 1. An additional constraint is that the

_{i}*b*should be chosen such that multiplication by any of the

_{i}*b*is accomplished by a shift and add operation. Two coefficients that have the desired properties are:

_{i}*b*1+2

_{i}=^{-i}

*if*

*x*b*

_{1}

**b*

_{2}

*...b*1

_{i-}*<1*and

*b*1-2

_{i}=^{-i}

*if*

*x*b*

_{1}

**b*

_{2}

*...bi*1

_{-}*>1*. In choosing these values for the

*b*, the limit, as

_{i}*i*approaches infinity of the product of

*x*and the

*b*will be 1 as long as

_{i}s,*x*is in the domain (see Figure 4(c)). This represents the domain of convergence for this algorithm, which may be calculated as approximately 0.4194

*<x<*3.4627. If you want to calculate logarithms outside of this domain, either the input must be prescaled or the range of the

*i*values must be changed. The final algorithm becomes Listing Four.

To calculate the inverse of this algorithm, or 10^{x}, you only modify the existing algorithm such that *x* is driven to zero while *z* is multiplied by the successive coefficients, *b _{i}*. This follows from the fact that if

*z=*10

^{x}then

*z=b*10

_{i}*^{(x- log10}

*(bi*. As the exponent is driven to zero,

^{))}*z*is seen to approach the product of all the successive coefficients.

B

'

b_{i}

i=1

Listing Five is the final algorithm. The domain of convergence for this algorithm is determined by the possible values for which *x* can be driven to zero. By inspection of the algorithm this is determined to be Figure 4(d) or *x* is limited to the domain *-*0.5393*< x<0.3772. *As in the previous algorithm, the domain may be extended by scaling the initial value of *z* by (1+2^{i}) or (1-2^{i}).

### The Circular Functions *sin(x) *and *cos(x)*

The rotation matrix in Figure 5(a) rotates a vector

x_{0}

[

y_{0}]

counterclockwise by a radians in two-dimensional space. If this rotation matrix is applied to the initial vector

1

[

0]

the result will be a vector with coordinates of

cos a

[

sin a]

It is easily seen that the CORDIC method could be applied to calculate the functions *sin*(*x*) and *cos*(*x*) by applying successive rotations to the initial vector

1

[

0]

and gradually driving the angle *a* to zero.

However, a problem arises when an attempt is made to set up the rotation matrix such that all rotations are accomplished by right shifts. If *a _{i}* is chosen such that

*cos(a*, the

_{i})=2^{-i}*sin(a*is not necessarily a power of 2. It is not possible to choose the successive angle rotations,

_{i})*a*, such that both the

_{i}*cos(a*and

_{i})*sin(a*amount to right shifts.

_{i})

In working around this problem, you can modify the rotation matrix by bringing a *cos(a)* term out of the matrix; see Figure 5(b). Now the rotation angles *a _{i}* may be chosen such that

*tan(a*2

_{i})=^{-i}or rather

*a*2

_{i}=tan^{-1}(^{-i}

*)*. The result is the final incremental rotation matrix in Figure 5(c), where

*a*2

_{i}=tan^{-1}(^{-i}

*)*. With these choices for the

*a*, rotation is accomplished using only right shifts. If the

_{i}*cos(a*term is neglected to avoid the multiplication operations, the length of the initial vector is increased each time it is rotated by using right shifts only. This increase may be compensated for by decreasing the length of the vector prior to rotation. Because the algorithm uses

_{i})*B*successive rotations, all rotations may be compensated for initially using one collective length correction factor,

*C*. The value of

*C*is found by grouping all of the

*a*terms together as in Figure 5(d).

_{i}

For *B*=16 bits, *C* may be calculated as approximately 0.6072. Listing Six presents the final algorithms, *x* and *y* represent vector coordinates, while *z* is now the angle register. It can be determined that the previous two algorithms will converge as long as Figure 5(e) or *-*1.7433*<z<*1.7433.

Since the domain of convergence includes both the first and fourth quadrants, the algorithms will converge for any *z* such that *-p/*2*<z<p/*2.

### Mapping CORDIC Algorithms to Microcontrollers

The previously discussed algorithms show that CORDIC-based computation methods require minimal hardware features to implement:

- Three registers of length
*B*bits. - One, two, or three Adders/Subtractors.
- Several small ROM-based look-up tables.
- One, two, or three shift registers.

When implementing CORDIC algorithms on microcontrollers, the shift registers will have the greatest effect on the overall throughput of the system. Multiplication by 2^{-i} requires that the shift register be capable of performing a right shift by *i* bits. Most microcontrollers are only capable of right shifting by 1 bit at a time. Shifting by *i* bits requires a software loop to repeat this task *i* times, greatly increasing the computation time. The 8051, 6805, and 68HC11 are typical examples of microcontrollers that require software loops to implement the shifter.

Other microcontrollers, such as the 68HC332, as well as most DSPs, have a feature known as "barrel shifters," which right shifts by *i* bits in one operation. Typically the shift is also accomplished in one clock cycle.

Another possibility for implementing a barrel shifter is to use a multiply instruction that has been optimized for speed. An example of this is the 68HC12, which has a 16×16 bit signed multiply, and EMULS that produces a 32-bit result in three clock cycles. A right shift by *i* bits could be accomplished by multiplying by 2^{16-i} and discarding the lower 16 bits of the result. One disadvantage of this scheme is that the data is restricted to 16 bits. Other word lengths would require additional cycles.

Once the processor and shift register style is chosen, the next choice to be made involves the data format. Since Standard C does not provide a fixed-point data type, you have a lot of freedom in choosing the format of the data. It is a good idea, however, to choose a format that fits into 16- or 32-bit words. Even though most CORDIC routines are written in assembly language for speed, 16- or 32-bit words allow data to be passed as either *int* or *long int* data types within higher level C subroutines. The format presented here uses a 16-bit format with 4 bits to the left of the decimal point and 12 fractional bits to the right, which is often referred to as "4.12 format." This allows constants such as *p*, *e*, and --2 to be easily represented without a moving decimal point. The 12 bits of fractional data amount to approximately 3.5 digits of decimal accuracy. The range of this format is calculated as *-*8*< x<*7.9997*.*

The constants used are found by multiplying by 2^{12} (4096), rounding, and converting to hexadecimal. Take the constant *e*, for example: 4096**e=*11134.0811134*= *0*x*267*e*. All of the data tables necessary for CORDIC computing may be built up this way using a calculator.

Finally, with the data format and constant tables established, coding of the algorithms proceeds in a straightforward manner. Available electronically (see "Resource Center," page 5) are source-code implementations of CORDIC algorithms for the 8051, 68HCll, and 68332 microcontrollers. This code was assembled with the Intel MCS-51 Macro-Assembler and Motorola Freeware Assemblers and tested on hardware development systems.

### Conclusion

CORDIC algorithms have been around for some time. For instance, Jack Volder described them in "The CORDIC Trigonometric Computing" (*IRE Transactions Electronic Computers, *September 1959). The reasons for using CORDIC algorithms have not changed. The algorithms are efficient in terms of both computation time and hardware resources. In most microcontroller systems, especially those performing control functions, these resources are normally already at a premium. Using CORDIC algorithms may allow a single chip solution where algorithms using the look-up table method may require a large ROM size or where power series calculations require a separate coprocessor because of the computation time required.

The algorithms presented have been selected to represent a small core of functions commonly required in microcontroller systems, which could be discussed in detail. For each algorithm in this core, three areas have been covered: theory of operation, determining the domain of convergence for the algorithm, and finally, implementation of the algorithm on a typical microcontroller. Using these selected algorithms as a starting point, you can develop libraries containing many similar elementary functions. Among those possible with only minor modifications to the algorithms presented are: *lnx*, *ex*, *tan ^{-}*1

*x*,

*--x*

^{2}

*+y*

^{2},

*and*

*e*. Among the references, Jarvis gives an excellent table of the functions possible using CORDIC routines.

^{jq}

**DDJ**

#### Listing One

multiply(x,y){ for (i=1; i=<B; i++){ if (x > 0) x = x - 2^(-i) z = z + y*2^(-i) else x = x + 2^(-i) z = z - y*2^(-i) } return(z) }

#### Listing Two

divide(x,y){ for (i=1; i=<B; i++){ if (x > 0) x = x - y*2^(-i); z = z + 2^(-i); else x = x + y*2^(-i); z = z - 2^(-i); } return(z) }

#### Listing Three

divide_4q(x,y){ for (i=1; i=<B; i++){ if (x > 0) if (y > 0) x = x - y*2^(-i); z = z + 2^(-i); else x = x + y*2^(-i); z = z - 2^(-i); else if (y > 0) x = x + y*2^(-i); z = z - 2^(-i); else x = x - y*2^(-i); z = z + 2^(-i); } return(z) }

#### Listing Four

log10(x){ z = 0; for ( i=1;i=<B;i++ ){ if (x > 1) x = x - x*2^(-i); z = z - log10(1-2^(-i)); else x = x + x*2^(-i); z = z - log10(1+2^(-i)); } return(z) }

#### Listing Five

10_to_power(x){ z = 1; for ( i=1;i=<B; i++ ){ if (x > 0) x = x - log10(1+2^(-i)); z = z + z*2^(-i); else x = x - log10(1-2^(-i)); z = z - z*2^(-i); } return(z) }

#### Listing Six

sin(z){ x = 0.6072; y = 0; for (i=0; i=<B; i++){ if (z > 0) x = x - y*2^(-i) y = y + x*2^(-i) z = z - arctan(2^(-i)) else x = x + y*2^(-i) y = y - x*2^(-i) z = z + arctan(2^(-i)) } return(y) } cos(z){ x = 0.6072; y = 0; for (i=0; i=<B; i++){ if (z > 0) x = x - y*2^(-i) y = y + x*2^(-i) z = z - arctan(2^(-i)) else x = x + y*2^(-i) y = y - x*2^(-i) z = z + arctan(2^(-i)) } return(x) }