Within the DSP processor, all numbers must be stored within the system-defined word length. Typically, however, internal registers for intermediate arithmetic operations are double precision (twice the nominal fixed word width), with additional "guard" bits for safety. This paper focuses on 16-bit fixed point DSP processor implementations.
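To illustrate why intermediate registers are wider than the data word, the following minimal sketch accumulates products of 16-bit words. The 40-bit accumulator width (a 32-bit product plus 8 guard bits) and the names `ACC_BITS`, `samples` and `coeffs` are assumptions chosen for illustration; actual register widths depend on the processor family.

```python
# Assumed accumulator: 32 bits for a 16 x 16 product plus 8 guard bits.
ACC_BITS = 40
ACC_MAX = (1 << (ACC_BITS - 1)) - 1   # largest positive accumulator value
INT32_MAX = (1 << 31) - 1             # largest positive double-precision value

# Hypothetical data: eight samples and coefficients at positive full scale.
samples = [0x7FFF] * 8
coeffs = [0x7FFF] * 8

acc = 0
for x, c in zip(samples, coeffs):
    acc += x * c   # each product alone fits in 32 bits

print("sum of products:       ", acc)
print("exceeds 32-bit register:", acc > INT32_MAX)
print("fits with guard bits:   ", acc <= ACC_MAX)
```

A single product fits in a double-precision register, but the running sum of several products does not; under these assumptions the guard bits absorb that growth.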
Essential Fixed Point DSP Terms
The following table defines terms essential to understanding this paper's material.
Term | Definition |
---|---|
Format | Digital-system numeric representation style; fixed point or floating point |
Fixed point | Processor architecture that stores and operates on numbers in integer format |
Floating point | Processor architecture that stores and operates on numbers in floating point format |
Floating point format | Numerical values are represented by a combination of a mantissa (fractional part) and an exponent |
Q-Format | Format for representing fractional numbers within a fixed length binary word. The programmer assigns an implied binary point which divides the fractional and integer numeric fields |
Radix point | Equivalent to a decimal point in base-10 math or a binary point in base-2 math. Separates integer and fractional numeric fields |
Precision | Number of bits used to represent a value in the digital domain, also called bus width or fixed word length |
Resolution | Smallest non-zero magnitude which can be represented |
Accuracy | Magnitude of the difference between an element's real value and its represented value |
Quantization error | Difference between a signal's value in the analog domain and its representation in the digital domain within a fixed length binary word |
Range | Difference between the most negative number and the most positive number which can be represented; ultimately determined by both numeric representation format and precision |
Dynamic range | Ratio of the maximum absolute value which can be represented to the minimum non-zero absolute value which can be represented |
Word length effects | Errors and effects associated with reduced accuracy representation of numerical values within a fixed word length |
Representation | Definition of how numbers are represented, including one's complement, two's complement, signed and unsigned. |
Scaling | Adjusting the magnitude of a value; typically accomplished by multiplication or shifting the binary (radix) point |
Truncation error | Loss of numeric accuracy incurred when a value must be shortened or truncated to fit within a fixed word length |
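Roundoff error | Often used as another term for truncation error; strictly, the error incurred when a value is rounded to the nearest representable value rather than having its low-order bits discarded |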
Overflow | A computation whose result is larger than the system's defined dynamic range, or an addition of numbers of like sign that produces an output with an incorrect sum or sign. Also called register overflow, large signal limit cycling or saturation. |
Saturation mode | A processor operational mode which prevents an overflow condition by forcing a computation's result value to the maximum numeric value rather than allowing an overflow condition |
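
The difference between an overflow and saturation mode, as defined in the table above, can be sketched as follows. This is an illustrative model only: it assumes 16-bit two's complement words, and the helper names `wrap16` and `sat16` are hypothetical rather than taken from any processor's instruction set.

```python
# Assumed word format: 16-bit two's complement.
INT16_MIN = -(1 << 15)      # -32768
INT16_MAX = (1 << 15) - 1   #  32767

def wrap16(x: int) -> int:
    """Keep only the low 16 bits and reinterpret them as a signed word.

    This models an unprotected overflow, where the result wraps around.
    """
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def sat16(x: int) -> int:
    """Clamp to the representable range, modelling saturation mode."""
    return max(INT16_MIN, min(INT16_MAX, x))

a, b = 30000, 10000                 # like-signed operands, true sum is 40000
print("wrapped:  ", wrap16(a + b))  # -25536, the sign is wrong
print("saturated:", sat16(a + b))   # 32767, pinned at the maximum
```

In this sketch a negative result that is out of range is clamped to the most negative value, which is the usual counterpart of forcing a positive result to the maximum.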
Fixed point number representation
In base-2 math a binary point is the equivalent of a decimal point in traditional base-10 math. It serves to separate the integer and fractional parts of a number. Another name for this concept is radix point. Implementing a fixed point numerical representation requires specifying the location of the radix point.
There are two conventional radix point locations, one for integer representation and another for fractional representation. A normal integer fixed length binary word has an implied radix point to the right of the LSB (least significant bit) of the word. In a fractional fixed point implementation the radix point is located to the left of the most significant of the numerical bits, not counting the word's sign bit (typically the MSB, most significant bit) if one is present. Thus, for a signed fractional fixed point number, the default radix point is immediately to the right of the MSB, which is the sign bit.
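The two radix point conventions can be illustrated by reading the same 16-bit pattern both ways. The sketch below assumes two's complement words; the function names are hypothetical, and the signed fractional layout with 15 fractional bits is the one often called Q15.

```python
def as_signed_int(word: int) -> int:
    """Radix point to the right of the LSB: plain two's complement integer."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word

def as_signed_frac(word: int) -> float:
    """Radix point just to the right of the sign bit: 15 fractional bits."""
    return as_signed_int(word) / (1 << 15)

for word in (0x4000, 0x7FFF, 0x8000, 0xC000):
    print(f"0x{word:04X}  integer = {as_signed_int(word):6d}  "
          f"fractional = {as_signed_frac(word):+.6f}")
```

The bit patterns are identical in both readings; only the implied scale factor of 2^15 differs.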
With either integer or fractional radix point location, the hardware implementation remains the same, since the multiplication operation is independent of the radix point location. Different processor families can have different "default" radix point locations. Since the location of the radix point is not fixed by the hardware and must be tracked by the designer, fixed point algorithms can be implemented in either fractional or integer formats.
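A small sketch of this point: a single integer multiply serves both interpretations, and only the designer's bookkeeping of the radix point changes. The operand values and the choice of 15 fractional bits (`FRAC_BITS`) are assumptions for illustration.

```python
FRAC_BITS = 15   # assumed fractional format: sign bit plus 15 fractional bits

a_raw, b_raw = 0x4000, 0x2000    # the stored 16-bit words
product_raw = a_raw * b_raw      # the one multiply the hardware performs

# Integer reading: radix point to the right of the LSB everywhere.
print("integer product:   ", product_raw)

# Fractional reading: 15 fractional bits per operand, so 30 in the product.
a_frac = a_raw / (1 << FRAC_BITS)
b_frac = b_raw / (1 << FRAC_BITS)
product_frac = product_raw / (1 << (2 * FRAC_BITS))
print("fractional product:", a_frac, "*", b_frac, "=", product_frac)

# Storing the result back in a 16-bit fractional word drops 15 low-order bits.
product_word = product_raw >> FRAC_BITS
print(f"stored word:        0x{product_word:04X}")
```

The raw product is the same number in both cases. What the designer must track is that the product of two words with 15 fractional bits carries 30 fractional bits, so it has to be shifted before it is stored in the original format.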
Available range
With fixed point design, the dynamic range of numbers is a key concern, since the fixed word size allows a much narrower range of numbers to be represented than a floating point format does.
There are several different ways to represent a numerical value within a fixed length binary word. In this paper we will deal primarily with fixed binary word lengths of 16 bits. A 16-bit binary word can represent at most 2^16 = 65,536 distinct values.
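As a rough illustration, the sketch below lists the range covered by a 16-bit word under three common representations and computes the dynamic range of the two's complement case. The selection of representations and the expression of dynamic range in decibels are assumptions made here for illustration.

```python
import math

WORD_BITS = 16
n_codes = 1 << WORD_BITS   # number of distinct bit patterns

ranges = {
    "unsigned integer": (0, n_codes - 1),
    "two's complement integer": (-(n_codes // 2), n_codes // 2 - 1),
    "signed fractional (15 fractional bits)": (-1.0, 1.0 - 2.0 ** -15),
}

print("distinct values:", n_codes)
for name, (lo, hi) in ranges.items():
    print(f"{name}: {lo} to {hi}")

# Dynamic range: largest magnitude divided by smallest non-zero magnitude.
lo, hi = ranges["two's complement integer"]
dynamic_range = max(abs(lo), abs(hi)) / 1
dynamic_range_db = 20 * math.log10(dynamic_range)
print(f"dynamic range: {dynamic_range:.0f}:1 (about {dynamic_range_db:.1f} dB)")
```

Every representation uses the same 65,536 codes; the format only determines which numeric values those codes stand for.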