Adding the Power of DSP to Your Application

MAY91: ADDING THE POWER OF DSP TO YOUR APPLICATION

This article contains the following listings: BITTMAN.ARC

Jim is the president and founder of Bittware Research Systems. He has a BSEE from MIT and an MSEE from the University of Maryland. He can be reached at Bittware Research Systems, 400 East Pratt Street, Baltimore, MD 21202.

When confronted with the challenges presented by Digital Signal Processing (DSP), programmers usually come up with questions such as, "Why would I want to use DSP in the first place?" "How hard is DSP development, and what is required to integrate a DSP processor into my PC-based system?" and, most importantly, "Is it worth the trouble?"

The answer to the first question is simple: The main reason for adding DSP power to your application is to speed up the program to achieve real-time processing speeds--speeds typically in the range of 25 Mflops. The response to the second question isn't quite so straight-forward because DSP programming varies from board to board and processor to processor. In general, the development packages supplied with most commercial DSP add-in boards make the process relatively simple. The main purpose of this article, in fact, is to describe how to add the power of DSP processing to your PC application using some of these off-the-shelf tools. As for the the third question, we'll return to it at the end of this article, when we evaluate the relative performance of some of the algorithms presented here.

I'll use as an example the classic DSP algorithm known as the Fast Fourier Transform (FFT). The hardware used in the example is a DSP coprocessor board from CAC, which is built around an AT&T DSP32c chip (32-bit floating point DSP processor) with 256K of 25ns SRAM and dual 16-bit analog input and output channels. The DSP32c runs at 50 MHz and is capable of 25 Mflops.

Fast Fourier Transform

For comparison, I'll implement the FFT in three different ways: as a C program running on the PC, as C code running on the add-in DSP board, and as DSP assembly language running on the add-in board.

The examples will utilize FFTs from several different sources. One of the FFTs comes from the book Numerical Recipes in C: The Art of Scientific Computing, an excellent reference that provides C source code and detailed information on many different algorithms. The source code is also provided on a diskette that can be compiled for the DSP32c without modification. We'll test this routine on both the PC and the DSP board.

I'll also try a PC-based FFT from the MathPak 87 library by Precision Plus Software. Finally, to demonstrate programming the DSP in assembler, I will use the assembly-code FFT routine provided by the AT&T application library. We will invoke the assembly language FFT from both a main module written in C as well as one written in DSP assembler.

I'll run the examples from within the Digital Signal Processing HeadQuarters (DspHq) environment. DspHq is my company's DSP development software package that integrates libraries of functions for the PC and DSP cards, along with graphics and data management. Please note that there is no intrisic requirement for DspHq; it is used as a matter of convenience. However, the listings as shown here do rely on DspHq. It should not be difficult to modify them for standalone operation. Note also that the electronic version of the listings contain additional DspHq-related files that could not be presented here for lack of space.

Testing an FFT requires a number of steps:

Synthesize a cosine wave form with a given amplitude, dc offset, frequency and sample rate.

Calculate the real-valued FFT of the cosine wave form.

Rederive the original waveform by using an inverse real-FFT.

Calculate log-magnitude of FFT.

Display all waveforms graphically (as shown in Figure 1 through Figure 4).

The next section shows these steps as accomplished on a PC.

FFT On the PC

The PC-based executable file is called ddj.exe and incorporates both the PC-based FFT routines from Numerical Recipes and the functions necessary to invoke binaries that will execute on the add-in board.

The main module is called ddj.c and is shown in Listing One (page 90). The comments in ddj.c contain information such as the makefile used to create ddj.exe. The heart of this routine is a switch statement that selects from among the different functions necessary for our test. The first four case blocks invoke PC-based routines. Case 1 calls synth_cos to generate a cosine; case 2 calls the routine realft, which is the algorithm from Numerical Recipes optimized for real values; case 3 also calls realft, but invokes it as the inverse transform; finally, case 4 calls a function to calculate the log-magnitude. The rest of the case blocks are used to invoke functions residing on the DSP board and are discussed later in this article.

For purposes of timing comparison, I also implemented a host-resident C program that calls an FFT routine optimized for the 80x87. This routine comes from Precision Plus' MathPak87 library. The driving source file is called ddj_mp87.c and is shown in Listing Two (page 90).

C Code Running On the DSP

Binary files that will execute on the DSP32c have the suffix .32c. Those for the older DSP32 have the suffix .32. Although the two devices are source compatible (the DSP32c being a superset), their binary formats are not.

The C source file for the example is ddj_32c.c, and its corresponding executable is ddj_32c.32c. Executable files are produced by d3cc, the AT&T C-compiler for the DSP32c. This compiler produces object files with a .o file extension. The command d3cc -c realft.c compiles the algorithm from Numerical Recipes in C without modification. For example, the resulting object file, realft.o, is then linked with the other files and libraries shown in Example 1.

Example 1: Compile/link command to produce a DSP32c executable is from the C language source. I use the math library first, even though it is slower, because the error checking is more complete, and the routines accept a wider range of inputs.

  d3cc -lm -lap -0 ddj_32c.32c ddj_32c.o four1.o realft.o \
          -s startup.o -m memory.map
  d3cc                : AT&T DSP32c 'C' compiler
  -lm                 : Use math library first
  -lap                : Use application library second
  -0 ddj_32c.32c      : output file name ddj_32c.32c
  ddj_32c.o           : object file for linker
  four1.o             : object file for linker
  realft.o            : object file for linker
  -s startup.o        : assembly startup file (hardware dependent)
  -m memory.map       : memory configuration file (hardware dependent)

Downloading and running this executable is accomplished by the code fragment shown in case 6 in ddj.c (Listing One). You can also download and execute a binary from the DOS command line with the AT&T d3dl utility. You can choose the most appropriate blend of working in an integrated environment (such as DspHq) versus working with standalone utilities (such as d3dl); d3emu, a symbolic debugger that comes with the add-in board; and d3hex, a utility to produce output suitable for PROM burners.

In this case, I have combined several functions into one executable. The DSP-resident program waits for a function number sent from the PC and then executes the proper routine, clears the function number, and waits for the next instruction.

The main module for the target-based executable, ddj_32c.c, is shown in Listing Three (page 92). This is analogous to the host-based ddj.c shown earlier in that it basically consists of a switch statement that dispatches to the requested functions. The functions, in the order shown in the case statement, are:

Convert from IEEE format to DSP format.
Convert from DSP to IEEE.
Synthesize a cosine waveform.
Invoke realft for a forward FFT.
Invoke realft for an inverse FFT.
Calculate the log-magnitude.
Invoke rffta, the assembly-language implementation of the FFT.

Because the DSP32C uses a data format different from the PC, when data is downloaded, the DSP converts the numbers to its own internal format. When uploading, the DSP converts the data to IEEE format, waits for the PC to upload, and then converts the data back to DSP format. A single-cycle instruction is available for the conversion in both directions. These details are hidden in the ul_float and dl_float functions.

Assembly Code on the DSP

For testing assembly-code, we'll use the FFT routine rfft, provided by the AT&T assembler application library. (Note the similarity between DSP32c assembly language and C, a similarity that makes it relatively easy to become proficient in DSP32c assembler.) Invoking rfft is similar to invoking the C routine realft, with the exception of having to download the number of stages of the FFT prior to execution. This is shown in case 9 of main( ) in ddj.c Listing One).

I've also implemented the equivalent of ddj_32c.c in assembler. Because assembly-language source files have the suffix .s, our file is called ddj_32s.s and is shown in Listing Four (page 94). AT&T provides its own assembler/linker program called d3make used with assembly language programs only. Example 2 shows how this would be invoked in the source code file.

Example 2: Invoking the AT&T assembler/linker program called d3make in the source code file.

  d3make -o ddj_32s.32c -M6 -Q -W ddj_32s.s
  d3make                   : AT&T assembler/linker make
  -o ddj_32s.32c           : output file name
  -M6                      : memory mode, hardware dependent
  -Q                       : DSP32c (as opposed to DSP32) target device
  -W                       : turn off warnings
  ddj_32s.s                : source file input

The Results

As I stated at the beginning of this article, the main reason for adding DSP power to your application is to speed it up and achieve real-time processing speeds. The timing results for the examples developed here are shown in Table 1. Except where noted, the DSP32c processing times include data upload/download and conversion. As you can see, realft running on the DSP32c is about four times faster than the same code running on a 25-MHz 386/387 computer, and the assembly-language rffta is another factor of four faster than that. If you subtract all overhead times (download, upload, and conversion), you end up with a 80-fold speed-up over straightforward hostbased C code, and a 40-fold improvement over the optimized PC-based routines from the MathPak library.

Table 1: Timing tests were calculated on a 25-MHz 386 PC(25-MHz 387). 1024 point real-valued FFT, input sine wave. 200 Hz sine sampled at kHz, amplitude = 1 dc offset = 0, phase = 0.

  Routine Name                     Computation   Execution time (mS)
  ------------------------------------------------------------------
  Numerical Recipes, realft          PC              165.8
  MathPak 87, rvfft                  PC               86.3
  Numerical Recipes, realft          DSP32c           45.6*
  AT&T Application Library, rffta    DSP32c            9.8*
  AT&T Application Library, rffta    DSP32c            2.1**

  Generate Cosine                    PC               94.0
  Generate Cosine                    DSP32c           38.4

  Download and convert               n/a               2.2
  Convert, upload, and convert       n/a               3.3

  Upload rate = 2.7 Mbytes/sec, Download rate = 2.9 Mbytes/sec
   *Includes data download, upload, and conversion
   **Algorithm only, no overhead.

Products Mentioned

DspHq Signal Processing Development Software BittWare Research Systems Inner Harbor Center, 8th Floor 400 East Pratt Street Baltimore, MD 21202 800-848-0435

D3emu Debugger Board Interface Library Communications, Automation, and Control 1642 Union Blvd. Allentown, PA 18103 800-367-6735

Numerical Recipes in C: The Art of Scientific Computing William H. Press, Brian P. Flannery, Saul A. Teukolsky, and William T. Vetterling ISBN 0-521-35465-X Cambridge University Press New Rochelle, NY 10801

DSP32c Assembler DSP32c C Compiler DSP32c Application Library AT&T Microelectronics Dept. 52AL330240 555 Union Blvd. Allentown, PA 18103 800-372-2447

MathPak 87 Precision Plus Software 939 Griffith Street London, Ontario N6K 3S2 Canada 519-657-0633


_ADDING THE POWER OF DSP TO YOUR APPLICATIONS_
by Jim Bittman

[LISTING ONE]

<a name="0115_000e">
/*-------------------- DDJ.C-----------------------------------------
|  PC-Resident program for FFT and for controlling DSP co-processor
|  The Makefile for this program is:
|   .c.obj:
|         cl /AL -c -DANSI $<
|   ddj.exe: ddj.obj dspif.obj dspif.h four1.obj realft.obj
|         cl /AL ddj.obj dspif.obj four1.obj realft.obj \
|             -link dsputill.lib hqmcil.lib
|  This program assumes the DspHq environment and requires additional
|  support files (menu definition and function specification files)
|  not listed in the Makefile. However, it should not be difficult to
|  excerpt the relevant portions of this code for standalone execution.
+------------------------------------------------------------------*/

#include <stdio.h>              /* include for NULL define         */
#include <math.h>               /* include for math functions      */
#include "hq_ci_.h"             /* include for dsphq interface     */
#include "nr.h"                 /* include nr function prototypes  */
#include "nrutil.h"             /* include nr utility prototypes   */
#include "dsputil.h"            /* include CAC prototypes          */
#include "dspif.h"              /* dsp interface function header   */

/*---------- define the menu parameter types ----------------------*/
typedef unsigned long parm1_type;       /* Buffer 1                */
typedef unsigned long parm2_type;       /* Buffer 3                */
typedef menu_float    parm3_type;       /* Cosine Amplitude        */
typedef menu_float    parm4_type;       /* Cosine DC-offset        */
typedef menu_float    parm5_type;       /* Cosine Frequency        */
typedef menu_float    parm6_type;       /* Cosine Sammple Rate     */
/*------------ constant definitions -------------------------------*/
#define SQR(a) ((a)*(a))
#define PI               (float) 3.141592654
#define BAD_FUNC_NUM     (int) -1       /* user error code */
/*----- Function Constants For Dsp32 'C' & Assembly Code ------*/
#define DSP32_X     1      /* converts IEEE ==> DSP numbers */
#define IEEE32_X    2      /* converts DSP  ==> IEEE numbers */
/*----- Function Constants For Dsp32 'C' Code ----------------*/
#define GENCOS_C    3      /* generate test cosine */
#define REALFT_C    4      /* performs realft, from numerical recipes */
#define IREALFT_C   5      /* performs inv_realft, from numerical recipes */
#define LOGMAG_C    6      /* calculates log-magnitude */
#define RFFTA_C     7      /* calculates real fft using app-lib */
/*----- Function Constants For Dsp32 Assembly Code ------------*/
#define RFFT_S      3      /* performs rfft, from AT&T application library */
#define MAG_S       4      /* calculates magnitude-squared */
#define LOG_S       5      /* calculates log */
/*----- Function Prototypes ----------------------------------*/
int     ilog2      (unsigned int);
void    log_mag    (float far * i1, float far * o1, long bs);
void    scale_data (float far *output1, float scale_val, long np);
void    synth_cos  (float far *data1, long np, float a, float d, float f,
                                                                     float s);
main(int argc, char *argv[])
{  long                 indx;           /* used for loop index */
   int                  func_num;       /* for function number from dsphq */
   long                 np;             /* for block size from dsphq */
   float far *          input1;         /* array address */
   float far *          output1;        /* array address */
   long                 bs;             /* address of DSP blocksize var */
   long                 flag;           /* address of DSP funcnum flag */
   parm1_type           b1;             /* DSP Buffer #1 */
   parm2_type           b2;             /* DSP Buffer #2 */
   parm3_type           amp;            /* cosine amplitude */
   parm4_type           dco;            /* cosine DC offset */
   parm5_type           freq;           /* cosine frequency */
   parm6_type           samprate;       /* cosine sample rate */
   init_intfc(argc, argv);              /* init dsphq interface */
   func_num = get_func_num();           /* get the function number */
   np = get_block_size();               /* get the block size */
   /* get menu parameters */
   b1   = get_parm(1);                  /* DSP Buffer #1 */
   b2   = get_parm(2);                  /* DSP Buffer #2 */
   amp  = get_parm(3);                  /* cosine amplitude */
   dco  = get_parm(4);                  /* cosine DC offset */
   freq = get_parm(5);                  /* cosine frequency */
   samprate = get_parm(6);              /* cosine sample rate */
   /* get array addresses */
   input1  = get_data_in_ptr(1);        /* base address of input #1 */
   output1 = get_data_out_ptr(1);       /* base address of output #1 */
   /* perform selected function */
   switch (func_num)
   {
      case 1 :       /*--- Synthesize Cosine Using PC ---------------------*/
         synth_cos(output1, np, amp, dco, freq, samprate);
         break;
      case 2 :      /*---- Numerical Recipes Forward Real FFT Using PC ----*/
         output1--;                             /* NR funcs index at 1     */
         realft(output1, np>>1, 1);             /* forward real fft        */
         break;
      case 3 :       /*---- Numerical Recipes Inverse Real FFT Using PC ----*/
         output1--;                             /* NR funcs index at 1      */
         realft(output1, np>>1, -1);            /* inverse real fft         */
         output1++;                             /* reallign address         */
         scale_data(output1,1.0/(np >> 1),np);  /* restore original ampl.   */
         break;
      case 4 :     /*------ Calculate LOG(10)-[MAGNITUDE] using PC --------*/
         if (input1)
            log_mag(input1, output1, np);       /* take logmag of input     */
         else
            log_mag(output1, output1, np);      /* perform logmag in-place  */
         break;
      case 5 :     /*--------- Synthesize Cosine Using DSP-32C -------------*/
         init_dsp("ddj_32c.32c",&flag,&bs,np);  /* download dsp code & init */
         dsp_dl_fp(get_addr("amp"),amp);        /* download floats          */
         dsp_dl_fp(get_addr("dco"),dco);
         dsp_dl_fp(get_addr("freq"),freq);
         dsp_dl_fp(get_addr("samprate"),samprate);
         set_dspbuf("o1", b1);                  /* set dsp buffer address   */
         dsp_dl_int(flag,GENCOS_C);             /* invoke function on dsp   */
         wait4dsp(flag);                        /* wait for dsp to finish   */
         ul_float(output1,np,flag,b1);          /* upload results           */
         break;
      case 6 :  /*---- Numerical Recipes Forward Real FFT Using DSP-32C ----*/
         init_dsp("ddj_32c.32c",&flag,&bs,np);  /* download dsp code & init */
         set_dspbuf("o1", b1);                  /* set dsp buffer address   */
         dl_float(input1,np,flag,b1);           /* download float array     */
         dsp_dl_int(flag,REALFT_C);             /* execute "realft" on dsp  */
         wait4dsp(flag);                        /* wait for dsp to finish   */
         ul_float(output1,np,flag,b1);          /* upload results           */
         break;
      case 7 : /*---- Numerical Recipes Inverse Real FFT Using DSP-32C -----*/
         init_dsp("ddj_32c.32c",&flag,&bs,np);  /* download dsp code & init */
         dl_float(input1,np,flag,b1);           /* download float array     */
         set_dspbuf("o1", b1);                  /* set dsp buffer address   */
         dsp_dl_int(flag,IREALFT_C);         /* execute "inv_realft" on dsp */
         wait4dsp(flag);                        /* wait for dsp to finish   */
         ul_float(output1,np,flag,b1);          /* upload results           */
         break;
      case 8 :      /*----- Calculate LOG(10)-[MAGNITUDE] using DSP-32C ----*/
         init_dsp("ddj_32c.32c",&flag,&bs,np);  /* download dsp code & init */
         dl_float(input1,np,flag,b1);           /* download float array     */
         set_dspbuf("i1", b1);                  /* set dsp buffer address   */
         set_dspbuf("o1", b2);                  /* set dsp buffer address   */
         dsp_dl_int(flag,LOGMAG_C);          /* execute "inv_realft" on dsp */
         wait4dsp(flag);                        /* wait for dsp to finish   */
         ul_float(output1,np,flag,b2);          /* upload results           */
         break;
      case 9 :     /*------- Forward Real FFT Using DSP-32C App-Lib --------*/
         init_dsp("ddj_32s.32c",&flag,&bs,np);  /* download dsp code & init */
         dl_float(input1,np,flag,b1);           /* download float array     */
         set_dspbuf("o1", b1);                  /* set dsp buffer address   */
         dsp_dl_int(get_addr("stages"),ilog2(np));
                                                /* download int             */
         dsp_dl_int(flag,RFFT_S);               /* execute "rfft" on dsp    */
         wait4dsp(flag);                        /* wait for dsp to finish   */
         ul_float(output1,np,flag,b1);          /* upload results           */
         break;
      case 10:     /*-------- Download Data To DSP-32C ---------------------*/
         init_dsp("ddj_32s.32c",&flag,&bs,np);  /* download dsp code & init */
         dl_float(input1,np,flag,b1);       /* download data from pc to dsp */
         break;
      case 11:    /*--------- Upload Data From DSP-32C ---------------------*/
         init_dsp("ddj_32s.32c",&flag,&bs,np);  /* download dsp code & init */
         ul_float(output1,np,flag,b1);          /* upload results           */
         break;
      default : set_err_return(BAD_FUNC_NUM); break;
   }
}
/*--- This function returns the integer part of the log base 2 of x. ---*/
int ilog2(unsigned int x)
{
   return( x >> 1 ? 1 + ilog2(x >> 1) : 0);
}
/*--- This function scales np elements of data1[] by scale_val. ---*/
void scale_data(float far * data1, float scale_val, long np)
{  long i;
   for (i = 0; i < np; i++)   {
      data1[i] *= scale_val;
   }
}
/*--- Function generates cosine data: data[i] = A cos((2 pi f/s) i) + d ---*/
void synth_cos(float far * data1, long np, float a, float d, float f, float s)
{  long i;
   float theta, angle_step;
   angle_step = 2.0 * PI * f / s;
   theta = 0.0;
   for (i = 0; i < np; i++) {
      data1[i] = (a * cos(theta)) + d;
      theta += angle_step;
   }
}
/*--- log_mag ---*/
void log_mag(float far * i1, float far * o1, long bs)
{  long i;
   long n;
   n = bs >> 1;
   o1[0] = log10(SQR(i1[0]));
   for (i = 1; i < n; i++)   {
      o1[i] = log10(SQR(i1[2*i]) + SQR(i1[2*i+1]));
   }
   for (i = n; i < bs; i++)   {
      o1[i] = 0.0;
   }
}





<a name="0115_000f">
<a name="0115_0010">

[LISTING TWO]

<a name="0115_0010">

/*-----------------DDJ_MP87.C-------------------------------------------
|  MathPak 87 FFT Function Group Execution Source File
|  The makefile is: ddj_mp87.exe: ddj_mp87.c
|                      cl /AL ddj_mp87.c -link hqmcil.lib mpak87.lib
+----------------------------------------------------------------------*/
#include <hq_ci_.h>                     /* include function prototypes */
#include <mpak87.h>                     /* include math pak header */
#define BAD_BLOCK_SIZE        -1        /* user defined error codes */
#define BAD_FUNC_NUM          -2

int fft_stages(long fft_size); /*function prototype*/

void main(int argc, char *argv[])
{
   int         m;       /* number of fft stages */
   far_array_of_double o1;  /* data pointer */
   init_intfc(argc, argv); /* MUST do this before other interface functions */
   o1 = get_data_out_ptr(1);   /* get address of data */
   if ((m = fft_stages(get_block_size())) == 0)
      set_err_return(BAD_BLOCK_SIZE);  /* won't happen if .fnc file correct */
   else
   switch(get_func_num())   {
      case 1: rvfft(o1, m); break;    /* real fft from MathPak library     */
      case 2: irvfft(o1, m); break;   /* inverse real fft from MathPak     */
      default:
        set_err_return(BAD_FUNC_NUM); /* won't happen if .fnc file correct */
      break;
   }
}
/* return the log(base 2) of the input, or 0 if input is not a power of 2 */
int fft_stages(long fft_size)
{  int rtn;
   int sw_fft;
   sw_fft = fft_size;
   switch(sw_fft)   {
      case    8 : rtn =  3; break;
      case   16 : rtn =  4; break;
      case   32 : rtn =  5; break;
      case   64 : rtn =  6; break;
      case  128 : rtn =  7; break;
      case  256 : rtn =  8; break;
      case  512 : rtn =  9; break;
      case 1024 : rtn = 10; break;
      case 2048 : rtn = 11; break;
      case 4096 : rtn = 12; break;
      case 8192 : rtn = 13; break;
      default   : rtn =  0; break;
   }
   return(rtn);
}






<a name="0115_0011">
<a name="0115_0012">

[LISTING THREE]

<a name="0115_0012">

/*--------------------DDJ_32C.C--------------------------------------
| This file is will run on the DSP add-in board after compilation
| by the AT&T C compiler d3cc. The makefile is:
|  .c.o:
|       d3cc -c $<
|  .s.o:
|       d3as -W -Q $<
|   ddj_32c.32c: ddj_32c.o startup.o memory.map four1.o realft.o
|       d3cc -lm -lap -o ddj_32c.32c ddj_32c.o four1.o realft.o \
|        -s startup.o -m memory.map
+--------------------------------------------------------------------*/
#include <stdio.h>
#include <math.h>
#include <nr.h>

#define PI 3.1415926
#define SQR(a) ((a)*(a))

asm(".global i1, i2, o1, o2");
asm(".global funcnum, bs");
asm(".global amp, dco, freq, samprate");

short funcnum;
short bs;
float *i1, *i2, *o1, *o2;
float amp, dco, freq, samprate;
/*------------------------------------------------------------*/
short fft_stages(fft_size)
short fft_size;
{
   short rtn;
   switch(fft_size)
   {
      case   32 :   rtn =  5;    break;
      case   64 :   rtn =  6;    break;
      case  128 :   rtn =  7;    break;
      case  256 :   rtn =  8;    break;
      case  512 :   rtn =  9;    break;
      case 1024 :   rtn = 10;    break;
      case 2048 :   rtn = 11;    break;
      case 4096 :   rtn = 12;    break;
      case 8192 :   rtn = 13;    break;
      default   :   rtn =  0;    break;
   }
   return(rtn);
}
/*------------------------------------------------------------*/
main()
{    register float scal;
     short n;
     float *data1, *data2, temp;
     register short i,j;

     while (1) {
       funcnum=0;
       while (!(funcnum))
       ;       /* Wait for PC to Download Function Number */
       n = bs >> 1;
       switch(funcnum) {
          case 1: /*--------- Convert to DSP Format ------------*/
                    dsp32(bs,o1); /*IEEE-->DSP */
                    break;
          case 2: /*--------- Convert to IEEE format -----------*/
                    ieee32(bs,o1); /*DSP-->IEEE*/
                    break;
          case 3: /*--------- Synthesize Cosine ----------------*/
                    scal = 2.0 * PI * freq / samprate;
                    j = 0;
                    data1 = o1;
                    for (i=bs; i-- > 0; j++) {
                      *data1++ = (amp * cos(scal * j)) + dco;
                    }
                    break;
          case 4: /*------- Forward FFT ------------------------*/
                    data1 = o1;
                    data1--;
                    realft(data1,n,1); /*from numerical recipes*/
                    break;
          case 5: /*------- Inverse FFT ------------------------*/
                    data1 = o1;
                    data1--;
                    realft(data1,n,-1); /*from numerical recipes*/
                    /* scale by 1/n to retain original amplitude*/
                    data1 = o1;
                    scal = 1.0 / n;
                    for (i=bs; i-- > 0; data1++) {
                      *data1 = *data1 * scal;
                    }
                    break;
          case 6: /*----- Calc LOG-MAGNITUDE (data output from NR-realft)--*/
                    o1[0] = log10(SQR(i1[0]));
                    temp  = log10(SQR(i1[1]));
                    for (i=1;i<n;i++) {
                      o1[i]=log10(SQR(i1[2*i])+SQR(i1[2*i+1]));
                    }
                    o1[n] = temp;
                    for (i=n+1; i<bs; i++) {
                      o1[i] = 0.0;
                    }
                    break;
          case 7: /*------ Forward FFT ----------------------------------*/
                    data1 = o1;
                    rffta(bs,fft_stages(bs),data1); /*uses AT&T app lib*/
                    break;
          default : break;
       }
     }
}




<a name="0115_0013">
<a name="0115_0014">

[LISTING FOUR]

<a name="0115_0014">

/*------ file DDJ_32S.S-----------------------------------------------
| Assembly language version of FFT test. The makefile is as follows:
|   ddj_32s.32c: ddj_32s.s
|        d3make -o ddj_32s.32c -M6 -Q -W ddj_32s.s
+---------------------------------------------------------------------*/
#include <dspregs.h>
/*--------------------------------------------------------------------*/
.global i1,o1,o2
.global funcnum, bs, stages
.global endofcode
/*--------------------------------------- initialization -------------*/
       r5 = 0x0003
       pcw = r5
       ioc  = 0x30CC0
/*--------------------------------------- wait until funcnum != 0 ----*/
begin:
       r5 = *funcnum
       nop
       if (eq) goto begin
       nop
/*--------------- switch on funcnum, which is set to function number --*/
/* func 1:  IEEE-->DSP */
     r5 = r5 - 1
     if (eq) goto do_dsp32
/* func 2: DSP-->IEEE */
     r5 = r5 - 1
     if (eq) goto do_ieee32
/* func 3: invoke rffta, from AT&T app library */
     r5 = r5 - 1
     if (eq) goto rffta
/* func 4: calc magnitude squared */
     r5 = r5 - 1
     if (eq) goto do_mag
     nop
/* func 5: calc log */
     r5 = r5 - 1
     if (eq) goto do_log10
     nop
/* illegal function number */
     goto finished
     nop
/*---------------------------------call to _rffta -----------------*/
rffta:
       r2e = *o1
       r1  = *bs
       r3  = *stages
       *fft_b = r2e
       *fft_n = r1e
       *fft_m = r3e
       call _rffta (r14)
       nop
fft_lv: int24    localV
fft_n:  int24    0
fft_m:  int24    0
fft_b:  int24    0
.align 4
       goto finished
       nop
/*---------------------------------calc magnitude------------------*/
do_mag:
       r8   = *bs
       r10e = *i1
       r8   = r8 - 2
       r11e = *o1
       nop
       a0 = *r10++                 /* DC value */
       nop
       a2 = *r10++                 /* Nyquist  */
       *r11++ = a0 = a0*a0         /* save DC mag */
magloop:
       a0 = *r10++
       nop
       a1 = *r10++
       a0 = a0*a0
       nop
       a1 = a1*a1
       nop
       nop
       *r11++ = a1 = a0 + a1
       if (r8-->=0) goto magloop
       nop
       *r11++ = a0 = a2*a2         /* save Nyquist mag */
       goto finished
       nop
/*---------------------------------calc log10------------------*/
do_log10:
       r12e = *i1
       r11e = *o1
       r13  = *bs
       r10e = in
       r9e  = out
       r13  = r13 - 2
logloop:
       *r10 = a3 = *r12++
       call _log10 (r14)
       nop
       int24     localV
       int24     in, out
.align 4
       *r11++ = a3 = *r9
       if (r13-->=0) goto logloop
       nop
       goto finished
       nop
/*--------------------------------------call _ieee32 converter-----*/
do_ieee32:
       r1e = *o1
       r2 = *bs
       *o1_ieee32 = r1e
       *bs_ieee32 = r2e
       call _ieee32 (r14)
       nop
            int24       localV
bs_ieee32:  int24       0
o1_ieee32:  int24       0
.align 4
       goto finished
       nop
/*--------------------------------------call _dsp32 converter-----*/
do_dsp32:
       r1e = *o1
       r2 = *bs
       *o1_dsp32 = r1e
       *bs_dsp32 = r2e
       call _dsp32 (r14)
       nop
bs_dsp32:  int24       0
o1_dsp32:  int24       0
.align 4
       goto finished
       nop
/*-------------------------------finished, set funcnum=0 -----------*/
finished:
       r1=0
       goto begin
       *funcnum=r1
bs:         int 256
stages:     int 8
funcnum:    int 0
.align 4
i1: int24 0x2000
o1: int24 0x2000
o2: int24 0x3000

.align 4
localV:   2*float   0.0
in:       float     0.0
out:      float     0.0
max:      float     0.0
scalefac: float     0.0
#include <_rffta.asm>
#include <_log10.asm>
#include <_ieee32.asm>
#include <_dsp32.asm>
/*------------------------------------- mark end of code------------*/
endofcode: int 0xDEAD, 0xC0DE

Parallel

Adding the Power of DSP to Your Application

Fast Fourier Transform

FFT On the PC

C Code Running On the DSP

Example 1: Compile/link command to produce a DSP32c executable is from the C language source. I use the math library first, even though it is slower, because the error checking is more complete, and the routines accept a wider range of inputs.

Assembly Code on the DSP

Example 2: Invoking the AT&T assembler/linker program called d3make in the source code file.

The Results

Table 1: Timing tests were calculated on a 25-MHz 386 PC(25-MHz 387). 1024 point real-valued FFT, input sine wave. 200 Hz sine sampled at kHz, amplitude = 1 dc offset = 0, phase = 0.

Products Mentioned

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Parallel Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

Parallel

Adding the Power of DSP to Your Application

Related Reading

More Insights

White Papers

Reports

Webcasts

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Parallel Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content