Dr. Dobb's | Comparing Signals In The Time Domain

Comparing Signals In The Time Domain

May 01, 1991
URL:http://www.drdobbs.com/comparing-signals-in-the-time-domain/184402354

May 1991/Comparing Signals In The Time Domain

Michael E. Brandt received his B.S. in physics from the Polytechnic Institute of New York, and his M.S. and Ph.D. in biomedical engineering from the University of Houston. He is an assistant professor in the Department of Psychiatry and Behavioral Sciences at the University of Texas Medical School at Houston, and teaches a graduate course in digital signal processing at the University of Houston. His resarch interests are in the analysis of brain electrical signals. Dr. Brandt can be contacted at the University of Texas Medical School, 6431 Fannin, Room 5.240, Houston, TX 77030.

An important problem in systems science is the determination of the relationship between two signals in terms of their primary sources. This problem has implications in diverse fields. In communication engineering, for example, you often want to determine the characteristics of the channel connecting a transmitter and a receiver. In the study of epilepsy, you infer the focus of a brain seizure from the microvolt electrical signals measured from the scalp surface (the EEG).

Two related questions arise:

Are the signals related by a single source, dependent sources, or independent sources?
If two signals are related by a common source, is there a time-sequential relationship (a delay) between them?

In the communication engineering example, this delay would correspond to the signal transmission time. In the case of epilepsy, the delay would be proportional to the propagation velocity of the seizure as it spreads from its focus to adjacent brain regions.

The relationship that we are seeking between two signals will ultimately be of a causal nature. However, two signals can be highly dependent on one another yet bear no causal relationship. This is the case if the two signals are produced by a primary source. Unless the signal produced by this common source is included in the systems model, one cannot justify inferences concerning the causal nature of the two signals. The best one can hope for is a characterization of the degree of the association between the signals. This may in turn lead to insights into the nature of the sources themselves.

To illustrate the issues, I present here a program that lets you use two different methods for computing the degree of association and the time delay (if any) between two simulated signals.

Linear Correlation

Two signals x and y can be related through a simple linear system in the presence of additive noise by:

(1)     y[n] = L{x[n]} + e[n]

where L{} is any linear transformation acting on the input signal x[n], e[n] represents white noise independent of x[n], n is the discrete index, and y[n] is the output signal. The relationship between the two signals can be unambiguously represented by the correlation coefficient (r) in which the points of a scattergram of y vs. x are fitted with a straight line using the least mean-square method. The correlation coefficient between x and y can then be expressed as the square root of A/B, where A is the total variance in y minus the amount of variance in y not explainable from its linear dependence on x and B is the total variance in y.

Equation (2)

The correlation coefficient takes on continuous values from -1 to +1. Numbers close to +1 or -1 indicate a high linear relationship (either direct or inverse respectively), while coefficients close to 0 indicate a small or nonexistent linear relationship. The square of r is then a measure of the proportion of the total variance in y explainable by its linear dependence on x. It can be estimated computationally as in Equation (2).

Note that the r2 for x vs. y is equal to the r2 for y vs. x.

When a delay is present between y and x it can be estimated by computing the cross-correlation between them. This is done in practice by computing r2 as a function of a shift between the two signals. The maximum r2 represents the linear association between the two signals, and the time shift at which the maximum is found is an estimate of the delay.

Coherence Analysis

Coherence analysis is a frequency-domain method for determing the linear relationship between two signals. The coherence is computed by dividing the estimate of the cross-power spectrum of the two signals by the product of their auto-spectra as a function of frequency. It measures the linear dependence of each frequency component of signal y on signal x. I won't go into further detail here. The cross-phase spectrum measures the phase lag of each frequency component between x and y.

Nonlinear Analysis

If the relationship between the two signals is known to be nonlinear, then in general the linear correlation coefficient cannot be expected to properly quantify the association between them. A generalized model of the system relationship is:

(3)     y[n] = N{x[n]} + e[n]

where N{} now represents a nonlinear transformation of the input signal, with the other variables as in eq. (1) above. If the form of the functional relationship between x and y is known beforehand then that relationship can be used to perform a least-squares fit of the x vs. y scattergram. A "nonlinear regression coefficient" can then be arrived at analogously to eq. (2) by computing the proportion of the variance in y accounted for by this functional fit. In practice, you approximate the x-y relationship with as low an order polynomial as you can get away with. Once you determine the order, you perform curvilinear regression, again using the least-squares-fit method.

In many signal-analysis problems, curvilinear regression cannot be used effectively for two main reasons:

the order of the polynomial fit may be too high to implement practically
the statistical characteristics of the signals change in time (they are "nonstationary") which means that the order of the polynomial fit would have to be recomputed whenever the signals change character.

A method for getting around these problems was recently pioneered by J. Pijn of The University of Amsterdam (1990) who suggested that a numerical approximation of the x-y relationship be made from the data scattergram itself. The estimator of the relationship is referred to as the nonlinear regression coefficient (h2). It estimates the quantity:

(total y variance) - (unexplained y variance)
----------------------------------------------
             (total y variance)

Pijn's approach consists of approximating the regression curve using a piecewise-linear fit through the scattergram. The approximation procedure is a very simple one as follows:

The x data range is divided up into a number (L) of equal sized bins.
For each bin i, the midpoint of the x values (p[i]) and the average amplitude of the y values (q[i]) is obtained.
Finally, the (p[i], q[i]) points are connected by straightline segments to form the piecewise linear approximation to the regression curve (f(x)).

The nonlinear regression coefficient, h2, is then computed as:

where N is the number of samples, and <y> is the mean value of y[n].

One of the advantages of using h2 over r2 is that if the x-y relationship is linear, then h2 will be equal to r2, whereas if the relationship is nonlinear, r2 will not be close to 1 while h2 will be close to 1.

In general, the h2 of x vs. y will not be equal to the h2 of y vs. x. This will most likely be the case if x is not a single-valued function of y, or y is not a single valued function of x. If, however, the relationship is single-valued, the asymmetry should be small. Also, if the number of bins (L) is equal to the number of data points (N) then h2 will be 1. (All data points will be perfectly predicted.) Therefore, by increasing the number of bins, you can obtain an arbitrarily high fit to the scattergram. However, in practice h2 rapidly converges to its theoretical value as a function of the number of bins, and then increases very slowly thereafter up to 1. Empirically, Pijn found that a maximum of 10 bins for N > 512 was sufficient for characterizing the h2 between two EEG signals.

The h2 can also be used to estimate a delay between two signals that are nonlinearly related. You do this in a manner comparable to the computation of the cross-correlation function. Compute h2 as a function of lag between the two signals. The maximum h2 obtained and the corresponding lag represent the association and delay (respectively) between x and y.

Listing 1 is a test program for computing r2, h2, and the delay for several predetermined functions. These functions, y(x), are as follows:

a linear function, y[n] = ax[n] + be[n], and four non-linear functions
y[n] = sin(x[n]) + be[n]
y[n] -- sign(x[n])(x[n]2) + be[n]
y[n] = 3x[n]3 +be[n]
y[n] = 5x[n]5 + be[n]

x[n] and e[n] are uniformly distributed random numbers between 0 and 1. a and b are user-selectable gain constants. By varying b, you can study the effects of increasing the output noise levels on both r2 and h2. Additional functions can easily be added to the program. By studying the r2 and h2 estimates obtained for each of these functions, you can determine the effects of the nonlinearity on the variance estimate used.

The function estimate_delay is capable of both advancing or lagging one signal with respect to the other with a user-selected time shift. The sub-routine will then compute both r2 and h2 as a function of increasing lag between the two signals in order to estimate the delay between them. For each of the functions included in the program, the two methods are equally able to find the proper delay.

Conclusion

The h2 measure represents one possible time-domain approach for estimating either the linear or nonlinear relationship between two signals. Other possible approaches include polynomial regression and spline fitting. The non-linvar function provides a starting point for these alternative methodologies.

Reference

Pijn, J.P.M. (1990). "Quantitative Evaluation of EEG Signals in Epilepsy." Ph.D. Thesis, University of Amsterdam.

May 1991/Comparing Signals In The Time Domain/Listing 1

Listing 1

/*************************************
 *                                   *
 *    LINEAR & NONLINEAR VARIANCE    *
 *            TEST PROGRAM           *
 *                                   *
 *  M.E. Brandt, Ph.D.               *
 *  02/06/91 Revision                *
 *                                   *
 *************************************/

#include <stdio.h>
#include <math.h>
#include <malloc.h>


#define NMAX                   1024
#define NBINS                    10
#define MAX_DELAY                30
#define MAXRAND             32767.0
#define DBL_LEN      sizeof(double)
#define PI        3.141592653589793
#define TWOPI            (2.0 * PI)



double linvar(double *y, double *x, int nn);
double nonlinvar(double *x, double *y, int n, int nbins);
int    number_bins(int x);
void   estimate_delay(double *x, double *y, int n);



main()
{

   int xx, i, j, nb, n, ch;

   double *x, *y, scale, r2, h2, start,
         noise, gain, a, inc, mean, num;
   char u[3], c;


   srand(1);

  /* SELECT A FUNCTION TO TEST */

  do {

       printf("\n\nSelect a y vs. x function and ");
       printf("compute variances:\n");
       printf("\n1. y = ax + noise (linear)");
       printf("\n2. y = sin(x) + noise");
       printf("\n3. y = sgn(x)*(x*x) + noise");
       printf("\n4. y = pow(3*x, 3) + noise");
       printf("\n5. y = pow(5*x, 5) + noise");
       printf("\n6. Exit");
       printf("\n\nSelect 1-6: ");
       scanf ("%d", &ch);

       if(ch == 6) exit(0);

       do {

          printf("\n\nHow many data points?: ");
          scanf("%d", &n);
       }
       while(n > NMAX);

       do {

          printf("\n\nEnter noise gain: ");
          scanf("%lf", &gain);
          }
          while(gain < 0.0 || gain > 1.0);

          /* allocate space for x and y vectors */

          if((x = (double *)malloc(n * DBL_LEN)) == NULL) {
             fprintf(stderr, "\nMalloc error; x\n");
             exit(1);
          }

          if((y = (double *)malloc(n * DBL_LEN)) == NULL) {
             fprintf(stderr, "\nMalloc error; y\n");
             exit(1);
          }

        /* generate a random uniformly distributed x[n]
           between + or - 0.5
        */

        for(i=0; i<n; i++) x[i] =
                         ((double)rand()/MAXRAND - 0.5);


        /* choose y[n] */

        switch(ch) {

           case 1:  printf("\nEnter a: ");
                   scanf("%lf", &a);

                   for(i=0; i<n; i++) {
                      noise = gain *
                         ((double)rand()/MAXRAND - 0.5);
                      y[i] = a * x[i] + noise;
                   }
                   break;

           case 2:  for(i=0; i<n; i++) {
                      noise = gain *
                         ((double)rand()/MAXRAND - 0.5);
                      y[i] = sin(TWOPI * x[i]) + noise;
                   }
                   break;

           case 3:  for(i=0; i<n; i++) {
                      noise = gain *
                         ((double)rand()/MAXRAND - 0.5);
                      y[i] = (x[i] * x[i]);
                      if(x[i] < 0.0) y[i]*= -1.0;
                      y[i] += noise;
                   }
                   break;

           case 4:  for(i=0; i<n; i++) {
                     noise = gain *
                        ((double)rand()/MAXRAND - 0.5);
                     y[i] = pow(3.0*x[i], 3.0) + noise;
                   }
                   break;

           case 5:  for(i=0; i<n; i++) {
                     noise = gain *
                        ((double)rand()/MAXRAND - 0.5);
                     y[i] = pow(5.0*x[i], 5.0) + noise;
                   }
                   break;

          default:  break;

        }

    printf("\nTest delay estimation <y/n>?: ");
    scanf("%s", u);
    c = toupper(*u);
    if(c == 'Y') {
      estimate_delay(x, y, n);
      continue;
    }



    /* compute linear variance of x vs. y */

    r2 = linvar(x, y, n);

    printf("\nLinear r2(x,y) = %f\n", r2);


    /* choose number of bins */

    nb= number_bins(n);

    /* compute nonlinear variance of x vs. y */

    h2 = nonlinvar(x, y, n, nb);

    printf("\nNonlinear h2(x,y) = %f\n", h2);


    /* y vs. x */

    r2 = linvar(y, x, n);

    printf("\nLinear r2(y,x) = %f\n", r2);


    h2 = nonlinvar(y, x, n, nb);

    printf("\nNonlinear h2(y,x) = %f\n", h2);

    free(x); free(y);

  }
  while(1);


}


/* routine to find variances & delay between 2 lagged
   signals */

double buf[NMAX+MAX_DELAY], r2[MAX_DELAY+10],
      h2[MAX_DELAY+10];

void estimate_delay(double *x, double *y, int n)
{
   int i, j, d, delay, lag, max_lag, np, nb;

   double max;


   do {

     printf("\nEnter delay between x and y in samples: ");
     scanf("%d", &delay);
   }
   while((d = abs(delay)) > MAX_DELAY);


   for(i=0; i<d; i++)
      buf[i] = ((double)rand()/MAXRAND - 0.5);
   max_lag = d + 10;

   if(delay > 0) { /* y lags x */

      /* move y to buf */

      for(j=0, i=d; j<n; i++, j++) buf[i] = y[j];

      for(lag=0; lag<max_lag; lag++) {

          np = n - lag;    /* compute across less points */

          /* get linear variance at lag */

          r2[lag] = linvar(x, &buf[lag], np);

          nb= number_bins(np);

          /* get nonlinear variance at lag */

          h2[lag] = nonlinvar(x, &buf[lag], np, nb);

      }
   }

       else if(delay < 0) { /* x lags y */

              for(j=0, i=d; j<n; i++, j++) buf[i] = x[j];

              for(lag=0; lag<max_tag; lag++) {

                  np= n - lag;

                  r2[lag] = linvar(&buf[lag], y, np);

                  nb= number_bins(np);

                  h2[lag] = nonlinvar(&buf[lag], y, np, nb);

              }
       }


      /* find maximum r2 and corresponding delay */

         max = r2[0];
         for(i=1; i<max_lag; i++)
            if(r2[i] > max) {max = r2[i]; j = i;}

         if(delay < 0) j = -j;

         printf("\nTrue delay = %d; maximum r2 = %f \
    found @ sample %d\n",
                 delay, max, j);


      /* find maximum h2 and corresponding delay */

         max = h2[0];
         for(i=1; i<max_lag; i++)
            if(h2[i] > max) {max = h2[i]; j = i;}

         if(delay < 0) j = -j;

         printf("\nTrue delay = %d; maximum h2 = %f \
    found @ sample %d\n",
                 delay, max, j);


    }

    /* compute number of bins for nonlinear variance calc. */

    int number_bins(int x)
    {
        int y;

        if(x < 129) y = NBINS - 4;
        else
        if(x < 257) y = NBINS - 3;
        else
        if(x < 513) y = NBINS - 2;
        else        y = NBINS;

        return y;

    }

    /* routine to compute linear variance measure */

    double linvar(double *y, double *x, int nn)
    {

       register int j;

       double z, smx, smy, sqx, sqy, sxy, corr;
       double sqrt(), varx, vary, cov, nnn;



    /* initialize values */

    nnn = (double)nn;

    corr = sqy = 0.0;


    smx = 0.0;
    smy = 0.0;
    sqx = 0.0;
    sxy = 0.0;

    for(j=0; j<nn; j++) {

       smx += x[j];
       smy += y[j];
       sqx += (x[j] * x[j]);
       sxy += (x[j] * y[j]);


       sqy += (y[j) * y[j]);


    } /* end for */

    /* Compute covariance and variances */

    cov = (nnn * sxy) - (smx * smy);
    varx = (nnn * sqx) - (smx * smx);



    vary = (nnn * sqy) - (smy * smy);

    z = sqrt(varx * vary);

    if(z != 0.0) corr = cov / z;
    else corr = 0.0;

    return(corr*corr);

    }

    /* routine to compute nonlinear variance measure */

    int bin[NBINS] [2*NMAX/NBINS];
    double q[NBINS], p[NBINS];

    double nonlinvar(double *x, double *y, int n, int nbins)
    {
        int i, J, k, l, index, index2, last;

        double min, max, range, width, hwidth, start, end,
              mean, mean2, totvar, unvar, s, f, h2, uu, offset;


         /* find max and min of x array */

         min = max = x[0]; last = 0;

         for(i=1; i<n; i++) {
            if(x[i] > max) {max = x[i]; last = i;}
            else
            if(x[i] < min) min = x[i];
         }

         range = max - min;

         width = range/(double)nbins;

         hwidth = width/2.0;

         for(i=0; i<nbins; i++) bin[i][0] = 0;

        /* get histogram of indexes */

        for(i=0; i<n; i++) {
           for(j=0; j<nbins; j++) {
              start = (double)j*width + min;
              end = start + width;

              if((x[i] >= start) && (x[i] < end)) {
                 ++bin[j][0];
                 bin[j][bin[j][0]] = i;
                 break;
              }
           }
         }

     /* maximum x value (last one) */

     j = nbins-1;
     ++bin[j][0];
     bin[j][bin[j][0]] = last;

   /* compute x-midpoints and y average amplitudes */

    mean2 = 0.0;
    offset = hwidth + min;

    for(i=0; i<nbins; i++) {

       /* x value midpoints for each bin */

       p[i]= ((double)i * width) + offset;

       /* corresponding y average amplitudes */

       mean = 0.0;

       for(k=1; k<=bin[i][0]; k++) mean += y[bin[i][k]];

       q[i] = mean/double)bin[i][0];
       mean2 += mean;

    }


    mean = mean2/(double)n;     /* overall */


    /* compute h*h coefficient */

    /* first compute total variance of y */

    totvar = 0.0;
    for(i=0; i<n; i++) {
        s = y[i] - mean;
        totvar += (s * s);
    }

    /* compute total unexplained variance of y */

    unvar = 0.0;
    for(i=0; i<n; i++) {

        /* find f(x[i]), the nonlinear value */

        for(j=1; j<nbins-1; j++){
           if(x[i] <= p[j]) {
              index = j-1;
              goto out1;
           }
        }

        index = nbins-2;

  out1:
        index2 = index++;

        f = ((q[index2] - q[index])/
            (p[index2] - p[index])) *
              (x[i] - p[index]) + q[index];

        uu = y[i] - f;
        unvar += (uu * uu);

    }

    /* now compute nonlinear regression coefficient */


    h2 = 1.0 - (unvar/totvar);

    return h2;

}