Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Comparing Signals In The Time Domain


May 1991/Comparing Signals In The Time Domain

Michael E. Brandt received his B.S. in physics from the Polytechnic Institute of New York, and his M.S. and Ph.D. in biomedical engineering from the University of Houston. He is an assistant professor in the Department of Psychiatry and Behavioral Sciences at the University of Texas Medical School at Houston, and teaches a graduate course in digital signal processing at the University of Houston. His resarch interests are in the analysis of brain electrical signals. Dr. Brandt can be contacted at the University of Texas Medical School, 6431 Fannin, Room 5.240, Houston, TX 77030.

An important problem in systems science is the determination of the relationship between two signals in terms of their primary sources. This problem has implications in diverse fields. In communication engineering, for example, you often want to determine the characteristics of the channel connecting a transmitter and a receiver. In the study of epilepsy, you infer the focus of a brain seizure from the microvolt electrical signals measured from the scalp surface (the EEG).

Two related questions arise:

  • Are the signals related by a single source, dependent sources, or independent sources?
  • If two signals are related by a common source, is there a time-sequential relationship (a delay) between them?
In the communication engineering example, this delay would correspond to the signal transmission time. In the case of epilepsy, the delay would be proportional to the propagation velocity of the seizure as it spreads from its focus to adjacent brain regions.

The relationship that we are seeking between two signals will ultimately be of a causal nature. However, two signals can be highly dependent on one another yet bear no causal relationship. This is the case if the two signals are produced by a primary source. Unless the signal produced by this common source is included in the systems model, one cannot justify inferences concerning the causal nature of the two signals. The best one can hope for is a characterization of the degree of the association between the signals. This may in turn lead to insights into the nature of the sources themselves.

To illustrate the issues, I present here a program that lets you use two different methods for computing the degree of association and the time delay (if any) between two simulated signals.

Linear Correlation

Two signals x and y can be related through a simple linear system in the presence of additive noise by:

(1)     y[n] = L{x[n]} + e[n]
where L{} is any linear transformation acting on the input signal x[n], e[n] represents white noise independent of x[n], n is the discrete index, and y[n] is the output signal. The relationship between the two signals can be unambiguously represented by the correlation coefficient (r) in which the points of a scattergram of y vs. x are fitted with a straight line using the least mean-square method. The correlation coefficient between x and y can then be expressed as the square root of A/B, where A is the total variance in y minus the amount of variance in y not explainable from its linear dependence on x and B is the total variance in y.

Equation (2)

The correlation coefficient takes on continuous values from -1 to +1. Numbers close to +1 or -1 indicate a high linear relationship (either direct or inverse respectively), while coefficients close to 0 indicate a small or nonexistent linear relationship. The square of r is then a measure of the proportion of the total variance in y explainable by its linear dependence on x. It can be estimated computationally as in Equation (2).

Note that the r2 for x vs. y is equal to the r2 for y vs. x.

When a delay is present between y and x it can be estimated by computing the cross-correlation between them. This is done in practice by computing r2 as a function of a shift between the two signals. The maximum r2 represents the linear association between the two signals, and the time shift at which the maximum is found is an estimate of the delay.

Coherence Analysis

Coherence analysis is a frequency-domain method for determing the linear relationship between two signals. The coherence is computed by dividing the estimate of the cross-power spectrum of the two signals by the product of their auto-spectra as a function of frequency. It measures the linear dependence of each frequency component of signal y on signal x. I won't go into further detail here. The cross-phase spectrum measures the phase lag of each frequency component between x and y.

Nonlinear Analysis

If the relationship between the two signals is known to be nonlinear, then in general the linear correlation coefficient cannot be expected to properly quantify the association between them. A generalized model of the system relationship is:

(3)     y[n] = N{x[n]} + e[n]
where N{} now represents a nonlinear transformation of the input signal, with the other variables as in eq. (1) above. If the form of the functional relationship between x and y is known beforehand then that relationship can be used to perform a least-squares fit of the x vs. y scattergram. A "nonlinear regression coefficient" can then be arrived at analogously to eq. (2) by computing the proportion of the variance in y accounted for by this functional fit. In practice, you approximate the x-y relationship with as low an order polynomial as you can get away with. Once you determine the order, you perform curvilinear regression, again using the least-squares-fit method.

In many signal-analysis problems, curvilinear regression cannot be used effectively for two main reasons:

  • the order of the polynomial fit may be too high to implement practically
  • the statistical characteristics of the signals change in time (they are "nonstationary") which means that the order of the polynomial fit would have to be recomputed whenever the signals change character.
A method for getting around these problems was recently pioneered by J. Pijn of The University of Amsterdam (1990) who suggested that a numerical approximation of the x-y relationship be made from the data scattergram itself. The estimator of the relationship is referred to as the nonlinear regression coefficient (h2). It estimates the quantity:

(total y variance) - (unexplained y variance)
----------------------------------------------
             (total y variance)
Pijn's approach consists of approximating the regression curve using a piecewise-linear fit through the scattergram. The approximation procedure is a very simple one as follows:

  • The x data range is divided up into a number (L) of equal sized bins.
  • For each bin i, the midpoint of the x values (p[i]) and the average amplitude of the y values (q[i]) is obtained.
  • Finally, the (p[i], q[i]) points are connected by straightline segments to form the piecewise linear approximation to the regression curve (f(x)).
The nonlinear regression coefficient, h2, is then computed as:

where N is the number of samples, and <y> is the mean value of y[n].

One of the advantages of using h2 over r2 is that if the x-y relationship is linear, then h2 will be equal to r2, whereas if the relationship is nonlinear, r2 will not be close to 1 while h2 will be close to 1.

In general, the h2 of x vs. y will not be equal to the h2 of y vs. x. This will most likely be the case if x is not a single-valued function of y, or y is not a single valued function of x. If, however, the relationship is single-valued, the asymmetry should be small. Also, if the number of bins (L) is equal to the number of data points (N) then h2 will be 1. (All data points will be perfectly predicted.) Therefore, by increasing the number of bins, you can obtain an arbitrarily high fit to the scattergram. However, in practice h2 rapidly converges to its theoretical value as a function of the number of bins, and then increases very slowly thereafter up to 1. Empirically, Pijn found that a maximum of 10 bins for N > 512 was sufficient for characterizing the h2 between two EEG signals.

The h2 can also be used to estimate a delay between two signals that are nonlinearly related. You do this in a manner comparable to the computation of the cross-correlation function. Compute h2 as a function of lag between the two signals. The maximum h2 obtained and the corresponding lag represent the association and delay (respectively) between x and y.

Listing 1 is a test program for computing r2, h2, and the delay for several predetermined functions. These functions, y(x), are as follows:

  • a linear function, y[n] = ax[n] + be[n], and four non-linear functions
  • y[n] = sin(x[n]) + be[n]
  • y[n] -- sign(x[n])(x[n]2) + be[n]
  • y[n] = 3x[n]3 +be[n]
  • y[n] = 5x[n]5 + be[n]
x[n] and e[n] are uniformly distributed random numbers between 0 and 1. a and b are user-selectable gain constants. By varying b, you can study the effects of increasing the output noise levels on both r2 and h2. Additional functions can easily be added to the program. By studying the r2 and h2 estimates obtained for each of these functions, you can determine the effects of the nonlinearity on the variance estimate used.

The function estimate_delay is capable of both advancing or lagging one signal with respect to the other with a user-selected time shift. The sub-routine will then compute both r2 and h2 as a function of increasing lag between the two signals in order to estimate the delay between them. For each of the functions included in the program, the two methods are equally able to find the proper delay.

Conclusion

The h2 measure represents one possible time-domain approach for estimating either the linear or nonlinear relationship between two signals. Other possible approaches include polynomial regression and spline fitting. The non-linvar function provides a starting point for these alternative methodologies.

Reference

Pijn, J.P.M. (1990). "Quantitative Evaluation of EEG Signals in Epilepsy." Ph.D. Thesis, University of Amsterdam.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.