*
Shehrzad is a software engineer for picoLiter Inc. He can be contacted at shehrzad_q@hotmail.com.*

Matlab is a programming language and IDE that I find ideal for development of numerical algorithms. However, programmers used to C-style arrays and C programming will need to make mindset adjustments when coding efficient algorithms in the Matlab programming language. In particular, the use of vector operations is absolutely tantamount to ensure optimal performance. In this article, I share my experiences in implementing numerical methods in Matlab, describe how I transformed the code from a C/C++-style algorithm into vectorized Matlab code, and show the performance gains resulting from this transformation. The complete source code (in Matlab .m files) illustrating these points is available electronically; see "Resource Center," page 5.

### Mathematical Preliminaries

The solution to a system of nonlinear equations can be approximated through the use of iterative numerical techniques. A system such as this has the form of Figure 1(a). This system can be transformed by solving the *i*th equation for *x _{i}*, obtaining Figure 1(b).

One method for solving systems such as this is known as "fixed-point iteration" (also known as "successive approximation"). Given an initial guess to the solution, this scheme works by successively feeding refined estimates to the solution back into the system. The generated sequence should eventually converge to the solution. Of course, the system in question must satisfy certain prerequisites, such as continuity and differentiability, to name two. (For a more in-depth discussion of this numerical technique, see *Numerical Analysis*, Fifth Edition, by Richard L. Burden and J. Douglas Faires, Brooks/Cole, 1993.)

### Fixed-Point Iteration

Recently, I came across just such a system that needed to be solved in a "quasi real-time" fashion; that is, it needed to run as fast as possible. I typically use Matlab from The MathWorks (http://www.mathworks.com/) to prototype numerical algorithms. The system in question had two free variables; call them *x* and *y*. The system is described in Figures 2(a) and 2(b), where *F _{ij}*=element of an

*N*X

*M*matrix represents a sampled function;

*x*=element of a 1X

_{i}*N*vector represents the x-axis, and

*y*=element of a 1X

_{i}*M*vector represents the y-axis.

To solve for *x* and *y*, I utilized the fixed-point iteration scheme, using the initial guess in Figures 3(a) and 3(b). Given *F*, *x*, and *y*, my first cut at a Matlab implementation was coded up in a C-style fashion, forgoing the use of any vectorized operations. Listing One presents the *M*-file fixed_pt_iteration_1.m. Typically, you use the difference between two successive iterations as a stopping condition. That is, if the difference between the current expression for the approximate solution and the result of the earlier iteration is less than some tolerance, the iterative procedure terminates. Since I am really interested in code optimization and profiling, for the purpose of this discussion, I simply let the algorithm iterate 100 times.

One of the first operations this function performs is to use the Matlab *load* command to read from the file variables.mat the three Matlab variables: *F*, *x_axis*, and *y_axis*.

*F*is a 9X9 matrix representing a sampled function of two variables.*x_axis*is a 1X9 vector representing the "row dimension" of*F*.*y_axis*is a 1X9 vector representing the "column dimension" of*F*.

Next, the starting point is computed, and fixed-point iteration can then begin in earnest. The doubly nested loop continues evaluating the summation expressions, given by the equations in Figures 2(a) and 2(b). The subfunction *gaussian* evaluates the two-dimensional gaussian expression in Figure 4.

This implementation looks just like a line-by-line translation from a C snippet of code. Nowhere does it take advantage of any Matlab vector operations, which can greatly impact total computational time.

### An Optimized Implementation

Listing Two is my optimized Matlab code. The *fixed_pt_iteration_2* function starts out identical to the first version (the computation of the initial guess can be optimized as well, but since the performance bottleneck is in the main loop, I focus attention there). The first thing to notice is the elimination of the double *for* loop. The previous implementation cycled through each term of the summation expression given by Figures 2(a) and 2(b) one by one, continually incrementing the *numerator* and *denominator* variables. This is analogous to the += operator in C. What happens in the Matlab-optimized function is that matrices are formed, which are then multiplied together in an element-by-element fashion (not matrix multiplication), and finally the Matlab *sum* function is used to add up all of their elements.

The *gaussian* subfunction is no longer needed, as the exponential expressions are now evaluated in a vectorized fashion (the built-in Matlab function *exp*, like just about every other Matlab function, lets you pass in a matrix as well as a scalar as an input argument). To compute the expression given by the equation in Figure 4 without the use of *for* loops, matrices are formed using the Matlab function *repmat*. This function replicates a matrix in a tiling fashion. The matrix variables *x_grid* and *y_grid* replace the variables *x_axis* and *y_axis*. Likewise, the variable that stored the current approximate solution (*fixed_pt_solution*) has been replaced with the two matrices *x_approx* and *y_approx*. *x_approx* and *y_approx* are nothing more than 9X9 matrices consisting of the current approximated values of *x* and *y*, respectively. With these variables in place, evaluation of the summation terms present in the equations in Figures 2(a), 2(b), 3(a), and 3(b) is straightforward. The matrices are multiplied together in an element-by-element fashion — which is very different from matrix multiplication. Hence, the use of the ".*" operator is important; see Figures 5(a) and 5(b), an illustration of the differences between the "*" and ".*" operators.

Once the exponential terms have been evaluated, the *sum* function adds up the elements of the resultant matrices, the approximated values of *x* and *y* are updated accordingly, and the fixed-point iteration begins anew.

### Performance Profiling

To illustrate the performance gains that *fixed_pt_iteration_2* provides, I use the Matlab *profile* function. This function, as its name suggests, provides profiling data. I have also provided a simple helper function (given an integer, either 1 or 2, indicating which function to run and the number of times to run the function) that calls the desired function *N* times, and at the end, spits out instrumentation information. Tables 1(a) and 1(b) are example runs on my aging 400-MHz PII laptop for both functions. What is astounding is that on my platform, *fixed_pt_iteration_2* was on the average 20 times faster than *fixed_pt_iteration_1*.

### Conclusion

This example speaks volumes for the importance of avoiding *for* loops wherever possible when programming with Matlab. Matlab makes extensive use of BLAS routines, which are highly tuned toward an individual machine's memory hierarchies. As such, vector operations are always preferred over *for* loops. In fact, the performance gains resulting from usage of these vector operations are so significant that they often can drive the algorithm selection. For example, the fixed-point iteration algorithm described in this article technically is an example of Jacobi fixed-point iteration. As we have seen, this algorithm is easily vectorized. An alternative method, Gauss-Seidel iteration, can be theoretically shown to provide faster rates of convergence than Jacobi iteration. However, because Gauss-Seidel iteration cannot utilize vector arithmetic, the total computation time is actually much longer (see *Numerical Methods with MATLAB*, by G.J. Borse, PWS Publishing, 1997). Examples such as this abound in numerical analysis and scientific programming. For C programmers such as myself, vector code may appear strange at first, but clearly, the performance gains are significant.

### Acknowledgment

Thanks to my colleague Mohan Bodduluri for proof reading a draft of this article, and the fine people at Accuray who gave me the opportunity to explore interesting subject matters such as this when I worked there.

**DDJ**

#### Listing One

function fixed_pt_iteration_1 % FIXED_PT_ITERATION_1 un-optimized fixed-point iteration num_it = 100; % # of iterations num_vars = 2; sigma_term = .125; load variables.mat; % get F, x & y axes % get starting point for fixed-pt iteration fixed_pt_solution = get_initial_guess(x_axis, y_axis, F); length_x_axis = length(x_axis); length_y_axis = length(y_axis); for n=1:num_it denominator = 0; % denominator of expression given by equations 2(a) & 2(b) numerator = [0 0]; % numerator of expression given by equations 2(a) & 2(b) for ii = 1:length(x_axis) for jj = 1:length(y_axis) xy = fixed_pt_solution; % [ xi yi ] axis_pts = [x_axis(ii) y_axis(jj)]; term = F(ii,jj)*gaussian(xy, axis_pts, sigma_term); denominator = denominator + term; numerator(1) = numerator(1) + x_axis(ii) * term; numerator(2) = numerator(2) + y_axis(jj) * term; end end % update refined answer fixed_pt_solution = numerator./denominator; end disp(sprintf('Answer is (%f,%f)',fixed_pt_solution(1),fixed_pt_solution(2))); function initial_guess = get_initial_guess(x_axis, y_axis, F) % % GET_INITIAL_GUESS - returns starting point for fixed point iteration % evaluates expression given by 3(a) & 3(b) x_initial = 0; y_initial = 0; [M N] = size(F); for ii=1:M for jj=1:N x_initial = x_initial + x_axis(ii) * F(ii,jj); y_initial = y_initial + y_axis(jj) * F(ii,jj); end end initial_guess = [ x_initial y_initial ] ./ sum(F(:)); function term = gaussian(xy, axis_pts, sigma_term) % % GAUSSIAN - evaluates 2-dimensional gaussian given by equation in Fig. 4 term = 1.0; for dim=1:2 term = term * exp(-(xy(dim)-axis_pts(dim)) * (xy(dim)-axis_pts(dim)) / (sigma_term)); end

#### Listing Two

function fixed_pt_iteration_2 % FIXED_PT_ITERATION_2 optimized fixed-point iteration num_it = 100; % # of iterations num_vars = 2; sigma_term = .125; load variables.mat; % get F, x & y axes [M N] = size(F); % form x_grid by tiling x' along column dimension x_grid = repmat(x_axis', [1 N]); % same for y dimension by except tile along the row dimension y_grid = repmat(y_axis, [M 1]); % get starting point for fixed-pt iteration fixed_pt_solution = get_initial_guess(x_axis, y_axis, F); for n=1:num_it x_approx = repmat(fixed_pt_solution(1),[9 9]); y_approx = repmat(fixed_pt_solution(2),[9 9]); x_exp = exp( -((x_approx-x_grid).*(x_approx-x_grid))./sigma_term ); y_exp = exp( -((y_approx-y_grid).*(y_approx-y_grid))./sigma_term ); term1 = F .* x_exp .* y_exp; denominator = sum(term1(:)); term2_x = x_grid .* term1; term2_y = y_grid .* term1; numerator = [ sum(term2_x(:)) sum(term2_y(:)) ]; fixed_pt_solution = numerator./denominator; end disp(sprintf('Answer is (%f,%f)',fixed_pt_solution(1),fixed_pt_solution(2))); function initial_guess = get_initial_guess(x_axis, y_axis, F) % % GET_INITIAL_GUESS - returns starting point for fixed point iteration % evaluates expression given by 3(a) & 3(b) % x_initial = 0; y_initial = 0; [M N] = size(F); for ii=1:M for jj=1:N x_initial = x_initial + x_axis(ii) * F(ii,jj); y_initial = y_initial + y_axis(jj) * F(ii,jj); end end initial_guess = [ x_initial y_initial ] ./ sum(F(:));

#### Listing Three

function profile_fixed_pt(which_implementation, N) % PROFILE_FIXED_PT demonstrates use of Matlab profiler % LOAD_DRR(which_implementation, N) runs the function indicated % by which_implementation N times. % Use '1' for fixed_pt_iteration_1 (slow implementation) % '2' for fixed_pt_iteration_2 (optimized implementation) if which_implementation == 1 fun_str = 'fixed_pt_iteration_1'; elseif which_implementation == 2 fun_str = 'fixed_pt_iteration_2'; else error('Must specify 1 or 2'); end profile on; for ii=1:N eval(fun_str); end profile off; profile report;