Shehrzad is a software engineer for picoLiter Inc. He can be contacted at firstname.lastname@example.org.
Matlab is a programming language and IDE that I find ideal for development of numerical algorithms. However, programmers used to C-style arrays and C programming will need to make mindset adjustments when coding efficient algorithms in the Matlab programming language. In particular, the use of vector operations is absolutely tantamount to ensure optimal performance. In this article, I share my experiences in implementing numerical methods in Matlab, describe how I transformed the code from a C/C++-style algorithm into vectorized Matlab code, and show the performance gains resulting from this transformation. The complete source code (in Matlab .m files) illustrating these points is available electronically; see "Resource Center," page 5.
The solution to a system of nonlinear equations can be approximated through the use of iterative numerical techniques. A system such as this has the form of Figure 1(a). This system can be transformed by solving the ith equation for xi, obtaining Figure 1(b).
One method for solving systems such as this is known as "fixed-point iteration" (also known as "successive approximation"). Given an initial guess to the solution, this scheme works by successively feeding refined estimates to the solution back into the system. The generated sequence should eventually converge to the solution. Of course, the system in question must satisfy certain prerequisites, such as continuity and differentiability, to name two. (For a more in-depth discussion of this numerical technique, see Numerical Analysis, Fifth Edition, by Richard L. Burden and J. Douglas Faires, Brooks/Cole, 1993.)
Recently, I came across just such a system that needed to be solved in a "quasi real-time" fashion; that is, it needed to run as fast as possible. I typically use Matlab from The MathWorks (http://www.mathworks.com/) to prototype numerical algorithms. The system in question had two free variables; call them x and y. The system is described in Figures 2(a) and 2(b), where Fij=element of an NXM matrix represents a sampled function; xi=element of a 1XN vector represents the x-axis, and yi=element of a 1XM vector represents the y-axis.
To solve for x and y, I utilized the fixed-point iteration scheme, using the initial guess in Figures 3(a) and 3(b). Given F, x, and y, my first cut at a Matlab implementation was coded up in a C-style fashion, forgoing the use of any vectorized operations. Listing One presents the M-file fixed_pt_iteration_1.m. Typically, you use the difference between two successive iterations as a stopping condition. That is, if the difference between the current expression for the approximate solution and the result of the earlier iteration is less than some tolerance, the iterative procedure terminates. Since I am really interested in code optimization and profiling, for the purpose of this discussion, I simply let the algorithm iterate 100 times.
One of the first operations this function performs is to use the Matlab load command to read from the file variables.mat the three Matlab variables: F, x_axis, and y_axis.
- F is a 9X9 matrix representing a sampled function of two variables.
- x_axis is a 1X9 vector representing the "row dimension" of F.
- y_axis is a 1X9 vector representing the "column dimension" of F.
Next, the starting point is computed, and fixed-point iteration can then begin in earnest. The doubly nested loop continues evaluating the summation expressions, given by the equations in Figures 2(a) and 2(b). The subfunction gaussian evaluates the two-dimensional gaussian expression in Figure 4.
This implementation looks just like a line-by-line translation from a C snippet of code. Nowhere does it take advantage of any Matlab vector operations, which can greatly impact total computational time.
An Optimized Implementation
Listing Two is my optimized Matlab code. The fixed_pt_iteration_2 function starts out identical to the first version (the computation of the initial guess can be optimized as well, but since the performance bottleneck is in the main loop, I focus attention there). The first thing to notice is the elimination of the double for loop. The previous implementation cycled through each term of the summation expression given by Figures 2(a) and 2(b) one by one, continually incrementing the numerator and denominator variables. This is analogous to the += operator in C. What happens in the Matlab-optimized function is that matrices are formed, which are then multiplied together in an element-by-element fashion (not matrix multiplication), and finally the Matlab sum function is used to add up all of their elements.
The gaussian subfunction is no longer needed, as the exponential expressions are now evaluated in a vectorized fashion (the built-in Matlab function exp, like just about every other Matlab function, lets you pass in a matrix as well as a scalar as an input argument). To compute the expression given by the equation in Figure 4 without the use of for loops, matrices are formed using the Matlab function repmat. This function replicates a matrix in a tiling fashion. The matrix variables x_grid and y_grid replace the variables x_axis and y_axis. Likewise, the variable that stored the current approximate solution (fixed_pt_solution) has been replaced with the two matrices x_approx and y_approx. x_approx and y_approx are nothing more than 9X9 matrices consisting of the current approximated values of x and y, respectively. With these variables in place, evaluation of the summation terms present in the equations in Figures 2(a), 2(b), 3(a), and 3(b) is straightforward. The matrices are multiplied together in an element-by-element fashion which is very different from matrix multiplication. Hence, the use of the ".*" operator is important; see Figures 5(a) and 5(b), an illustration of the differences between the "*" and ".*" operators.
Once the exponential terms have been evaluated, the sum function adds up the elements of the resultant matrices, the approximated values of x and y are updated accordingly, and the fixed-point iteration begins anew.
To illustrate the performance gains that fixed_pt_iteration_2 provides, I use the Matlab profile function. This function, as its name suggests, provides profiling data. I have also provided a simple helper function (given an integer, either 1 or 2, indicating which function to run and the number of times to run the function) that calls the desired function N times, and at the end, spits out instrumentation information. Tables 1(a) and 1(b) are example runs on my aging 400-MHz PII laptop for both functions. What is astounding is that on my platform, fixed_pt_iteration_2 was on the average 20 times faster than fixed_pt_iteration_1.
This example speaks volumes for the importance of avoiding for loops wherever possible when programming with Matlab. Matlab makes extensive use of BLAS routines, which are highly tuned toward an individual machine's memory hierarchies. As such, vector operations are always preferred over for loops. In fact, the performance gains resulting from usage of these vector operations are so significant that they often can drive the algorithm selection. For example, the fixed-point iteration algorithm described in this article technically is an example of Jacobi fixed-point iteration. As we have seen, this algorithm is easily vectorized. An alternative method, Gauss-Seidel iteration, can be theoretically shown to provide faster rates of convergence than Jacobi iteration. However, because Gauss-Seidel iteration cannot utilize vector arithmetic, the total computation time is actually much longer (see Numerical Methods with MATLAB, by G.J. Borse, PWS Publishing, 1997). Examples such as this abound in numerical analysis and scientific programming. For C programmers such as myself, vector code may appear strange at first, but clearly, the performance gains are significant.
Thanks to my colleague Mohan Bodduluri for proof reading a draft of this article, and the fine people at Accuray who gave me the opportunity to explore interesting subject matters such as this when I worked there.
function fixed_pt_iteration_1 % FIXED_PT_ITERATION_1 un-optimized fixed-point iteration num_it = 100; % # of iterations num_vars = 2; sigma_term = .125; load variables.mat; % get F, x & y axes % get starting point for fixed-pt iteration fixed_pt_solution = get_initial_guess(x_axis, y_axis, F); length_x_axis = length(x_axis); length_y_axis = length(y_axis); for n=1:num_it denominator = 0; % denominator of expression given by equations 2(a) & 2(b) numerator = [0 0]; % numerator of expression given by equations 2(a) & 2(b) for ii = 1:length(x_axis) for jj = 1:length(y_axis) xy = fixed_pt_solution; % [ xi yi ] axis_pts = [x_axis(ii) y_axis(jj)]; term = F(ii,jj)*gaussian(xy, axis_pts, sigma_term); denominator = denominator + term; numerator(1) = numerator(1) + x_axis(ii) * term; numerator(2) = numerator(2) + y_axis(jj) * term; end end % update refined answer fixed_pt_solution = numerator./denominator; end disp(sprintf('Answer is (%f,%f)',fixed_pt_solution(1),fixed_pt_solution(2))); function initial_guess = get_initial_guess(x_axis, y_axis, F) % % GET_INITIAL_GUESS - returns starting point for fixed point iteration % evaluates expression given by 3(a) & 3(b) x_initial = 0; y_initial = 0; [M N] = size(F); for ii=1:M for jj=1:N x_initial = x_initial + x_axis(ii) * F(ii,jj); y_initial = y_initial + y_axis(jj) * F(ii,jj); end end initial_guess = [ x_initial y_initial ] ./ sum(F(:)); function term = gaussian(xy, axis_pts, sigma_term) % % GAUSSIAN - evaluates 2-dimensional gaussian given by equation in Fig. 4 term = 1.0; for dim=1:2 term = term * exp(-(xy(dim)-axis_pts(dim)) * (xy(dim)-axis_pts(dim)) / (sigma_term)); end
function fixed_pt_iteration_2 % FIXED_PT_ITERATION_2 optimized fixed-point iteration num_it = 100; % # of iterations num_vars = 2; sigma_term = .125; load variables.mat; % get F, x & y axes [M N] = size(F); % form x_grid by tiling x' along column dimension x_grid = repmat(x_axis', [1 N]); % same for y dimension by except tile along the row dimension y_grid = repmat(y_axis, [M 1]); % get starting point for fixed-pt iteration fixed_pt_solution = get_initial_guess(x_axis, y_axis, F); for n=1:num_it x_approx = repmat(fixed_pt_solution(1),[9 9]); y_approx = repmat(fixed_pt_solution(2),[9 9]); x_exp = exp( -((x_approx-x_grid).*(x_approx-x_grid))./sigma_term ); y_exp = exp( -((y_approx-y_grid).*(y_approx-y_grid))./sigma_term ); term1 = F .* x_exp .* y_exp; denominator = sum(term1(:)); term2_x = x_grid .* term1; term2_y = y_grid .* term1; numerator = [ sum(term2_x(:)) sum(term2_y(:)) ]; fixed_pt_solution = numerator./denominator; end disp(sprintf('Answer is (%f,%f)',fixed_pt_solution(1),fixed_pt_solution(2))); function initial_guess = get_initial_guess(x_axis, y_axis, F) % % GET_INITIAL_GUESS - returns starting point for fixed point iteration % evaluates expression given by 3(a) & 3(b) % x_initial = 0; y_initial = 0; [M N] = size(F); for ii=1:M for jj=1:N x_initial = x_initial + x_axis(ii) * F(ii,jj); y_initial = y_initial + y_axis(jj) * F(ii,jj); end end initial_guess = [ x_initial y_initial ] ./ sum(F(:));
function profile_fixed_pt(which_implementation, N) % PROFILE_FIXED_PT demonstrates use of Matlab profiler % LOAD_DRR(which_implementation, N) runs the function indicated % by which_implementation N times. % Use '1' for fixed_pt_iteration_1 (slow implementation) % '2' for fixed_pt_iteration_2 (optimized implementation) if which_implementation == 1 fun_str = 'fixed_pt_iteration_1'; elseif which_implementation == 2 fun_str = 'fixed_pt_iteration_2'; else error('Must specify 1 or 2'); end profile on; for ii=1:N eval(fun_str); end profile off; profile report;