Neural Networks and Character Recognition



High-level, interactive tools are less complex but just as powerful

Ken Karnofsky

Ken is market-segment manager for signal-processing and neural-network tools at The MathWorks. He has also worked in speech-recognition research, interactive speech-processing software, and data analysis for industrial applications. Ken can be contacted at The MathWorks, Cochituate Place, 24 Prime Park Way, Natick, MA 01760.


Neural-network technology is being applied to solve a wide variety of scientific, engineering, and business problems and to perform complex functions such as noise cancellation, adaptive filtering, pattern recognition, nonlinear control, and econometric forecasting. Neural networks are composed of many simple interconnected elements, or neurons, working in parallel to solve a problem. Like biological nervous systems, artificial neural nets can be trained to find their own solutions, based solely on the form of the interconnections and the data presented to the network. In contrast to classical approaches in fields such as statistics and control theory, neural nets require no explicit model or limiting assumptions of normality or linearity. This property is useful where system complexity makes formal analysis extremely difficult or impossible. The ability to adapt to incoming data makes neural networks particularly adept at handling noisy or rapidly changing systems.

Neural-net research and development has traditionally been carried out using languages such as C, C++, Fortran, or Lisp. In recent years, however, commercial software tools have used pre-existing algorithms to simplify network development. In most cases, modifying algorithms or integrating networks with non-neural systems still requires advanced programming techniques.

In this article, I present an overview of neural-network properties, then examine an application that illustrates them using the Matlab Neural Network Toolbox, an add-on that facilitates the design and simulation of many kinds of neural networks. Matlab (short for "matrix laboratory") is a technical computing environment that provides a high-performance, interactive system for numerical computation and visualization. Matlab, which runs on platforms ranging from Microsoft Windows to the X Window System, integrates matrix computation, numerical analysis, data analysis, and graphics with a language that expresses problems and solutions as they are written mathematically--without traditional programming.

Matlab's basic element is a matrix that doesn't require dimensioning. Along with optimized numerical routines and a fast interpreter, this allows many numerical problems to be solved in a fraction of the time that it would take to write a program in Fortran or C. Applications written in Matlab can draw on high-level programming constructs and a library of over 500 numerical, analytical, and graphics functions. Application-specific add-on libraries ("toolboxes") of Matlab functions can be used to solve particular classes of problems in the areas of signal processing, control-system design, and optimization as well as neural nets.
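
As a quick illustration (not taken from the article), a matrix can be entered and used immediately, with no declarations or dimensioning:

A = [1 2; 3 4];      % a 2x2 matrix, entered without declaring its size
x = A \ [5; 6];      % solve the linear system A*x = [5; 6]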

By drawing on Matlab's computation features and matrix-oriented, high-level language, the Neural Network Toolbox handles networks efficiently using a matrix representation of network models. Those models may be easily compared to, and interact with, more traditional problem-solving approaches. In addition, a concise graphical notation simplifies the description and understanding of network architectures.

Neuron Model and Neural-network Architectures

The neuron model and the architecture of a neural net determine how a network transforms its input into an output. This transformation can be viewed as a computation. The neuron model and the network architecture each place limitations on what a particular neural network can compute.

In the simplest case, a single-input neuron, the input p is transmitted through a connection that scales it by the weight w to form the product w*p. This product is the argument to the transfer function F, which produces the output a. The neuron may also have a bias, b, which is simply added to the product w*p; the net input is therefore the sum of the weighted input w*p and the bias b. The complete neuron model is a=F(w*p+b). Note that w and b are both adjustable parameters of the neuron.
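
As an illustrative sketch with made-up values (not from the article's listings), the single-neuron model translates directly into Matlab; a hard-limit step function stands in for F here:

w = 1.5;  b = -0.5;  p = 0.8;   % weight, bias, and input (arbitrary values)
n = w*p + b;                    % net input
a = n >= 0;                     % step transfer function: a = F(w*p + b)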

In general, each neuron may have multiple inputs, and any number of neurons may operate in parallel to form a layer. Figure 1 illustrates the components of a neuron layer: the input, the weights, the bias, the transfer function, and the output. In the Matlab Neural Network Toolbox, the inputs, outputs, and biases are represented as vectors, and the weights are represented as a matrix with one row per neuron and one column per input. The resulting matrix equation, shown in Figure 1, provides a concise notation for neural networks. Because Matlab is optimized for matrix manipulation, it provides a natural foundation for efficient implementation of network computations.

Transfer functions may be linear or nonlinear, according to the purpose of the net; typically, they limit the output to a desired numerical range. For example, by limiting output to 0 or 1, the step function is useful for classification decisions. Sigmoid functions are smooth, differentiable nonlinear functions that compress the input into the range of 0 to 1; see Figure 2. They can be used for function approximation and a variety of other tasks.
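
A layer is just as compact in matrix form. The sketch below, again with made-up sizes, computes the outputs of a layer of log-sigmoid neurons; the sigmoid is written out explicitly so no toolbox function is assumed:

nin = 3;  m = 4;                    % 3 inputs and 4 neurons (arbitrary sizes)
W = randn(m,nin);                   % weight matrix: one row per neuron, one column per input
b = randn(m,1);                     % bias vector
p = rand(nin,1);                    % input vector
a = 1 ./ (1 + exp(-(W*p + b)));     % log-sigmoid squashes each net input into (0,1)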

A network can have any number of layers, but networks with more than three are uncommon. In multilayer networks, transfer functions condition the output of one layer before passing it on to the next layer. This prevents the propagation of very large and very small numbers and the associated numerical problems.

In summary, the number of inputs, the number of neurons per layer, the number of layers, and the type of transfer function all affect what the network can compute. The application establishes the number of inputs and outputs, but the number of neurons in the intermediate layers is up to the designer. In general, more neurons make a network more powerful, at the expense of additional computational complexity.

Training

The weight and bias values used in a neural network are obtained by a process of training. Some neural nets are trained by presenting the network with a sequence of example input/output pairs. The net adjusts its parameters until its output vectors match the desired outputs, called target vectors, as closely as possible. Such nets are called supervised networks.

Networks trained in a supervised manner can be used to model and control both linear and nonlinear dynamic systems, to classify input vectors in the presence of noise, and for other useful tasks. The natural ability of neural networks to deal with nonlinearities allows them to solve complex problems. For example, a neural controller has been implemented by Derrick Nguyen and Bernard Widrow for backing a truck and trailer to a dock. It can perform this task successfully even when starting with the truck and trailer jackknifed and facing the wrong way.

Other neural nets are trained by presenting input vectors and letting the net adjust itself. Such networks are said to be unsupervised since no specific behavior is predefined. Instead, the network "discovers" the relationships within the data. Unsupervised networks can be used to find a classification scheme for data as it is presented. This classification scheme can then be used on future data.

Character Recognition

Optical character recognition has many practical applications, from check processing to converting faxed text into editable format on a computer. Such systems can be very cost effective, because they can save time, avoid errors, and relieve humans from performing repetitive tasks.

Statistical methods are typically used to model patterns and to compensate for noise corruption. In these systems, fine-tuning system performance can often mean exhaustive research to develop algorithms that handle real-world conditions. Frequently these algorithms are computationally intensive. With neural networks, on the other hand, once the basic requirements are understood, the experimentation and algorithm development is relatively straightforward. While the training may require complex computations, the execution is typically quite efficient.

In this example, the task is to recognize the 26 letters of the English alphabet. I'll simulate an imaging system that digitizes each letter centered in the system's field of vision, ideally resulting in a 5x7 grid of Boolean values. Figure 3(a) shows the letter G scanned under ideal conditions. However, a typical scanning system is not perfect; it often produces a noisy image of the letter, such as Figure 3(b).

The goal of the neural net is to correctly identify the image of each letter. The problem requires perfect classification of ideal inputs and reasonably accurate classification of noisy inputs. Specifically, the network should make as few mistakes as possible when classifying inputs with noise of mean 0 and standard deviation 0.2 or less.

The inputs are represented as a set of 26 vectors of 35 elements each (imagine each 5x7 grid "unrolled" into a column vector)--one for each letter. These vectors will in turn form the columns of an input matrix called "alphabet."
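
As a sketch of how one letter might be packed into a column (the letterGrid variable here is hypothetical; the article generates its vectors artificially):

alphabet = zeros(35,26);            % one 35-element column per letter
letterGrid = rand(5,7) > 0.5;       % hypothetical 5x7 Boolean image of one letter
alphabet(:,1) = letterGrid(:);      % unroll the grid column-wise into the first column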

The net is trained so that its output matches a set of target vectors that represent the positions of the letters in the alphabet. Each target vector is a 26-element vector with a 1 in the position of the letter it represents and 0s everywhere else. For example, the letter A is represented by a 1 in the first element (as A is the first letter of the alphabet) and 0s in elements 2 through 26.
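
Because each target vector holds a single 1 in its letter's own position, the complete 26x26 target matrix is simply the identity matrix:

targets = eye(26);     % column k holds the target vector for the k-th letter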

In this example, the training and target vectors are generated artificially, which would be appropriate in the early prototyping stages of a system. As development progresses, you could easily substitute actual scanned images using Matlab's I/O and image-processing capabilities.

Network Architecture and Implementation

Since it must map a 35-pixel image to one of the 26 letters of the alphabet, the neural net needs 35 inputs to represent the image grid and 26 neurons in its output layer. Because two-layer nets are known to have better function-approximating properties, the network has an additional hidden layer with ten neurons. Picking this number takes experience and some guesswork. Too few neurons in the hidden layer can lead to underfitting, while too many can contribute to unstable oscillations between the fitted points. If the network has trouble learning, you can add neurons to this layer to improve the fit. Figure 4 is a diagram of this network.

For both layers, use the log-sigmoid transfer function. The output range (0,1) of the log-sigmoid transfer function is perfect for learning to output Boolean values. It is also well suited to handle the nonlinear relationship between the patterns in the input and output vectors.

The net will be trained to output a 1 in the correct position of the output vector and to fill the rest of the output vector with 0s. However, noisy input vectors may cause the network to produce outputs that are close to, but not exactly 1s and 0s. To handle this problem, the network output will be passed through a post-processing transfer function. You'll use the competitive transfer function, which outputs a 1 in the position of the largest value and 0s elsewhere. This will assure that the element representing the letter most like the noisy input vector takes on a value of 1 and all others have a value of 0. This post-processing produces the final, clean output that will actually be used in classification.
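
The toolbox supplies this post-processing as the compet function (it appears in Listing Four). As a rough equivalent without the toolbox, given a 26-element output vector a (a hypothetical variable name), you can pick the winner directly:

[mx, k]  = max(a);          % index of the largest element of the output vector a
clean    = zeros(size(a));  % start from all zeros
clean(k) = 1;               % the winning letter's position gets the 1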

Implementation of the network with the Matlab Neural Network Toolbox is performed in three phases:

  • Initialization of network inputs and parameters. This is performed by assigning constants or the return value of built-in functions. External data can be easily imported, scaled, and otherwise preprocessed using Matlab's data-manipulation functions.
  • Network architecture specification and training. This is typically done with a call to a toolbox function that accepts the input vectors and the weights, biases, and transfer functions for each layer as arguments. Training data may be presented one set at a time, or multiple training sets may be batched together. Optional graphs help monitor the training progress.
  • Network simulation and testing. For each test case, a new input is presented as an argument to the network function, along with the trained weights and biases. Again, data can be acquired from external sources and preprocessed as part of the training or simulation procedure. Results may be visualized using built-in graphs or with custom plots devised by the user.
Each of these steps may be done interactively or automatically to explore alternative strategies, parameter values, and architectures. Once an approach has been selected, the same commands may be saved in a script file to automate procedures that will be used repeatedly.

Training the Network with Backpropagation

You train the network using the backpropagation learning rule. Backpropagation is perhaps the most commonly used supervised training technique because it produces networks that are capable of approximating any reasonable function. It has a nice generalization property that makes it possible to train a network on a representative set of input/target pairs and get good results for new inputs without training the network on all possible input/output pairs.

First, you initialize the weights and biases with constrained initial conditions, using the Nguyen-Widrow method, to improve the learning time; see Listing One, page 103. This method takes advantage of how a network approximates functions to speed convergence. Creating a net that handles noisy input vectors works best when training with both ideal and noisy vectors. Start out by training on ideal vectors until you've reached a low sum-squared error (the sum of the squared differences between the network targets and actual outputs).

After initialization, you apply the backpropagation technique to ideal vectors as follows. First, the 35x26 input matrix P representing the alphabet is presented to the network. The net's overall output A2 is calculated in response to this input. Next, the difference between the output and the target matrix is calculated. This is the error of the network.

The derivative of the sum of the squared errors with respect to the second layer's output vector is calculated. This, in turn, is used to calculate the derivative of sum-squared error with respect to the hidden layer's outputs. Propagating the derivatives of error backward through the network is the origin of the term "backpropagation."

The backpropagation learning rule adjusts the weights and biases of the network so as to minimize the sum-squared error. This is done by iteratively changing the values of the network weights and biases in the direction of steepest descent with respect to error, effectively reducing the error by a small amount each time. This is called a gradient descent procedure. Changes in each weight and bias are proportional to that element's effect on the sum-squared error of the network. Once the sum-squared error of the network has reached an acceptably small value, or a maximum number of iterations are performed, the training is complete.
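
The listings rely on the toolbox function trainbpx, but the update itself is short. The following is a minimal hand-rolled sketch of one gradient-descent step for the two-layer log-sigmoid network described here, assuming W1, B1, W2, B2, the input matrix P, the target matrix T, and a learning rate lr are already defined; it is illustrative only, not the toolbox implementation:

Q  = size(P,2);                                % number of training vectors
A1 = 1 ./ (1 + exp(-(W1*P + B1*ones(1,Q))));   % hidden-layer outputs
A2 = 1 ./ (1 + exp(-(W2*A1 + B2*ones(1,Q))));  % output-layer outputs
E  = T - A2;                                   % network error
D2 = A2 .* (1 - A2) .* E;                      % output-layer delta (log-sigmoid derivative)
D1 = A1 .* (1 - A1) .* (W2' * D2);             % error propagated back to the hidden layer
W2 = W2 + lr * D2 * A1';   B2 = B2 + lr * sum(D2,2);   % steepest-descent updates
W1 = W1 + lr * D1 * P';    B1 = B1 + lr * sum(D1,2);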

Among the known pitfalls of backpropagation are false-error minima and excessive training time. These problems can be overcome with techniques provided in the Neural Network Toolbox. Adding a momentum factor decreases backpropagation's sensitivity to small variations in the error surface. This helps the net avoid mistaking shallow local minima for the true lowest-error solution. Training time can be reduced by applying the Nguyen-Widrow initial conditions, as mentioned earlier. Another approach is to use an adaptive learning rate to increase the learning rate without sacrificing stability.

Training Without Noise

The network is initially trained without noise for a maximum of 5000 epochs (training iterations) or until the network sum-squared error falls beneath 0.1. The row vector TP contains training parameters such as the acceptable error, learning rate, and so on. These values are useful for adjusting a network's ability to handle noise.

Listing Two (page 103) implements the training procedure with the Neural Network Toolbox function trainbpx, which applies both an adaptive learning rate and momentum to improve training.

Adding Noise

Next, you make ten training passes over a combined set of ideal and noisy vectors. Each set contains two copies of the noise-free alphabet and two noisy copies, with noise of mean 0 and standard deviations of 0.1 and 0.2 added to them. The corresponding targets consist of four copies of the 26x26 target matrix. This forces the network to learn how to properly identify noisy letters while still responding well to ideal vectors. To train with noise, the maximum number of epochs is reduced to 300 and the error goal is increased to 0.6 to reflect the higher expected error. See Listing Three (page 103).

Unfortunately, after the training described above, the net learns to classify some difficult noisy vectors at the expense of properly classifying a noise-free vector. So you train the network again on just ideal vectors. This will help ensure that the network will respond perfectly when presented with an ideal letter.

The values (W1, B1, W2, and B2) returned by the function trainbpx represent the final parameters of the network you'll actually use.

Simulating Network Operation

You measure the reliability of the neural-net pattern-recognition system by testing the network with hundreds of input vectors containing varying amounts of noise. Noise with a mean of 0 and standard deviations ranging from 0 to 0.5 is added to the input vectors.

At each noise level, 100 presentations of different noisy versions of each letter are made and the net's output is calculated. The output is then passed through the competitive transfer function, which returns a 1 in the position of the largest element and 0s elsewhere. As a result, only one of the 26 outputs, each representing a letter of the alphabet, has a value of 1.
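
A minimal sketch of such a reliability test (illustrative only; the 0.05 step between noise levels is an assumption, and the trained W1, B1, W2, B2 and the alphabet matrix are assumed to be in the workspace):

noise_levels = 0:0.05:0.5;                 % assumed step of 0.05 between tested levels
errors = zeros(size(noise_levels));
for i = 1:length(noise_levels)
  for trial = 1:100                        % 100 presentations per letter per level
    for letter = 1:26
      p  = alphabet(:,letter) + randn(35,1)*noise_levels(i);
      A2 = logsig(W2*logsig(W1*p,B1),B2);  % same form as Listing Four
      [mx, k] = max(A2);                   % competitive post-processing: pick the winner
      if k ~= letter
        errors(i) = errors(i) + 1;         % count misclassifications
      end
    end
  end
end
error_rate = errors / (26*100);            % fraction of letters misclassified at each level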

Listing Four (page 103) shows the code for such a test on one letter (M), using noise with standard deviation of 0.2. Figure 5 displays the system performance as a function of noise for each of the two networks. The solid line on the graph shows the reliability for the network trained with noise. The reliability of the same network trained without noise is shown with a dotted line. As you can see, training the network on noisy input vectors significantly reduced its error rate in classifying noisy vectors. The network trained with noise did not make any errors for vectors with noise of standard deviation 0.00 or 0.05. When noise of standard deviation 0.2 was added to the vectors, the network made a mistake 3.62 percent of the time.

If you require higher accuracy, the net could be trained for a longer time or retrained with more neurons in its hidden layer by changing the values of the appropriate variables. Also, the resolution of the input vectors could be increased to, say, a 10x14 grid. Finally, the network could be trained on input vectors with greater amounts of noise if greater reliability is needed at higher noise levels.

Summary

This article has examined how a simple pattern-recognition system can be designed and simulated using neural networks. Training the network on different sets of noisy vectors forced it to learn how to deal with noise, a common problem in pattern-recognition applications.

The backpropagation technique used here has proven to be quite versatile. Matlab users have applied it in medical-image processing, geoscience, control-system design, and other fields where the ability to handle noise and nonlinearities is important. A typical application is the control of an antenna to track a moving signal.

However, different network architectures might be more appropriate for other applications. For example, associative networks produce associations between inputs and neurons that are more localized than backpropagation networks; self-organizing networks adapt to correlations or irregularities in the inputs; and recurrent networks contain a feedback element that makes it possible to model delays or effects that vary with time. A neural-network tool should provide all of these network types.



_NEURAL NETWORKS AND CHARACTER RECOGNITION_
by Ken Karnofsky


[LISTING ONE]

R = 35;                 % number of inputs (pixels per letter)
S1 = 10;                % number of neurons in layer 1 (hidden layer)
S2 = 26;                % number of neurons in layer 2 (output layer)
[W1,B1] = nwlog(S1,R);      % Nguyen-Widrow initial conditions for layer 1
W2 = rands(S2,S1)*0.01;     % initial conditions for layer 2
B2 = rands(S2,1)*0.01;      % layer-2 bias (needed by trainbpx in Listing Two)





[LISTING TWO]

P = alphabet;                 % input matrix  - 35x26 (one 35-element column per letter)
T = targets;                  % target matrix - 26x26
disp_freq = 20;               % define training parameters
max_epoch = 5000;             %         "
err_goal = 0.1;               %         "
lr = 0.01;                    %         "
lr_inc = 1.05;                %         "
lr_dec = 0.7;                 %         "
momentum = 0.95;              %         "
err_ratio = 1.04;             %         "

% All training parameters are packed into the vector TP
TP = [disp_freq, max_epoch, err_goal, lr, lr_inc, lr_dec, momentum, err_ratio];

% The function trainbpx trains the network using the specified initial
%   conditions and transfer functions. It returns the trained network
%   weights and biases, the number of epochs (iterations) required, and a
%   record of errors and learning rate at each epoch.
[W1,B1,W2,B2,epochs,TR] = trainbpx(W1,B1,'logsig',W2,B2,'logsig',P,T,TP);






[LISTING THREE]


max_epoch = 300;                           % define training parameters
err_goal = 0.6;                            %         "
Q = 26;                                    % number of letters (columns of alphabet)
TP = [disp_freq, max_epoch, err_goal, lr, lr_inc, lr_dec, momentum, err_ratio];
for pass = 1:10,
 P = [alphabet, alphabet, ...              % clean training vectors
      (alphabet + randn(R,Q)*0.1), ...     % and with varying amounts of noise
      (alphabet + randn(R,Q)*0.2)];        % (mean 0, standard deviations 0.1 and 0.2)
 T = [targets, targets, targets, targets]; % 4 target matrices batched together
 [W1,B1,W2,B2,TE,TR] = ...                 % training procedure
     trainbpx(W1,B1,'logsig',W2,B2,'logsig',P,T,TP);
end






[LISTING FOUR]

% Define an input example with random noise added.
noisyM = alphabet(:,13) + randn(35,1)*0.2;

% Output A2 is a function of the input vector, the transfer function for each
%  layer, and the trained weights and biases.
A2 = logsig(W2*logsig(W1*noisyM,B1),B2);

% Because A2 may be noisy (not exactly 1 or 0), the competitive function
%   picks the element closest to one.  The FIND function returns its index.
answer = find(compet(A2) == 1);














Copyright © 1993, Dr. Dobb's Journal


