Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Tools

Your Own Handprinting Recognition Engine


APR92: YOUR OWN HANDPRINTING RECOGNITION ENGINE

YOUR OWN HANDPRINTING RECOGNITION ENGINE

Algorithms for putting pen to pad

This article contains the following executables: HANDPRINT.URC

Ron Avitzur

Ron completed undergraduate physics at Stanford in 1990. While there, he developed Milo and the math engine used by FrameMaker, the only page-layout program which can symbolically evaluate derivatives. He can be reached at P.O. Box 6692, Stanford, CA 94309.


This article discusses the design and implementation of a writer-dependent recognizer for handprinted text. This recognition engine forms the basis of a pen-based interface to a symbolic math program. My recognizer is distinguished by its small size and straightforward implementation, which is nevertheless able to achieve high character accuracy. The program currently runs on the Macintosh, but the core code is highly portable and platform independent.

History of a Communications Bottleneck

When I was an undergraduate doing physics problem sets, I had to generate pages and pages of simple, but tedious, algebra--by hand. I found that the symbolic mathematics programs--such as Macsyma, Reduce, and SMP--were largely useless in handling students' simple problems. Although symbolic math programs can do powerful computations instantaneously, their command languages are designed around the limitations of 30-year-old teletype terminals. The whole process of typing formulas, interpreting results, and producing finished documents took longer by computer than by hand.

While in school, I explored these limitations by developing a microcomputer math program with a mouse-based user interface tailor-made for my needs (a Macintosh program called "Milo"). But when I returned to my studies, I discovered that I could still work faster by hand, because communicating with the computer through a narrow input pipe is much too slow--even with a mouse assisting the keyboard input. You can write formulas faster by hand because mathematical notation is intrinsically two-dimensional, and because many different symbols are required (more than there are keys on the keyboard or convenient choices on a menu bar).

I determined that the problem could be solved using a pen-based interface, and so I wrote a gesture recognizer and built it into an experimental version of Milo. While this itself is not a product, I do license the core engine (which handles mathematical entry, editing, type-setting, and computation) for use as an embedded component inside other applications.

Design Constraints

A primary characteristic of my recognizer is that it is designed for interactive programs. This means it has to work fast--at the same time that the user is moving the stylus across the digitizer. It also means that I can rely on the dynamics (stroke order) as well as the statics (point positions) of handprinted characters. (In this context, "handprinted" refers to distinct, separate characters, as opposed to connected, "handwritten" characters in cursive.)

Another issue is that a recognition system suitable for math has different constraints than one for English words. The symbol set for mathematics consists of hundreds of characters, rather than just a few dozen. There are many special symbols, including Greek characters and operator symbols. Recognizers designed for standard alphabetic text increase word accuracy by relying on context and dictionary lookup. Math formulas do not have words, per se, and require a high character accuracy. Fortunately, math notation is printed, not written in cursive, and is therefore easier to recognize than handwritten text.

To better handle large character sets, the recognizer described here is writer dependent--that is, the writer must both train the system and adapt to it. The training process stores a number of features for each character in the set. During recognition, these stored features are matched against the features of the character at hand. One feature of the recognizer is that it not only maps user-defined shapes to characters in a text or math font, but that it can also map shapes to arbitrary function codes. This allows the system to support gestures for operations such as simplify, delete, and move.

Another important design goal was to keep the implementation short and simple. Finding bugs in small amount of code is hard enough--imagine how it is with a large recognizer. Because recognizers are evaluated on a statistical basis (such as x percent accuracy), it is possible for subtle bugs to remain hidden for a long time, reducing accuracy without crashing the program.

A related and important design goal is that of inspectability--of code, data, and algorithms. I can't stress this enough. Many of the bugs I discovered during development were subtle. They did not cause crashes or seriously degrade performance, but they made it impossible to improve performance because I did not understand which data values were actually being computed for each stroke. I then settled on a scheme which maintains a human-readable representation (ASCII) of the 32-bit hash values that characterize each stroke. Although slightly less efficient in time and space, this makes it possible to inspect the character data while the program is running, and thus understand and verify the recognition process at each step in its execution. Figure 1 shows some sample data describing the character O. The meaning of these strings will be explained in the next section.

Figure 1: A text dump of feature data for the letter O

  b: WSENW NWSENENW
  c: DCBA ADCBA ADCBAD ADBA
  d: 7654321 07654321 1076543210 076543210 1764321

  e: 101232 21012321 210123210 2101231 2101232 210232
  f: 0123210 01320

The Recognition Process

Recognition begins immediately after a pen-up event. To allow multistroke letters, segmentation (dividing the input stream into distinct strokes) is based on timing. If a stroke (that is, the path between pen-down and pen-up; an array of x,y points) is recognized unambiguously, the system immediately displays the chosen character. If a stroke can correspond both to a single-stroke character and to the beginning of a multistroke character, the system waits for one-sixth of a second before returning a guess. If a new stroke starts before the time-out, the recognizer considers both strokes to be part of a single letter. For example, a hyphen (-) will be recognized unless the writer continues to mark, say, an equal sign (=) or a plus sign (+) during the time-out interval. An eight (8), on the other hand, will be recognized immediately after pen-up.

Ink is collected as fast as the tablet allows. Each stroke may contain from 10 to 100 points. The first step is a preprocessor that throws away most of these points in order to speed up subsequent calculations. I experimented with many preprocessing methods. In the end, I chose the fastest and simplest, because profiling showed that 50 percent of the time was spent in preprocessing. Also, I was spending too much debugging time on that portion of the code. The method I used for developing this routine was to write a letter in a window and see all the ink, then filter those points out and have the program immediately display the letter using only the points remaining. If I could still clearly recognize the letter by eye, I wasn't throwing away too much information. The routine Process3( ) copies the array of points from Ink[] into the array P[], ignoring any points less than Height/8 (in the y direction) and Width/8 (in the x direction) away from the previous point. This filtering step is shown in the top part of Figure 2. The lower part of the figure shows the result of the feature-extraction process on the sample letter O.

Every character is described by five different features: three direction lists and two position lists. The features are calculated by what I call "hash functions," shown numbered 1 to 5 in Figure 2.

The first feature, a direction list, consists of a string that describes the path taken by the strokes comprising a character. For example, the letter O can be written as a counterclockwise circle, which would be described by the string WSENW. This means that the pentip generally begins moving to the west (+/-45 degrees), then south, and so on.

More Details.

The second direction list is a string that partitions the input space into four regions, rotated 45 degrees from compass directions. The third direction list is a string partitioning the input space into eight compass directions.

These three direction lists are packed into a 32-bit integer so that searching and matching can be done quickly. Four possible values (such as N, E, S, and W) can be encoded in 2 bits. A 32-bit integer can therefore contain up to 16 encoded values, in the case of the first two direction lists. The third direction list requires 3 bits per character to encode the eight possible compass directions. This means that up to ten possible values can be stored in a 32-bit integer. This encoding limits the complexity of strokes that can be stored and distinguished.

The recognizer will look at each of these hash values and generate a list of possible matches. Each function gets a weighted vote and can add multiple candidates to the list of possible matches. Rather than compare each stroke with each trained pattern, the training data is sorted by hash value so that a binary search quickly finds matches.

Direction information is almost sufficient, but not quite. Direction alone cannot distinguish between h and n, for example, or b and p, or 6 and 0 in my handwriting. During development, I initially looked only at the first hash function and achieved about 70 percent accuracy. The other direction functions fixed some problems but added mostly redundant information. Looking at the positions brought down the error rate to under 10 percent. So after a list of possible candidates are chosen and voted upon by the direction functions, the two position functions vote.

The Code

The code for the recognition engine, including a simple shell that allows for testing and training, consists of six files. Due to space limitations, only the most important excerpts are shown in the accompanying listing. The complete code is available electronically from DDJ in the form of a Lightspeed C 5.0 project for the Macintosh. Most of the files are system independent, using ANSI-C libraries for memory management and file I/O.

While the code will work with a mouse, it works better with a digitizing tablet. I developed the code using a Wacom 510C digitizer (Wacom Technology, Vancouver, Wash.), which has a cordless pen, 500 lines-per-inch resolution, and a transfer rate of 120 points per second. The digitizer is connected through a 9600 baud serial port. The system also works well with Wacom's HD-648A LCD integrated tablet that also has a cordless pen and comparable resolution.

Here's a summary of the routines available electronically. The file TestShell.c contains main( ), which calls the usual Macintosh initialization routines, opens a window, and goes into the main event loop. The module FileIO.c consists of two routines, ReadPatterns( ) and SavePatterns( ), which read and write a user's handwriting data to disk. The file RecognizerUtil.c contains miscellaneous routines for initializing the tablet, defining an ATAN2 table (to cut down on floating-point usage), and manipulating the List data structures. The file Trainer.c contains DoFastTrainer-Dialog( ), which brings up the modal dialog (shown in Figure 3) and adds each letter trained to the "patterns" global variable. To bring up the training dialogue in the runtime shell, you must type command-L.

If you want to compile the recognizer code to run on another platform, such as Microsoft Windows, you'll need to change the routines which initialize the tablet (in RecognizerUtil.c), track the mouse (in Analyzer.c), set up the windows (in TestShell.c), and handle the training dialog (in Trainer.c).

The heart of the computation is in two files, Analyzer.c and Recognizer.c. The module Analyzer.c is responsible for collecting ink and reducing the data stream to hash values. The routines in Recognizer.c compare the calculated hash values to a stored list of patterns. Listing One (page 103) contains the principal routines from analyzer.c, as well as the most important data structures.

The StrokeData structure is where all the action takes place. During pen movement, points are accumulated into the member field Ink[]. On pen-up, the routine Simplify( ) filters out most of the points and collects about 10 or 20 into member field P[]. Then Analyze( ) fills in the StrokeData fields. Member field T[] is a list of directions (-180..180) between the points in P[]. Xmax, Xmin, Ymax, and Ymin together define the bounding box for this stroke. Similarly, XmaxT, XminT, YmaxT, and YminT define the cumulative bounding box--the union of bounding boxes for this stroke and all previous strokes in this letter. IsDot is a Boolean special case that is set true if the bounding box is smaller than a threshold. The array S[5] contains the feature set--the five direction and position lists described earlier.

After these fields are computed, the matches list is filled with guesses. For the first stroke of a letter, the features S1, S2, and S3 are compared to all the letters in patterns; weighted guesses are then kept in the matches field of that stroke. For further strokes in a letter, we look only at letters proposed in the matches field of the previous stroke.

Future Improvements

Because my interest is in math systems, I developed this only until I was convinced the problem is solvable. To proceed further, I would suggest writing an ink collector to store many samples of each letter, allowing the user the view the ink directly and delete mistakes. Then one could experiment with different kinds of functions by training on half the strokes in the ink set and testing on the other half.

To improve accuracy, various strategies come to mind. You could define other character features and provide the hash functions to calculate them. The current implementation relies on position and its first derivative (angle). A natural extension is to look at the integral of the curve (area) or its second derivative (curvature). The aspect ratio of the bounding box would make a useful single number to include in the weighting scheme. The direction functions are sensitive to rotation, so it would work better to subtract out the initial angle when computing the string, and use the initial angle as a separate number. I would suggest trying functions which compare points separated along the curve, because these functions examine only local information.

Try Your Hand at Recognition

The handprint recognizer presented in Ron Avitzur's article does a lot with a small amount of code. The implementation is fast and straightforward, and works surprisingly well in practice. But here's an invitation to readers to see if you can do better.

To this end, we're proud to announce the first ever DDJ Recognition Contest.

We're just now in the process of building a general-purpose test harness to exercise your code. This test harness will be implemented on both the Macintosh and Windows 3 platforms. The API that your recognizer uses to run on this harness will likely consist of one function call--with "ink" (stylus data points) passed as input, and a simple text string as output.

The test harness will run each recognition engine over the same set of stored inputs. For this purpose, the folks at GO Corp. have generously offered their extensive database of handwriting samples. We're now in the process of establishing a standard data format for ink samples, in case you want to implement your own recognition engine from scratch.

If you're interested in this project, please contact us via electronic or regular mail for full details. We'll also publish more on this contest in our next issue, and establish an end-of-summer deadline for your contest entries. Winners will be chosen at the end of this year, and we'll publish excerpts from the winning entries. The complete code will, of course, be available online.

Programmers, start your engines!

--editors


_YOUR OWN HANDPRINTING RECOGNITION ENGINE_
by Ron Avitzur



[LISTING ONE]
<a name="00c4_000d">

/*****************************************************************
A Writer-Dependent Hand-printing Recognizer -- by Ron Avitzur, 1991.
This is not the complete code. See the accompanying article for
a description of other necessary modules.
*****************************************************************/

typedef struct { short num_items; void  *items[]; } *List;
typedef struct { short code;  List strokes; } GesturePattern
typedef struct { char is_dot; List s[5];    } StrokePattern;
typedef struct {
        Point    Ink[MAX_POINTS],
                 P[MAX_N];
        short    Ink_Num,
                 N,
                 T[MAX_N],
                 IsDot;
        unsigned long S[5];
        double   Aspect_Ratio;
        long     Xmax,Xmin,
                 Ymax,Ymin,
                 Height,Width,
                 XmaxT,XminT,
                 YmaxT,YminT,
                 HeightT,WidthT;
        long     start_time,end_time;
        List     matches;
        } StrokeData,
         *StrokePtr;

ProcPtr HashFunctions[] = { Fxn1,Fxn2,Fxn3,Fxn4,Fxn5 };

char Bits[] = {2,2,3,2,2};

#define DOT_THRESHHOLD              (Wacom?80:4)
#define PT_SEP_POST                 (Wacom?40:4)

/****************************************************************/
void Analyze(register StrokePtr theStroke) {
        char i,s[100];
        Simplify();
        for (i = 0; i < 5; i++) {
                HashFunction[i](s,theStroke);

                ConvertStringToLong(s,&S[i],Bits[i]);
                }
        theStroke->IsDot =
           ((theStroke->Height < DOT_THRESHHOLD
            &&
            theStroke->Width  < DOT_THRESHHOLD)
         || N == 1);
        }
/****************************************************************/
void    Simplify(StrokePtr theStroke,Point *Ink,short N) {
        Point Q[MAX_POINTS];
        short min_dx = theStroke->Width  / 8,
              min_dy = theStroke->Height / 8;
        if (theStroke->Aspect_Ratio < 0.2) min_dy = theStroke->Height;
        if (theStroke->Aspect_Ratio > 5.0) min_dx = theStroke->Width;
        theStroke->N = Process3(theStroke->P,Ink,N,min_dx,min_dy);
        ComputeT(theStroke);
        }
/****************************************************************/
void ComputeT(StrokePtr theStroke) {
        register short *T  = theStroke->T;
        register Point *P  = theStroke->P;
        register short i,N = theStroke->N;
        for (i = 0; i < N - 1; i++)
                T[i] = ATAN2(P[i+1].v-P[i].v,P[i+1].h-P[i].h);
        }
/****************************************************************/
short Process3(register Point *P, register Point *Q,
               short num,short xd, short yd) {
        register short i,n;
        register short dx,dy;
        n = 0;
        P[0] = Q[0];
        for (i = 1; i < num - 1; i++) {
                dx = Q[i].h - P[n].h; dx = ABS(dx);
                dy = Q[i].v - P[n].v; dy = ABS(dy);
                if (dx + dy < PT_SEP_POST)      continue;
                if (dx < xd && dy < yd)         continue;
                n++;
                P[n] = Q[i];
                }
        dx = Q[num - 1].h - P[n].h;
        dy = Q[num - 1].v - P[n].v;
        if (ABS(dx) + ABS(dy) > PT_SEP_POST)
                n++;
        P[n] = Q[num - 1];
        return n + 1;
        }
/****************************************************************/
/* These five lines determine what the features actually are. */
#define Feature1(t) ('0' + ((t + 10 + 45 + 180) / 90) % 4)
#define Feature2(t) ('0' + ((t + 10 + 00 + 180) / 90) % 4)

#define Feature3(t) ('0' + ((t + 10 + 22 + 180) / 45) % 8)
#define Feature4(p) ('0' + (4*((p).h - theStroke->XminT) /theStroke->WidthT))
#define Feature5(p) ('0' + (4*((p).v - theStroke->YminT) /theStroke->HeightT))

/****************************************************************/
#define FOO(name,fxn,type,array,end)           \
        void name(char *s,StrokePtr theStroke) \
        {                                      \
        register short i,d,n = 0;              \
        register type *T = theStroke->array;   \
        s[0] = fxn(*T++);                      \
        i = theStroke->N - end;                \
        while (i-- > 0) {                      \
                d = fxn(*T++);                 \
                if (s[n] != d)                 \
                        s[++n] = d;            \
                }                              \
        s[++n] = 0;                            \
        }
FOO(Fxn1,Feature1,short,T,2)
FOO(Fxn2,Feature2,short,T,2)
FOO(Fxn3,Feature3,short,T,2)
FOO(Fxn4,Feature4,Point,P,1)
FOO(Fxn5,Feature5,Point,P,1)
/****************************************************************/
void ConvertStringToLong(char *s,unsigned long *np,short bits) {
        unsigned long n = 0;
        short i,len = strlen(s);
        s[len] = s[len-1];

        if (len > 32/bits) len = 32/bits;
        for (i = 0; i <= len; i++)
                n = (n << bits) + s[len - i] - '0';
        *np = n;
        }


Copyright © 1992, Dr. Dobb's Journal


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.