Hyperanimation

If you thought talking heads like Max Headroom were science fiction, guess again. In this article Elon talks about the technology behind HyperAnimator, a Macintosh product that lets you add synchronized animated heads and sound to produce a lifelike effect.


July 01, 1988
URL:http://www.drdobbs.com/web-development/hyperanimation/184407968


Elon Gasper

Elon Gasper is the president of Bright Star Technology Inc. He is a specialist in interactive graphics and real-time programming for small computers. Elon previously was software development manager for a UCLA audiovisual research department and has also taught in the California State University System. He may be reached at 14450 NE 29th St., Bellevue, WA 98007.


Within the next few years, databases will do more than simply provide information in the static, sequential form we've become accustomed to. Advances in both software and hardware technologies will enable developers to provide new ways of presenting information, methods that include images and sound, as well as the familiar alphanumeric data.

One approach envisioned by futurists is a "talking head" computer interface. Apple has recently suggested that in the twenty-first century an interface such as this will be an important part of its "Knowledge Navigator," HyperCard, and other powerful software applications that integrate hypermedia, simulation, and artificial intelligence.<fn1> (See accompanying text box.) A talking head is essentially the lifelike image of a human face that interacts with the computer user. Alternately referred to as a synthetic actor or an anthropomorphic agent, it will be one of the basic means by which future PC users interact with their computers.

In this article, I will describe a way to create synthetic actors using a technology I call HyperAnimation, which combines and coordinates images, sound, symbols, and data to present information in the form of a talking head on the computer screen. The HyperAnimation system is more than just the front end to a database, however; it also includes a database engine and a set of software tools that let developers digitize images, bring those images onto the screen, and make them say anything you want. I will first describe the components of the HyperAnimation development system, then discuss how those components work together.

Similarities to Hypertext

Early applications of computer-augmented text editing and storage were limited to the linear organizational methods of paper-based systems. Hypertext frees the user from the sequential nature of words embodied on paper. An analogous situation has developed for animation technologies: from celluloid strips, to videotape, and on to desktop presentation software, sequential methods have prevailed. HyperAnimation is the first general-purpose system for random access and display of images on a frame-by-frame basis, organized and synchronized with sounds. Like Hypertext, HyperAnimation technology enables its users to transcend the limitations of linearity and to create a whole new realm of possibilities.

Specifically, with HyperAnimation software the computer can intelligently coordinate image display with synthetic speech so that a representation of a human being, a cartoon character, or even a robot, can deliver lines never spoken before, complete with appropriate facial movement. This ability to rigorously combine randomly accessed images and speech fragments under control of an interactive computer program can be summed up in three words: "Read my lips!"

The Applications

Each generation of human interface is more intuitive for users because of a better fit to our innate human abilities. The first generation was simply switches and lights; the second, keyboards and character output; the third employed graphics metaphors and a pointing device. Voice synthesis and recognition is the fourth. With HyperAnimation capabilities, the addition of a synthetic actor's talking face further enhances the communications bandwidth, introducing the fifth generation human inter"face".

An opportunity exists for applying HyperAnimation anywhere that humans interact with computer technology. The countless possibilities include interactive entertainment, education and training, telecommunications, robotics, and, of course, new applications software.

The remainder of this article provides an overview of HyperAnimation technology: the paradigms used in implementations and the basic structure of the control software. Bright Star's Talking Tiles educational program (which uses an anthropomorphic agent as a simulated teacher) provides an example of HyperAnimation capabilities and potential, while the HyperAnimator is used to describe how to experiment with synthetic actors by using HyperCard.

New Tools for Actors

HyperAnimation technology is based on a descriptive authoring language called RAVEL (the Real-time/random-access Animation and Vivification Engine Language) and its associated run-time system, RAVE. The RAVEL language contains statements to describe any system of symbols, sounds, and facial movements that a designer can use to weave the patterns of human speech and facial images into a talking face capable of reading a script stored in ASCII text form. In effect, a designer ravels the patterns of human communication: an individual's facial images and sounds are decomposed into constituent parts, and RAVEL statements specify how these visual and audio threads are to be dynamically rewoven into the image and sound of that person (or thing) saying (or doing) something else.

Synthetic actors can read from the same cheaply stored (and easily modified) text scripts, can use different voices, and can speak different languages. Interchangeable models of celebrities, politicians, cartoon characters, or even yourself can be plugged into applications programs as easily as fonts into a PostScript document, or peripherals into an SCSI port. This interchangeability enables quick prototyping of synthetic actor designs.

In addition to RAVE and RAVEL, we have created a number of tools, written to provide the most general context possible for the development and use of synthetic actors. The actors these tools produce serve primarily as anthropomorphic agents and simulated teachers, especially for language learning.

For example, a device-driver version of RAVE (currently in field testing) can be used by any Macintosh application. The driver has been linked to Apple's HyperCard with an XCMD to open the use of synthetic actors to any stackware creator. An early version of this product, the HyperAnimator, was demonstrated in January at the MacWorld Expo show. This article later discusses how RAVE is invoked in this context from Hypertalk to summon synthetic actors. (An example script is given in Listing One, on page 66.)

We have also released educational software that uses HyperAnimation. Alphabet Blocks, for example, teaches basic phonics using a cartoon synthetic actor---a magical talking elf who talks to and interacts with children as they play with simulated objects depicted on the screen. In Talking Tiles, a digitized teacher shows how the pronunciation of words is related phonetically to their spelling. This application is described in more detail because it illustrates how the capabilities of RAVE can be maximized.

RAVEing to the Max

Talking Tiles is a general-purpose learning tool intended primarily for teaching language skills. In it the animated lip-synchronized synthetic actor functions as a "simulated" teacher who instructs the student by using simulated anagram tiles. When the student selects a tile, the simulated teacher pronounces the proper sound (or sounds) associated with that letter (or set of letters). The selected tile may then be dragged onto the playing field to begin a word or added to an existing word (Figure 1, page 22). The simulated teacher then pronounces the resulting combination. The unlimited vocabulary enables a student to experiment by constructing sequences of letters to make words and even to make sentences.

Letters in the tiles are highlighted at appropriate times to show which letters made which sounds and why. The pronunciation of each word proceeds in synchrony with a wave of highlighting that moves from left to right (in the case of English). This is coordinated with the speech sounds so that each letter in the word is highlighted during the audio presentation of the part of the combined sound during which its contribution is most prominent. This process is called orthophonetic animation.

The "simulated" teacher can sound out words by pronouncing component sounds of the word in sequence with unblended speech. The letter (or letter combinations) responsible for the sound are simultaneously highlighted. Letters influencing the sound made by a particular letter in a word may also be indicated by underlining. The visual emphasis shows the student why a letter made its characteristic sound. The synthetic actor may even explain or comment on the letters and sounds. These functions are under the control of the RAVEL language, so the functions are fully independent of the human language (or set of symbols) being used.

Figure 1: In Talking Tiles, the talking digitized teacher has rigorously synchronized lip movements. The teacher shows how the pronunciation of words is related phonetically to their spellings as the students manipulate simulated anagram tiles to make words.

Figure 2: HyperAnimation technology offers a general method to synchronize images, sounds, and the symbols associated with them. Any number of images can be coordinated with speech to create synthetic actors.

The talking head of the synthetic actor provides synchronized moving lips (as well as other head and body movements) to provide the auditory and visual clues present in human speech. The synthetic actor's functions include enhancing the recognition of its synthetic speech with synchronized lip movements and other gestures. The talking head also makes the learning game more attractive and emotionally appealing. The talking head encourages imitation by demonstrating the forming of sounds with the mouth. Furthermore, the synthetic actor serves as a master of ceremonies for the learning program by explaining and demonstrating the use of the program and interrupting periods of inactivity to wake up or encourage the user.

Many extensions of this expert system for learning have been contemplated and planned. They all rely on RAVE actors' inherent understanding of the reading/speaking universe of discourse. For purposes of this article, the system is discussed simply as an example of the sorts of revolutionary applications that RAVE enables with its anthropomorphic agents.

How the Process Works

Before RAVE software can create HyperAnimation, the RAVEL author must describe the basic components of each synthetic actor. Images, sounds, and symbols are the three most obvious elements (Figure 2, this page). RAVEL statements define and link them together in a web of relationships. Each synthetic actor definition integrating these components is called a "model."

For each model, the designer specifies the behavior patterns of the talking synthetic actor (its voice and associated images) and how they are to be coordinated. An arbitrary number of associated sounds, symbols, and drawn (or digitized) images may be involved, through an orthography and phonology encoding scheme designed to handle any human language. This makes the potential realism of a synthetic actor portrayed with these techniques essentially unlimited.

The design goal was to make HyperAnimation independent of the technologies it integrates. Differences in human language (as well as the details of the display and sound production methods) are hidden from the application. A change in any one of these characteristics is handled at the RAVEL level by the synthetic actor animation designer. Thus, HyperAnimation applications can migrate upward as constituent technologies progress.

Image

The RAVEL designer provides a database of source images from which the RAVE run-time system creates synthetic actors. Any number of images may be used. They may be drawn by an artist or digitized from live subjects, claymation-type sculptures (such as television advertising raisins), or videotape recordings. More images mean more realistic synthetic actors. Fewer images mean less storage used and faster development cycles because application prototyping can be done by using a subset of a larger production version model.

The minimum number of images that can crudely simulate lip-synch is the degenerate case of two: one with the mouth open, the other shut. Surprisingly, even this can suffice, if only for crude humorous presentations. Eight images corresponding to various distinctive speech articulations are sufficient to create an acceptably realistic synthetic actor. Using the HyperAnimation tools and such a model, a designer may spend but a few minutes to digitize any person, to put that representation up on the screen, and to have that representation talking.
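
To make the eight-image idea concrete, here is a minimal C sketch (C being the language RAVE itself is written in, as noted later in this article). The frame classes, phoneme mnemonics, and the mapping are invented for illustration; they are not Bright Star's actual tables.

/* Illustrative sketch only: a hypothetical eight-frame model and a made-up
 * phoneme classification, showing how few images a crude lip-sync needs. */
#include <stdio.h>
#include <string.h>

enum mouth_frame {          /* one digitized image per articulation class */
    FRAME_REST,             /* mouth closed, at rest                      */
    FRAME_OPEN,             /* open vowels                                */
    FRAME_WIDE,             /* spread vowels                              */
    FRAME_ROUND,            /* rounded vowels                             */
    FRAME_LIPS_CLOSED,      /* bilabials: P, B, M                         */
    FRAME_LIP_TEETH,        /* labiodentals: F, V                         */
    FRAME_TEETH,            /* sibilants and alveolars                    */
    FRAME_TONGUE            /* TH, L, and similar                         */
};

/* Map a (hypothetical) phoneme mnemonic to the frame that best depicts it. */
static enum mouth_frame frame_for_phoneme(const char *ph)
{
    static const struct { const char *ph; enum mouth_frame f; } table[] = {
        {"AA", FRAME_OPEN}, {"AH", FRAME_OPEN}, {"IY", FRAME_WIDE},
        {"EH", FRAME_WIDE}, {"UW", FRAME_ROUND}, {"OW", FRAME_ROUND},
        {"P", FRAME_LIPS_CLOSED}, {"B", FRAME_LIPS_CLOSED}, {"M", FRAME_LIPS_CLOSED},
        {"F", FRAME_LIP_TEETH}, {"V", FRAME_LIP_TEETH},
        {"S", FRAME_TEETH}, {"Z", FRAME_TEETH}, {"T", FRAME_TEETH}, {"D", FRAME_TEETH},
        {"TH", FRAME_TONGUE}, {"L", FRAME_TONGUE},
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(table[i].ph, ph) == 0)
            return table[i].f;
    return FRAME_REST;      /* anything unknown falls back to the rest frame */
}

int main(void)
{
    const char *word[] = {"HH", "EH", "L", "OW"};   /* "hello", roughly */
    for (int i = 0; i < 4; i++)
        printf("%s -> frame %d\n", word[i], frame_for_phoneme(word[i]));
    return 0;
}

A two-image model is just the degenerate version of the same mapping, with every speech sound collapsed to the open-mouth frame.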

More complex models use additional images to enhance the smoothness of the presentation, to show separate exact articulations for each phonetic component for teaching language skills (i.e., literacy, lipreading, remediation, and therapy), to depict emotion, and for humorous enactments where the synthetic actor changes as it speaks.

The RAVEL language and its associated tools enable artists and animators to produce, manage, modify, and manipulate databases that represent the sets of images which make up models. Each database can be structured to reflect hierarchical relationships among sets of images in order to provide for submodels. For instance, one set can depict the actor in profile, another in a face-on view. Transition images can then animate the turning process itself. RAVEL statements designate all the images and tell how they are to function with sounds and with each other. An in-between operator in RAVEL can specify images that are inserted to add smoothness in transitions. Provision has also been made for automatic generation of in betweens that use a motion-blur algorithm currently under development.<fn2>

Special RAVEL statements tell the compiler each model's image size, pixelation, and the name of the file in which the images are stored. At run time, a RAVE routine opens each file, reads it and prepares the data structures. In current implementations this is all done in RAM; later versions will use the submodel structuring to buffer sets of images as needed. This will be especially useful with the very large models (and thousands of images) stored on optical media.

RAVE is set up to be as independent of the animation means as possible. Given the capabilities of the installed base of Macintosh computers, it is appropriate only to store and display bit-mapped screen images. Provisions have been made in RAVE and RAVEL for parametric descriptions<fn3> to be used instead of bitmaps so that processors with sufficient speed can create the images themselves by using mathematic models of the anatomy of the face. Another possible animation means is robotic. A lip-synced robot interfaced to the Macintosh has been demonstrated. It uses variants of RAVEL statements that describe servomotor configuration and commands.

Sounds

RAVE actors can function with prerecorded digitized sounds. But in order to have unlimited vocabulary, they speak through the use of a speech synthesizer. Many companies have developed different speech synthesizers. Some use special hardware; others (like Macintalk on the Macintosh) take the form of software-only device-driver modules resident on a particular host microcomputer.

Each speech synthesizer is usually associated with a facility to translate human-language source text into the phonetic codes used to control it. That process will be discussed separately in the next section. For now, consider only the speech synthesizer proper (i.e., the part that just pronounces the phonetic codes sent to it).

Though their internal methods vary greatly, each speech synthesis unit appears to the calling program as two parts: a way to produce relatively short phonetic segments of speech, and a way to concatenate and blend them together. This thumbnail description is a drastic oversimplification of a complex situation.<fn4><fn5>

There is no standard way to break up speech into segments, nor any agreed-upon set of codes to denote the segments. Indeed, different ways (words, phonemes, syllables, and so forth) work better for different human languages and design constraints (such as memory and speed). So each synthesizer uses its own segmentation methodology with idiosyncratic encoding terminology. To make matters worse, the encodings often vary in length in an attempt to be mnemonic. For example, Macintalk uses strings of approximately 60 possible codes, each comprising one or two ASCII characters.

Especially in the case of microcomputer software, each developer of a speech synthesizer has tended to regard its product as the "be-all" and "end-all" of speech synthesis. For instance, the system-resident speech synthesis software module on the Amiga is referred to as the Narrator Device. The RAVE/RAVEL system takes a broader perspective by characterizing each speech synthesizer as an instance of a narrator device. Bright Star wanted applications that use RAVE synthetic actors always to be able to take advantage of the state of the art in speech generation without recoding at the application level. Thus, a Narrator Device Integrator (NDI) was built into the RAVE system. The characteristics of a particular speech synthesizer or system of digitized sounds are then described by using special RAVEL statements that program the NDI functions. This also provides for using different speech generation products as different voices for separate actors appearing together in an application.

Figure 3: Talking Tiles with a Greek language tile set loaded.

The calling program never needs to know which speech synthesizer is being used. This information is hidden from it by RAVE, which assigns its own unique fixed-size encodings (called "phocodes") to speech synthesizer segments. RAVE and its calling program pass phocode strings back and forth to communicate. This enables the application to work with different synthesizers and human languages simply by specifying a different model.
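
As a rough illustration of the phocode idea, the C sketch below assigns a fixed-size code to each of a synthesizer's variable-length segment mnemonics and can expand a phocode string back into the native encoding. All names and mnemonics here are invented; this is not RAVE's actual NDI code.

/* Hedged sketch: fixed-size "phocodes" (one byte each) standing in for a
 * synthesizer's variable-length segment mnemonics, so the application only
 * ever handles phocode strings. */
#include <stdio.h>
#include <string.h>

#define MAX_CODES 64

static const char *segment_name[MAX_CODES];   /* phocode -> native mnemonic */
static int segment_count = 0;

/* Register a synthesizer mnemonic and return its fixed-size phocode. */
static int register_segment(const char *mnemonic)
{
    for (int i = 0; i < segment_count; i++)
        if (strcmp(segment_name[i], mnemonic) == 0)
            return i;
    if (segment_count >= MAX_CODES)
        return 0;               /* table full; a real system would report an error */
    segment_name[segment_count] = mnemonic;
    return segment_count++;
}

/* Expand a phocode string back into the synthesizer's native encoding. */
static void render_native(const unsigned char *phocodes, int n, char *out, size_t outsz)
{
    out[0] = '\0';
    for (int i = 0; i < n; i++)
        strncat(out, segment_name[phocodes[i]], outsz - strlen(out) - 1);
}

int main(void)
{
    /* Invented two-letter mnemonics standing in for a synthesizer's codes. */
    unsigned char hello[4];
    hello[0] = (unsigned char)register_segment("HX");
    hello[1] = (unsigned char)register_segment("EH");
    hello[2] = (unsigned char)register_segment("LL");
    hello[3] = (unsigned char)register_segment("OW");

    char native[64];
    render_native(hello, 4, native, sizeof native);
    printf("phocodes %d %d %d %d -> \"%s\"\n",
           hello[0], hello[1], hello[2], hello[3], native);
    return 0;
}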

Symbols

RAVE synthetic actors are designed to be able to read and speak any human language. The nature of the symbols, sounds, and their relationships is specified in RAVEL so as to be transparent to the calling program. The design goal is to enable any organized set of visual symbols associated with sounds to be programmatically synchronized with facial animation and speech (or other sounds). Letters, words, sign language, mathematics, and scientific symbols are examples.

In order to provide for human language independence, text-to-phonetic translation functionality is built into RAVE. This is accomplished by specifying a set of rules (productions) to govern each particular translation. For many human languages, these rules are provided in a form similar to those of Elovitz et al.<fn6> Their methods are extended by several additions (including statements that define categories of characters) to achieve generality across languages. Elovitz and those who followed her work hardcoded these categories into their text-to-phonetics translators.
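
The C sketch below suggests what applying such letter-to-sound rules might look like. The three rules, the rule format, and the matching strategy are simplified toy versions for illustration only; they are neither the NRL rule set nor RAVEL's.

/* Simplified sketch of letter-to-sound rules of the general form
 * [ fragment ] right-context = phonetics.  Toy rules, not a real rule set. */
#include <stdio.h>
#include <string.h>

struct rule {
    const char *fragment;   /* letters being translated           */
    const char *right;      /* required right context ("" = any)  */
    const char *phonetics;  /* output codes for this fragment     */
};

/* Toy rules: "th", soft "c" before "e", and a default short "a". */
static const struct rule rules[] = {
    {"TH", "",  "DH"},
    {"C",  "E", "S"},
    {"A",  "",  "AE"},
};

/* Translate an uppercase word by scanning the rules in order at each position. */
static void translate(const char *word)
{
    size_t i = 0, n = strlen(word);
    while (i < n) {
        int matched = 0;
        for (size_t r = 0; r < sizeof rules / sizeof rules[0]; r++) {
            size_t flen = strlen(rules[r].fragment);
            if (strncmp(word + i, rules[r].fragment, flen) != 0)
                continue;
            if (rules[r].right[0] &&
                strncmp(word + i + flen, rules[r].right, strlen(rules[r].right)) != 0)
                continue;
            printf("%.*s -> %s\n", (int)flen, word + i, rules[r].phonetics);
            i += flen;
            matched = 1;
            break;
        }
        if (!matched) {                 /* no rule: pass the letter through */
            printf("%c -> %c\n", word[i], word[i]);
            i++;
        }
    }
}

int main(void)
{
    translate("THAT");
    translate("FACE");
    return 0;
}

The record of which letters produced which output codes is exactly the kind of bookkeeping the OCREC described below formalizes.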

The RAVE text-to-phonetics translator also keeps track of the translation process and handles the special rule statements needed to create orthophonetic animation (like that described for Talking Tiles, where the letters glow at the appropriate times). The needed information is generated and stored in an Orthophonetic Correspondence Record (OCREC) that tells which orthographic characters were associated with which phocodes, and what orthographic context was examined to determine that pronunciation.

The number of rules needed depends on the complexity of the human language.<fn7> A language like Russian, which is relatively consistent phonetically, requires only dozens of rules. Spanish and Greek are rather easy, too (see Figure 3, this page). But rules for incorrigibly inconsistent languages like English run to hundreds of statements, plus individual exceptions. Each exception is defined by its own rule statement and optional custom orthophonetic rules to go with it.

Often difficult choices must be made about how to animate these quirky exceptions. Take the word "one." The RAVEL coder could decide to write orthophonetic rules to state that all three letters are to be displayed for any part of, and for the whole word, with no other effects generated. This would reflect a decision to indicate to the viewer that the word "one" cannot be decomposed into sounds that match the letters in any logical manner. Another possibility would be to indicate that the "o" of "one" made the sounds "w" and "uh," the "n" made the sound "nn," and the "e" was silent.

RAVEL source-code orthophonetic rules empower the designer to make these decisions and have them be transferable to other host machines, rather than having them built in with low-level programming. These rules enable the application designer to create synchronized orthophonetic animation at any level, from having words light up as they're read, to having individual letters light up as a word is sounded out. You could even have an automatic Mitch Miller follow-the-bouncing-ball.

The orthophonetic rules include various effects parameters that are set up to be as general as possible. They may specify the offset within the source string at which the animation is to occur. This enables animation of symbol-set combination modes in which characters cause sounds to occur in an order different from that in which the characters are arranged (Pig Latin, for instance). Other possibilities may involve overlapping elements of symbols (Oriental ideographic characters); denoting particular modes of effects; or specifying synthetic actor commentary to be associated with that rule ("this vowel is long because of that silent e"). Unfortunately, a detailed description of these considerations cannot be presented in this article.

Coordinating the Action

At run-time, RAVE routines rigorously coordinate the presentation of the images, sounds, and symbols to create the illusion of life. Each time the synthetic actor is called upon to speak, RAVE software executes a two-phase process. Phase one prepares optimized intermediate data in a form that enables maximum execution speed during the real-time phase two. As much of the processing as possible is completed prior to the commencement of actual speech and animation. This is accomplished by generating data structures of pointers and lists (called "scripts") for the real-time processes.
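
A minimal sketch of this two-phase division follows, assuming an invented script format of frame numbers and durations (the real RAVE scripts also carry audio commands and event hooks).

/* Phase one turns a phocode string into a precomputed "script" of frames and
 * durations; phase two does nothing at speech time but step through it.
 * Frame numbers and durations below are invented. */
#include <stdio.h>

struct frame_cue {
    int frame;          /* which model image to display   */
    int ticks;          /* how long to hold it (1/60 sec) */
};

/* Phase one: compile a phocode string into an animation script. */
static int build_script(const unsigned char *phocodes, int n,
                        const struct frame_cue *cue_for_phocode,
                        struct frame_cue *script)
{
    for (int i = 0; i < n; i++)
        script[i] = cue_for_phocode[phocodes[i]];
    return n;
}

/* Phase two: play the script; here we just print instead of drawing. */
static void play_script(const struct frame_cue *script, int n)
{
    for (int i = 0; i < n; i++)
        printf("show frame %d for %d ticks\n", script[i].frame, script[i].ticks);
}

int main(void)
{
    /* Per-phocode cues for a tiny four-phocode model (all values invented). */
    struct frame_cue cue_for_phocode[4] = {
        {0, 6}, {2, 8}, {5, 5}, {3, 9}
    };
    unsigned char utterance[] = {1, 0, 2, 3};
    struct frame_cue script[8];

    int len = build_script(utterance, 4, cue_for_phocode, script);
    play_script(script, len);
    return 0;
}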

Figure 4: RAVE/RAVEL Data structures overview. Each synthetic actor has an entry in the model table. The actor's model record contains pointers to information structures that define its characteristics.

When directed to pronounce a phocoded string or word, the RAVE NDI translates it and generates an appropriate audio script for the narrator device that produces that synthetic actor's voice. RAVE routines extract the necessary characteristics of the synthetic actor from the RAVEL program and from other sources to create this and the graphic animation scripts.

The animation processors that run these scripts in real time may be invoked parametrically, or to speed up the real-time phase, the RAVE system may actually generate these scripts itself by using internal compilation during phase one. When scripts are complete for the voice and animation of the synthetic actor, the RAVE real-time coordinator takes over. The coordinator cues and handles events and timing interrupts.

Ideally, the timing is coordinated with feedback from the narrator device driver. For instance, the device driver may tell which phoneme of the audio script was being pronounced at a particular time. For narrator device drivers (such as Macintalk) that do not provide such feedback, timing parameters may be associated with each image-animation frame. They may also be used in the case where feedback is available only at a higher level of granularity than is required for the animation sequence. For instance, many English speech synthesizers have only a single phonetic segment code signal for certain diphthongs. Yet, the sound is best displayed by two articulatory positions.

The behavior of the synthetic actor not associated with actual speech (for instance, when one turns toward the user or blinks) is controlled by a special behavior controller in RAVE and implemented with sequences of images just as the speech segments are. Each action or behavioral trait is given its own coded representation. To create a unique personality for each synthetic actor, a neural-net simulator is driven with RAVEL statements that define low-level behaviors. The intention is to create a software platform extensible to full modeling of human form and behavior.

RAVE Database Structures

At the heart of the database structures compiled from RAVEL source code is a dynamically allocated synthetic actor model table. The table structure provides for specification and simultaneous control of multiple actors, each with its own appearance, behavior, and language. Each record contains a set of pointers to variable-length tables that define particular information about that actor (Figure 4, this page).

The first pointer indicates a sequence table that defines the images and synchronization characteristics for display of the synthetic actor. Next, a list of in-betweens defines intermediate images and the parameters that govern their insertion. In-betweens can be specified in RAVEL, nested to any depth, and associated with specific timing intervals or made dependent on the timing of context images.

The phocodes table defines the narrator device characteristics of the synthetic actor in terms of its speech segment and other codes. Each code has its own record in which a field specifies the number of bits in that particular code and a definition of it. This enables the calling application to manipulate phonetic encodings without knowing how the particular narrator device functions. An associated syncopations table describes the phonetic codes necessary to sound out words in unblended speech.

A table of attributes is also associated with the phocode table. It includes a flag that designates which phocodes are event phocodes (i.e., those for which explicit feedback is available from the narrator device). Another flag designates the orthophonetically significant phocodes (i.e., those that matter when creating and interpreting OCRECs). An intonation field indicates phocodes associated with stress or other intonation that may influence adjacent phocode timing. Another field basically indicates whether the phocode represents a vowel sound. This can be utilized in certain text-to-phonetic conversion methods for assignment of intonation. Knowing which phocodes are vowel sounds is necessary for syllabication to determine stress assignment and prosody in almost all languages.

Another pointer leads to a table of narrator-device characteristics stored in a format convenient to the audio script processor. This usually includes values describing speed, pitch, volume, pause codes, and narrator-device calling sequence idiosyncrasies. The final pointer in this simplified diagram contains the address of the text-to-speech translation table for the synthetic actor. It defines special codes and contains the rules stored in compressed concatenated fashion.
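
In C (the language RAVE is written in), a model record of the sort Figure 4 describes might be sketched as follows. All field names and types here are guesses for illustration, not RAVE's actual layout.

/* Rough sketch of one entry per synthetic actor, holding pointers to its
 * variable-length tables, plus the dynamically allocated model table. */
#include <stdio.h>
#include <stdlib.h>

struct sequence_entry  { int frame; int ticks; };          /* image + timing      */
struct inbetween_entry { int from_frame, to_frame, mid_frame; };
struct phocode_entry   { int bits; char native[4]; };      /* narrator's own code */
struct phocode_attr    { unsigned is_event : 1;            /* feedback available  */
                         unsigned ortho_significant : 1;   /* matters for OCRECs  */
                         unsigned is_vowel : 1;            /* used for prosody    */
                         int intonation; };
struct narrator_params { int speed, pitch, volume; char pause_code; };
struct text_rule       { const char *fragment, *left, *right, *phonetics; };

struct actor_model {
    const char             *name;
    struct sequence_entry  *sequences;     /* display/synchronization            */
    struct inbetween_entry *inbetweens;    /* smoothing transitions              */
    struct phocode_entry   *phocodes;      /* fixed-size speech-segment codes    */
    struct phocode_attr    *attributes;    /* per-phocode flags                  */
    struct narrator_params *narrator;      /* speech-synthesizer characteristics */
    struct text_rule       *translation;   /* text-to-phonetics rules            */
    int n_sequences, n_inbetweens, n_phocodes, n_rules;
};

struct model_table {
    struct actor_model *models;
    int count;
};

static struct model_table *model_table_new(int capacity)
{
    struct model_table *t = malloc(sizeof *t);
    if (!t)
        return NULL;
    t->models = calloc((size_t)capacity, sizeof *t->models);
    if (!t->models) {
        free(t);
        return NULL;
    }
    t->count = 0;
    return t;
}

int main(void)
{
    struct model_table *table = model_table_new(4);
    if (table) {
        printf("allocated a table with room for 4 actor models\n");
        free(table->models);
        free(table);
    }
    return 0;
}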

Other Features

Randomness in the behavior of synthetic actors is provided for through the use of RAVEL constructions that enable the designer to specify a set of images rather than a single image for display. Which image is actually used is determined randomly at run time. This enhances the illusion of life (living things are a bit unpredictable and just don't repeat motions exactly the same each time). The submodel architecture for defining particular RAVE actors can handle such problems as turning, tilting, and nodding of the head during speech (real people usually don't hold still as they talk). Extensions of RAVEL will eventually let such behaviors be programmatically coordinated with syntactic or even semantic components of elocution.
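
A minimal sketch of the random-choice mechanism, with invented frame numbers: the designer lists several interchangeable frames for a pose, and one is picked each time at run time.

/* Pick one frame at random from a designer-specified set so the actor never
 * repeats a motion identically.  Frame numbers are invented. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static int choose_frame(const int *candidates, int n)
{
    return candidates[rand() % n];
}

int main(void)
{
    const int rest_frames[] = {4, 11, 12};   /* three near-identical rest poses */
    srand((unsigned)time(NULL));
    for (int i = 0; i < 5; i++)
        printf("rest pose -> frame %d\n", choose_frame(rest_frames, 3));
    return 0;
}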

The RAVE system allows for anticipation or leading the audio by the motion. This can enhance the lip-sync perceptions apparently through suggestion of causality, or perhaps by simulating the fact that you see images slightly before you hear sounds in real life (because of the faster speed of light). Adjustments for intonation (for instance, making the mouth open wider on a stressed syllable) are also provided for in the design.

For the sake of simplicity, the use of RAVE and RAVEL software has been described as though each articulatory position were invariant with a phonetic output. In real life, lip positions vary in normal speech, depending on context. For instance, the usual position of the lips as they purse to deliver the second "b" sound in "beebee" is noticeably different from the position of the same "b" in "booboo," where they are slightly protruded in anticipation of the "oo" sound. Detailed descriptions of the RAVE mechanisms that handle these context-dependent coarticulatory variations (as well as other advanced HyperAnimation features) are beyond the scope of this article.

Other possibilities besides a talking head can be created by using HyperAnimation software to coordinate sound and motion. Examples include a beating heart for teaching medical students, moving hand gestures for sign language, or fingering positions for a musical instrument. These possibilities rely on the fact that RAVE also works with digitized sounds, and not just synthetic speech. Another implication is that synthetic actor applications can be prototyped by using synthetic speech that is then replaced by custom-made prerecorded digitized speech in the production version (this is how Alphabet Blocks was developed).

We have drawn on a number of studies, especially in the matter of sequential synchronization through recognition of digitized speech.<fn9>,<fn10>,<fn11> We are now building on these studies, particularly to enhance the efficiency of integrated HyperAnimation tools in working with the sort of prerecorded digitized sounds mentioned previously. The recent appearance of a superb system for capture and manipulation of sounds on the Macintosh (Farallon's MacRecorder<fn12>) makes this a fruitful area for further work.

The HyperAnimator

We are currently field-testing the HyperAnimator development system, which will enable developers to use a device-driver version of RAVE to direct synthetic actors from standard applications programs. Significant additional effort was expended to make its calling sequence compatible with Macintalk. That means no new bindings are needed. Not only can you prototype with Macintalk alone to save time, but existing programs that now use Macintalk can get a "facelift" by adding a synthetic actor.

Not only is the device driver currently running on the Mac, but so is a HyperCard XCMD interface that enables the Hypertalk programmer to call synthetic actors into HyperCard stacks. Embedded escape sequences within the text parameter passed by a single Hypertalk verb control the position, appearance, and disappearance of synthetic actors. Using this method instead of multiple verbs keeps the interface compatible with the Macintalk calling sequence, simplifies the Hypertalk binding, and avoids the problem of device-driver protocol variance on other machines. The design goal was to minimize porting difficulties by providing a simple interface for invoking the device driver that will be available in multiple operating environments.
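
To suggest how a driver might separate embedded commands from speakable text, here is a small C sketch. The |~ and ~| delimiters match Listing One; the parsing logic itself is purely illustrative, not the actual XCMD or driver code.

/* Split a RAVE-style text parameter into control sequences and speech text. */
#include <stdio.h>
#include <string.h>

static void scan_parameter(const char *text)
{
    const char *p = text;
    while (*p) {
        const char *open = strstr(p, "|~");
        if (!open) {                            /* no more commands */
            if (*p)
                printf("speak: \"%s\"\n", p);
            return;
        }
        if (open > p)                           /* text before the command */
            printf("speak: \"%.*s\"\n", (int)(open - p), p);
        const char *close = strstr(open + 2, "~|");
        if (!close)
            return;                             /* malformed: stop quietly */
        printf("command: %.*s\n", (int)(close - (open + 2)), open + 2);
        p = close + 2;
    }
}

int main(void)
{
    scan_parameter("|~ACTOR ELON~||~MOVE TOP 100 LEFT 150~|Hello there.");
    return 0;
}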

As an example of how HyperAnimation technology can be used in HyperCard, HyperTalk source code (Listing One, page 66) is presented for part of an interactive stack demonstrated at January's Macworld Expo. This code features a synthetic actor reading a poem as it scrolls up a line at a time (Figure 5, page 34). Note that the user can rate the recital by pressing one of two buttons.

The HyperCard synthetic actor lives in its own little window (in this case, one that has no frame). You can click anywhere on the face window and drag it to a different place on the screen, or control it from Hypertalk with the MOVE command. That and other control sequences are passed in the text parameter of the XCMD verb "RAVE" by bracketing them within the |~ and ~| escape-code sequences, as shown.

Future Fun

As of this writing, RAVE software has been released only for the Macintosh. But the HyperAnimation tools are quite portable. RAVE was written in C using a shell that isolates the code from its operating environment. Early development work (before the Macintosh was released) was done on the IBM PC. Bright Star has also performed prototype work on the Amiga. From the beginning, the development team at Bright Star has sought maximum portability of the system, using macros and coding conventions set up after careful consideration of potential host environments. Bright Star intends to license the HyperAnimation technology (patents still pending) for special-purpose dedicated systems.

But first, HyperAnimation software will be placed in the hands of the Macintosh community. That machine will become the initial laboratory for mass experimentation with synthetic actors much in the same way it did for fonts, clickart, and desktop publishing. We expect to see a vanguard of applications and synthetic actor models (both public domain and proprietary) developed and used there in the first wave of a creative explosion caused by the availability of this newest and most intuitive generation of human interface.

HyperAnimation technology eventually will have a pervasive influence on diverse fields. In many ways, though, the most important use of synthetic actors will be in education. Bright Star has taken great care to create early educational applications of HyperAnimation that are innovative, fun, and educationally sound. But Alphabet Blocks, Talking Tiles, and other educational products using HyperAnimation are barely the beginning. They offer only a taste of what software using this technology will someday be capable of doing. Particularly in literacy education, the language independence of the HyperAnimation platform will eventually ensure that as computational capabilities progress and prices decline, teaching tools of unprecedented power will help bring learning to all the peoples of the world. With simulated teachers to help, human enlightenment can become a less linear function of human resources. Face it---we need it.

Figure 5: Bright Star's HyperAnimator enables stackware designers to use lip-synced talking synthetic actors themselves.

Availability

For those interested, Bright Star Technology publishes The HyperAnimation News. For a free copy, write to: Bright Star Technology Inc., 14450 NE 29th St., Bellevue, WA 98007.

Bibliography

    1. Sculley, John, et al. "Knowledge Navigator" and "HyperCard Future." Video presentations, Apple Computer, 1988.

    2. Potmesil, M. and Chakravarty, I. "Modeling Motion Blur in Computer-Generated Images." Proceedings, SIGGRAPH '83, Computer Graphics, 17(3): 389-399, 1983.

    3. Parke, F.I. "Parametrized Models for Facial Animation." IEEE Computer Graphics and Applications, 2(9): 61-68, 1982.

    4. Witten, I. "Principles of Computer Speech." London: Academic Press, 1982.

    5. Klatt, Dennis H. "Review of text-to-speech conversion for English." Journal of the Acoustical Society of America, 82(3): 737-793, 1987.

    6. Elovitz, Honey Sue; Johnson, R.; McHugh, A.; and Shore, J.E. "Automatic Translation of English Text to Phonetics by Means of Letter-to-Sound Rules." United States Naval Research Laboratory Report 7948, December 1976.

    7. Sherwood, Bruce. "Fast Text-to-Speech Algorithms for Esperanto, Spanish, Italian, Russian and English." International Journal of Man-Machine Studies, 10: 669-692, 1978.

    8. Thomas, Frank and Johnston, Ollie. "Disney Animation: The Illusion of Life." Walt Disney Productions, 1981.

    9. Bagley, J.D. and Gracer, F. "Method for Computer Animation of Lip Movements." IBM Technical Disclosure Bulletin, 14(10), 1972.

    10. Bakis, R. "Speech Recognition System." IBM Technical Disclosure Bulletin, 13(4), 1970.

    11. Magnenat-Thalmann, Nadia and Thalmann, Daniel. "Computer Animation: Theory and Practice." Chapter 9: Human Modeling and Animation. Springer-Verlag, 1985.

    12. MacRecorder Sound System, Farallon Computing Inc., Berkeley, CA, 1988.

HyperAnimation, HyperAnimator, RAVE, RAVEL, Alphabet Blocks, and Talking Tiles are trademarks of Bright Star Technology Inc.

Apple's Head Talks About Talking Heads

While HyperAnimation is a development tool that currently provides programmers with a talking-head front end to databases, the Knowledge Navigator is Apple Computer's view of how a similar technology might be implemented in the future. In fact, Apple chairman John Sculley has even gone so far as to characterize talking-head technology as the basis for the Macintosh of the twenty-first century.

Apple introduced the concept earlier this year in a company-produced film entitled "The Knowledge Navigator." At the opening of the film, the PC (which looks surprisingly like descriptions of Alan Kay's long-talked-about Dynabook) has a screen with a Mac-like menu bar across the top and a "knowledge assistant" (i.e., talking head) in the corner. The user retrieves a map of South America from a database by simply telling the talking head to get it and, by pointing to various parts of the map, the user can search other databases, including those that contain relevant articles. When the user requests a specific article by a particular author, the talking head supplies the correct title as well as an abstract about it.

Sculley claims that the Knowledge Navigator project is more than just marketing hype and states that Apple's engineers are currently using computer simulation (presumably on Apple's Cray computer) to develop the project.<fn1> He adds that the project is based on hypermedia, simulation, and artificial intelligence: three tools he believes will be increasingly important for PCs in the future.

"This film is not a fantasy," he said. "It is based on work taking place at corporate and university research centers today."

--eds

[LISTING ONE]


HyperAnimation
by Elon Gasper

1988, Bright Star Technology, Inc.
===================================================================
--********************--
--*   Stack functions        *--
--********************--

on startup

  --******************************************************--
  --*  Open up the driver using ELON as our synthetic actor and move     *--
  --*  him to where we would like to see him on the screen.                   *--
  --******************************************************--
   RAVE "|~ACTOR ELON~|"
   RAVE "|~MOVE TOP 100 LEFT 150~|"


end startup

------------------------------------------------------------------

function scroll_line how_many_lines

  --******************************************************--
  --*   This function will quickly scroll the field "prose line" the           *--
  --*   number of lines passed in the parameter how_many_lines.           *--
  --******************************************************--
  repeat for how_many_lines times
    set the scroll of field "prose line" ¬
      to the scroll of field "prose line" ¬
      + the textHeight of field "prose line"
  end repeat

end scroll_line

-------------------------------------------------------------------

function show_and_tell this_text

  --******************************************************--
  --*  This function shows what the actor is saying in the field             *--
  --*  face line (located underneath the actor's face) and then says it.   *--
  --******************************************************--
  put this_text into card field "faceline"
  RAVE card field "faceline"
  put empty into card field "faceline"

end show_and_tell



===================================================================
===================================================================


--********************--
--*   Button "A limerick"   *--
--********************--
on mouseUp

  --******************************************************--
  --*  When this button is pressed, the limerick field is reset to show  *--
  --*  the first line of the limerick on the screen.                                  *--
  --******************************************************--
  set the scroll of field "prose line" to 60

  --******************************************************--
  --*  Each line of the limerick is read from the field  "prose line",        *--
  --*  pronounced, and then scrolled upwards by calling the function      *--
  --*  scroll_line.                                                                                    *--
  --******************************************************--
  repeat with prose_count = 7 to 11
     RAVE line prose_count of card field "prose line"
     put scroll_line(1) into nothing
  end repeat

  --******************************************************--
  --*  Finally we pause a moment, before scrolling the limerick up off   *--
  --*  the screen.                                                                                     *--
  --******************************************************--
  wait for 3 seconds
  put scroll_line(7) into nothing

end mouseUp


===================================================================
===================================================================

--********************--
--*   Button "Bad Rating"   *--
--********************--

on mouseUp

  put show_and_tell ("Sure, let's hear you try it!") into nothing

  set the scroll of field "prose line" to 60
  put show_and_tell ("Go ahead, I am listening ..." ) into nothing

  put scroll_line(7) into nothing

  put show_and_tell ("What's Wrong?" ) into nothing
  put show_and_tell ("Cat got your tongue?" ) into nothing

end mouseUp

-------------------------------------------------------------------

--********************--
--*   Button "Good Rating" *--
--********************--

on mouseUp

  put show_and_tell ("Thank You!") into nothing
  put show_and_tell ("You are too kind!" ) into nothing

end mouseUp









