Source: AI Expert, May 1987
Before AI researchers tried to make computers understand natural language, it was thought to be a relatively straightforward matter. Children seemed to do it so effortlessly, and parsing was taught in school as an algorithmic task. However, after years of work, computers still cannot understand natural language as well as young children. In the process of discovering the difficulties, AI has greatly elucidated the nature of language understanding.
Idioms and metaphors present special difficulties. For example, although it is easy to get an expert system to understand some of the consequences of the relationship between water and ice, we would not expect it to infer a relationship between the phrases "walking on thin ice" and "walking on water." But such examples, while fun to ponder, are only the tip of the iceberg. Eliminating them from consideration does not significantly reduce the difficulty. In fact, the idiomatic nature of language is not restricted to occasional use of idioms; virtually all language is idiomatic.
The irregularities of natural languages certainly complicate matters, but the
crux of the difficulty is present even when using a regular synthetic human
language like Esperanto. Not only are the rules of natural language itself unknown; we also don't understand the complex cognitive processes involved in using them.
Furthermore, it turns out that a huge amount of knowledge is used in understanding
even simple sentences. Knowledge of the subject matter of a sentence is clearly
required. The meaning of a sentence depends not only on the things it describes, but also on both aspects of its causality: what caused it to be said and what result is intended by saying it. In other words, the meaning of a sentence depends not only on the sentence itself, but on who says it and when, where, how, why, and to whom it is said.
It is obvious that precise shades of meaning vary with context and that meanings of certain words are always relative. Comparative modifiers such as "light" and "heavy" belong to this category; we interpret them according to what they modify. We assume, for example, that a light computer is heavier than a heavy book.
However, knowing the modification is not always sufficient. Whether a crowd is considered light or heavy depends on where and why it has gathered. Such context sensitivity holds for most words, and generally the context must be very broad for effective understanding.
The cognitive process of understanding is itself not understood. First we must ask what it means to understand a sentence. The answer usually given is to make a model of its meaning. But this answer just generates another question: What does meaning mean? Rather than delve into the meaning of meaning as philosophers have been doing for centuries, we approach this as 20th-century computer scientists and seek a more practical answer.
We normally believe someone understands what we said if he or she responds appropriately. Thus, to understand a sentence, a computer needs to transform it into a representation that will yield an appropriate response. This, of course, leads to yet another question: What constitutes an appropriate response?
The appropriateness of a response depends on the situation. For example, suppose a woman tells a natural language interface to a train schedule database that she needs to take the first train to Nashville. A response consisting of the departure time and track of the next available train indicates that the system completely understood what she said. But if she says it to her boyfriend, who knows her mother is in a Nashville hospital, she would think he didn't understand at all if he responded with railway information.
As another example, consider the sentence: "Do you know what time it is?" The response to this yes/no question should be based on its semantic equivalence to the imperative: "Please tell me what time it is." You may think an unamplified affirmative response would be perfectly appropriate (that it is the question that is inappropriate), but the following examples illustrate the ludicrousness of always basing responses on literal interpretations.
It is technically correct to answer "Yes" to the question: "Is there any water in the refrigerator?" when the only water present is frozen into ice or is in the cells of the celery. Questions of the form: "Do you want this or not?" could always be answered affirmatively by interpreting "or" as the logical inclusive disjunction, for the choices given exhaust the possibilities. We certainly don't want computer systems to respond in these ways any more than we want people to.
Different representations of the same sentence are appropriate in different circumstances. In the preceding example, the train data base should use a very simple structure of facts, whereas the boyfriend must make use of nonfactual, extralinguistic knowledge of undetermined structure. The complexity of the meaning representations required for the general case is one of the chief difficulties of natural language understanding.
LEVELS OF LANGUAGE
Language processing takes place on several different levels, corresponding roughly to different units of language. Difficulties arise at each level as well as from their interaction.
The lexical level involves the kind of information found in dictionaries: the definitions of words and their word classes. At the syntactic level, words are formed into phrases and sentences. Syntax is purely a matter of structural form. Next comes semantics, the name of which actually means "meaning" itself. The meaning here is that associated with the sentential structure, the juxtaposition of the meanings of the individual words. The next level is the discourse level. Its domain is intersentential, concerning the way sentences fit into the context of a dialog or text. The last level, pragmatics, encompasses everything else: not just a particular linguistic context but the whole realm of human experience.
As an illustration of some of the distinctions between these levels, the following sentences are unacceptable on the basis of syntax, semantics, and pragmatics, respectively:
John water drink.
John drinks dirt.
John drinks gasoline.
Note that the combination of "drink" and "gasoline" is not in itself unacceptable, as in "People do not drink gasoline" or the metaphorical "Cars drink gasoline."
It is traditional for linguists to study these levels separately and for computational linguists to implement them in natural language systems as separate components. Sequential processing is easier and more efficient but far less effective than an integrated approach.
Traditional grammars dealt primarily with syntax. The most popular kind of grammar
in computational linguistics is the context-free grammar. Since most structured
computer languages have context-free grammars, efficient context-free grammar
parsing algorithms have been developed from compiler design work.
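To make the formalism concrete, here is a minimal sketch of a context-free grammar and a naive top-down parser for it, written in Python. The grammar, lexicon, and function names are all invented for illustration and cover only a toy fragment of English:

    # Toy context-free grammar: nonterminals expand to sequences of symbols.
    GRAMMAR = {
        "S":  [["NP", "VP"]],
        "NP": [["Det", "N"], ["Name"]],
        "VP": [["V", "NP"]],
    }
    LEXICON = {
        "Det":  {"the", "an"},
        "N":    {"pie", "apple"},
        "Name": {"Mother"},
        "V":    {"baked"},
    }

    def parse(symbol, words):
        """Yield each prefix length of `words` derivable from `symbol`."""
        if symbol in LEXICON:                   # preterminal: match one word
            if words and words[0] in LEXICON[symbol]:
                yield 1
            return
        for production in GRAMMAR.get(symbol, []):
            yield from parse_sequence(production, words)

    def parse_sequence(symbols, words):
        if not symbols:
            yield 0
            return
        for i in parse(symbols[0], words):      # try every split point
            for j in parse_sequence(symbols[1:], words[i:]):
                yield i + j

    def grammatical(sentence):
        words = sentence.split()
        return any(n == len(words) for n in parse("S", words))

    print(grammatical("Mother baked the pie"))   # True
    print(grammatical("baked the pie Mother"))   # False

Efficient chart parsers avoid re-deriving the same constituents, but the exhaustive search above is enough to show the formalism at work.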
Although ungrammatical sentences are unparsable, they are not necessarily unmeaningful.
In many ways, syntax is irrelevant to understanding. Communication is rarely
impeded by a lack of agreement in number or tense, for example, as in "The
person who done it—it's their fault."
However, the role of syntax can be crucial. There is absolutely no other way to distinguish "The man who knew him went left" from "The man who knew he went left." Of the following four sentences, the first two are syntactically similar but should be interpreted very differently by a natural language system, while the last two, which are quite different in form, should transform to exactly the same internal meaning representation:
Mother was baking.
The apple pie was baking.
Mother baked an apple pie.
An apple pie was baked by mother.
Context-free grammars do not account for such phenomena; transformational grammars do, but all attempts to parse them have resulted in combinatorial explosion. Other kinds of grammars, which relate to semantics more directly, will be discussed later.
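As a rough sketch of what mapping both forms to "exactly the same internal meaning representation" might look like, the following Python fragment reduces the active and passive versions of a simple transitive sentence to a single agent-action-object triple. The patterns are crude, invented stand-ins for real syntactic analysis:

    # Hypothetical canonicalizer for simple active and passive sentences.
    def canonical(sentence):
        words = sentence.split()
        if "was" in words and "by" in words:
            # crude passive pattern: <object> was <verb> by <agent>
            i, j = words.index("was"), words.index("by")
            return (words[j + 1], words[i + 1], " ".join(words[:i]).lower())
        # crude active pattern: <agent> <verb> <object>
        return (words[0], words[1], " ".join(words[2:]).lower())

    print(canonical("Mother baked an apple pie"))
    print(canonical("An apple pie was baked by Mother"))
    # both print ('Mother', 'baked', 'an apple pie')

A system with such a canonical form could then treat "Mother was baking" and "The apple pie was baking" differently at the semantic level, where the first supplies an agent and the second an object.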
Once a meaning representation scheme has been selected, there is still the problem of how to map the input sentences to it. The mapping procedure is especially complicated because a single sentence can have many meanings, and many different sentences can have the same meaning. The former phenomenon, which presents the greater difficulty, is known as ambiguity.
LEXICAL AMBIGUITY
A classical example of lexical ambiguity is the sentence: "Time flies like an arrow." Each of the first three words could be the main verb of the sentence; "time" could be a noun or adjective, "flies" could be a noun, and "like" could be a preposition.
Thus the sentence could have various interpretations other than the proverbial
one. It could be a command to an experimenter to perform temporal measurements
on flies the same way they are done on arrows. Or it could be a declaration
that a certain species of fly has affection for a certain arrow.
Some less artificial examples are: "I saw that gas can explode" (either an explosive incident was witnessed or an explosive property was demonstrated),
"They should have scheduled meetings" and "Visiting relatives
can be annoying."
Those examples all involve word class ambiguity. A simpler type of lexical ambiguity
involves multiple meanings of a word within the same class. "The pitcher
fell and broke" is syntactically incomplete or semantically invalid if
a system happened to select the baseball-related definition of "pitcher."
Since so many words have multiple definitions, it is important for a system
to have some criteria for distinguishing the appropriate one at an early stage
of analysis. One way to accomplish this is by supplementing the dictionary definitions
with semantic markers—general semantic properties (such as animate, abstract,
location, mobile) whose usage is guided by contextual clues.
Suppose the two entries for "pitcher" were so marked, one with the containment property for liquids and the other as baseball-related and human. Then, given the sentence "The water is in the pitcher," a system would select the former definition due to the presence of the preposition "in," and it might even be able to understand the elliptical "John drank a pitcher." Of course, we could still confuse it with "John drank a tall pitcher while watching the baseball game." Unfortunately, relying on semantic markers to perform lexical disambiguation in general requires a quantity and a specificity that make them as unwieldy as the word definitions themselves.
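The mechanism can be sketched in a few lines of Python; the marker names here are invented, and a real system would need far more of them:

    # Hypothetical lexicon: each sense of a word carries semantic markers.
    SENSES = {
        "pitcher": [
            {"meaning": "container",  "markers": {"container", "holds-liquid"}},
            {"meaning": "ballplayer", "markers": {"human", "baseball"}},
        ],
    }

    def disambiguate(word, context_markers):
        """Select the sense sharing the most markers with the context."""
        return max(SENSES[word],
                   key=lambda sense: len(sense["markers"] & context_markers))

    # "The water is in the pitcher": 'in' plus a liquid suggests containment.
    print(disambiguate("pitcher", {"holds-liquid"})["meaning"])  # container
    # "The pitcher struck out the batter": a baseball context wins instead.
    print(disambiguate("pitcher", {"baseball"})["meaning"])      # ballplayer

The trouble noted above is visible even in this sketch: deciding which markers the context supplies is itself a disambiguation problem.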
Resolving lexical ambiguity often requires a context larger than the sentence.
In reading the isolated sentence, "She approached the bank," there
is no way to know whether the bank is a lake ridge or a financial building.
However, previous sentences might contain helpful information, such as that
she was wearing a ski mask or that she was in a boat.
SYNTACTIC AMBIGUITY
Syntactic ambiguity is structural ambiguity. Although different syntactic structures were associated with the different interpretations in the preceding examples of ambiguity, they arose at the lexical level. The sentence forms were not themselves ambiguous.
A very common type of structural ambiguity is due to modifier placement, as in the following innocuous-looking example: "John saw the woman in the park with a telescope." Each of the two prepositional phrases, "in the park" and "with a telescope," could be modifying either "saw" or "woman," and the second one could also be modifying the first's noun, "park."
From the various ways of combining these possibilities, five syntactic structures result. The interpretation corresponding to structure IV, for example, is that John is in the park and the telescope is in the park, but John is seeing the woman, who may or may not be in the park, with his naked eye.
Part of the multiple ambiguity involved is due to the choice of the word "telescope," for it is both an object used for seeing and one that is found both in parks and with people. If we replace "telescope" with "fountain," only structures II and IV make sense; substituting "cat" for "telescope" rules out at least I and III, whereas substituting "baby" definitely rules out all but V.
Since the number of possible structures increases exponentially with the number of modifier phrases, it becomes necessary to eliminate the unlikely ones at an early stage of processing. In the absence of contrary information, the tendency is to try to attach the modifier to the closest constituent first. The following joke, in which the modifier is an adverb, plays on this tendency.
John: I want to go to bed with Marilyn Monroe again tonight.
Jane: Again?
John: Yes, I've had this desire before.
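Returning to modifier attachment: the count of structures for n trailing modifier phrases follows the Catalan numbers, a standard combinatorial result, consistent with the five structures noted above for two phrases. A few lines of Python make the explosion vivid:

    from math import comb

    def catalan(n):
        """The nth Catalan number: comb(2n, n) / (n + 1)."""
        return comb(2 * n, n) // (n + 1)

    # Attachment structures for n modifier phrases after "John saw the woman":
    for n in range(1, 7):
        print(n, "phrases:", catalan(n + 1), "structures")
    # 1: 2,  2: 5,  3: 14,  4: 42,  5: 132,  6: 429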
Nominal compounds, in which nouns may be used as adjectives, entail a similar type of modifier ambiguity. Our knowledge that electric pencils don't need sharpening helps us parse "electric pencil sharpener," but "dangerous animal trainer" and "metal shelf bracket" could each be interpreted either way. And without carpentry experience, there is no way to know whether a wood screw would screw wood.
A semantic analog of this problem affects the structure of the deeper meaning representation. For example, consider the difference between "knowledge engineer" and "blonde engineer"; in "knowledge engineer," "knowledge" modifies "engineer," which in turn modifies the implicit noun "person," whereas "blonde" modifies "person" directly.
One of the most difficult technical issues for natural language systems to deal with is conjunction scope. For example, in the phrase "old men and women," the women are also supposed to be old only if "old" is outside the scope of "and." Context-free grammars that deal with conjunctions in general require them to be binary operators, so a nested pair of conjunctions has two possible structures [see Figures 2(a) and 2(b)].
If the conjunctions are the same, these could be semantically equivalent, as in "I'll have cake and pie and cookies." But consider the less greedy:
I'll have bread or toast and tea.
I'll have toast or tea and sugar.
Referring to the preceding diagram, the first sentence clearly has the structure of the left diagram and the second that of the right diagram.
Systems also need a way to account for the inappropriateness of conjoining the sentences "Mother was baking" and "The apple pie was baking" to produce "Mother and the apple pie were baking."
Negation and quantifier scope engender further confusion. These phenomena are particularly problematic in expert systems, which use such logical terms a lot. The command "List the trains that service every city" could be interpreted to yield a list for each city or a single list consisting of their intersection. On the other hand, when a parent tells a child "Everyone does not do that," the parent could be taking advantage of ambiguity to seem to be making a stronger statement.
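The two readings of the train command can be made explicit against a toy data base; the train names and cities below are invented:

    # Invented schedule data: which trains service which cities.
    SERVICES = {
        "Express 1": {"Nashville", "Memphis", "Knoxville"},
        "Local 2":   {"Nashville"},
        "Local 3":   {"Memphis"},
    }
    CITIES = {"Nashville", "Memphis", "Knoxville"}

    # Reading 1: a single list, the intersection -- trains servicing every city.
    print([t for t, cities in SERVICES.items() if CITIES <= cities])
    # ['Express 1']

    # Reading 2: a separate list for each city.
    print({c: [t for t, cities in SERVICES.items() if c in cities]
           for c in CITIES})
    # e.g. {'Nashville': ['Express 1', 'Local 2'], 'Memphis': [...], ...}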
Subtler situations occur with vaguer quantifiers. Compare "Not many people voted for him" to "Many people didn't vote for him." It is very hard to distinguish the two cases semantically; in neither case is the election outcome apparent, but that's how our linguistic system works.
GARDEN PATHS
The sentence "The horse raced past the barn door fell down" is not ambiguous, but processing it certainly causes structural ambiguity problems. Its ambiguity is said to be local rather than global since it can be resolved by the end of the sentence.
Such sentences are called garden path sentences, possibly because they lead one down the garden path in a quest for understanding. Here are some more examples:
The artist painted on the wall was black.
John told the man the dog bit Jane was hungry.
The horse raced down the garden path meandered.
Using the context-free grammar formalism, the underlying model for this phenomenon is a grammar segment of the form:
A -> xy
B -> yz
C -> xB
Given the input sentence xyz, the xy part is first interpreted as an A, and then the z is left dangling since Az is unparsable. The processor has to back up and reanalyze the xy, grouping the y with the z instead of the x.
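A small backtracking parser in Python makes the reanalysis visible; this is only a sketch, with the grammar segment written out as data:

    # The grammar segment from the text: A -> xy, B -> yz, C -> xB.
    GRAMMAR = {
        "A": [["x", "y"]],
        "B": [["y", "z"]],
        "C": [["x", "B"]],
    }

    def parse(symbol, tokens):
        """Yield the leftover tokens after each way of matching `symbol`
        at the front of `tokens`; exhausting a branch is the backup step."""
        if symbol not in GRAMMAR:                  # terminal symbol
            if tokens and tokens[0] == symbol:
                yield tokens[1:]
            return
        for production in GRAMMAR[symbol]:
            leftovers = [tokens]
            for part in production:
                leftovers = [rest
                             for prefix in leftovers
                             for rest in parse(part, prefix)]
            yield from leftovers

    tokens = ["x", "y", "z"]
    print(list(parse("A", tokens)))  # [['z']]: xy parses as A, z dangles
    print(list(parse("C", tokens)))  # [[]]: reanalysis as x(B -> yz) succeeds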
Computers can easily be programmed to handle this, either to an extent that is arbitrarily limited by using look-ahead techniques or to a virtually unlimited extent by backtracking. But people have trouble with garden path sentences because they do not typically backtrack and can handle only a very limited amount of parallel processing and look-ahead. The limit is commonly believed to be three. This means a person can keep three syntactic constituents hovering unanalyzed in his or her head and can parse three levels of embedded phrases.
When computers do recursion and backtracking, as all PROLOG systems do automatically, their natural limit is the stack size, which is huge. Thus such systems would accept the sentence "The child the rat the cat the dog chased ate bit was hungry" as not only grammatical but comprehensible, whereas people certainly wouldn't. On the other hand, semantic restrictions enable slightly improved comprehension of the syntactically identical sentence: "The house the fire the water the neighbor supplied doused burned was insured."
Less extreme cases of local ambiguity occur with verbs like "have," which are sometimes auxiliary verbs and sometimes main verbs. After the first three words of each of the following sentences, one cannot tell whether it is a command or a question.
Have the people do it!
Have the people done it?
If the last words were omitted from the following sentences, they would still be complete sentences; reaching the last words causes the preceding phrases to be reanalyzed as reduced relative clauses.
Is the book on the shelf red?
Is the number of people over 40 odd?
Natural language is so fraught with ambiguity that even when we try to eliminate it, we cannot do so. Generations of lawyers have thrived on reinterpreting a document intended to be completely unambiguous: the Constitution.
DISCOURSE ANALYSIS
Just as the rest of a sentence can resolve lexical or local syntactic ambiguity, the rest of a discourse can resolve ambiguities that are global on the sentence level. At the discourse level, two particular linguistic connection phenomena are also handled: ellipsis and anaphora. Ellipsis is the omission of a word or words from a sentence, rendering it syntactically, but not semantically, incomplete. Not all cases require context. For example, "Stop that" is always short for "You stop that." And sometimes the required context does not extend beyond the sentence, as in "John has five dollars and Jane nine."
On the other hand, some sentences are almost completely elided and hence totally depend on context, such as "Why?" A typical example of ellipsis is a response to a question consisting of just a noun phrase, the rest being inherited from the question, as in the following dialogue:
John: Who just walked by?
Jane: A tall blonde man.
The implicit verb phrase for the isolated noun phrase may arise from the context at large rather than a previous statement, as in "The next train to Nashville," when said to someone in a railway information booth.
Anaphora is a matter of abbreviation rather than omission. The referent is generally a previous expression. The abbreviated form is usually a noun phrase, either a pronoun or a definite noun phrase, such as "that" in "Stop that," but it can also be an adjective or adverb, as in "such things" or "do so."
A natural language system needs reasoning capability to find the possible referents and then select one of them. This process is facilitated by keeping track of the current focus of the discourse. The focus is the entity with which the discourse is most concerned at any particular time. It can shift unpredictably, and there can be minor foci.
One effect of the syntactic distinction in the active/passive pair of sentences "Mother baked an apple pie" and "An apple pie was baked by Mother" is that in the first sentence Mother is more in focus than the pie, whereas in the second the opposite is true. Tracking methods vary with the type of discourse—narrative, directions, argument, or conversation.
As with modifier attachment, proximity is a major consideration in determining referents, but it certainly does not suffice. For example, in "Mother cleaned the house, baked a pie, sat in a chair, and ate it," the correct referent is the closest edible one. In the following dialogue, the first pronoun ("that") refers to the most recent possible referent ("one"), while the second ("it") refers to the earlier referent ("the answer"):
John: The answer is one.
Jane: That is wrong; it is two.
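A sketch of that selection process, with invented semantic markers: candidates are scanned from most recent to least recent, but a candidate must satisfy the verb's requirement before proximity can decide:

    # Candidate referents from "Mother cleaned the house, baked a pie,
    # sat in a chair, and ate it," in order of mention, with invented markers.
    CANDIDATES = [("house", {"location"}),
                  ("pie",   {"edible"}),
                  ("chair", {"furniture"})]

    def resolve(required_marker, candidates):
        """Return the most recent candidate carrying the required marker."""
        for noun, markers in reversed(candidates):
            if required_marker in markers:
                return noun
        return None

    print(resolve("edible", CANDIDATES))  # pie, not the more recent chair

Proximity decides only after the semantic filter has had its say; the examples that follow involve knowledge that no such marker captures.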
As a more subtle example, consider:
I just found a kitten and I have a cat so I am going to give it away.
The knowledge that tells us seniority is being honored comes from living in
a society where pets are treated a certain way. It is not the kind of knowledge that could easily be encoded in semantic markers. Compare the last sentence to "I just won a new car and I have an old car so I'm going to give it away."
Syntactic considerations alone sometimes eliminate possible referents. Although
the pie owner and eater may or may not be the same person in the first sentence
of the following pair, they cannot be in the second sentence:
John ate his pie.
He ate John's pie.
The next example shows that syntax might play no role whatsoever. The referent of "she" is unclear in the first sentence and very clear, though different, in the following two:
Jane gave Joan the candy because she was nice.
Jane gave Joan the candy because she was hungry.
Jane gave Joan the candy because she wasn't hungry.
"They" and "it" have the same referent in the following example, despite the fact that they differ in number and hence are syntactically incomplete:
Mother picked an apple.
They are good sources of pectin.
She will make a pie with it.
Thus even knowing precisely what the focus is may not pinpoint the referent. Although the apple is the only thing in focus, it could be in focus as a type of fruit or as a specific piece of fruit. The difficulties of determining the referents of ellipsis and anaphora are obviously great. But if a natural language system cannot handle these phenomena, its language will seem stilted and unnatural.
PRAGMATICS
Often the referent of anaphora or ellipsis is something that was never previously stated but merely implied. In "The next train to Nashville" and "I just found a kitten and I have a cat so I am going to give it away," the referents could not be established from the discourse alone but required broader contexts. The extra knowledge used was of a pragmatic nature.
Extensive knowledge about the subject matter may be necessary to resolve references. Basic concepts used include connections between parts of objects, actions, and events. Thus, in the following text, we infer that the definite noun phrase "the apples" refers to an ingredient of the pie mentioned in the previous sentence:
Mother is going to make a pie.
She is washing the apples now.
Establishing the referent in "I just found a kitten and I have a cat so I am going to give it away," on the other hand, involved knowledge that was conceptually more complicated and much more subjective.
Even systems that deal with simple objective knowledge domains should be equipped with extra knowledge about their domains. That way they can avoid situations like the following. An insurance data base query system that seemed to understand gender distinctions when asked about male policy holders was asked a question about male insurance agents. In an attempt to be helpful, it responded: "Insurance agents don't have sex; only customers do."
Real understanding goes beyond facts to ascertaining goals. Goal inferencing was applied in interpreting "The next train to Nashville," and it is applied, mistakenly, in the following situation. A person who attempted to phone a theatre but reached a taxi company instead did not understand the initial greeting and inquired, "Metropolitan Theatre?" The response was "Which one?", indicating that the inquiry was interpreted as a request for a ride to the theatre, for that was the only way it made sense to the hearer.
The general nature of a response depends on the statement's underlying form, which is related to but not necessarily the same as its superficial mood. In "Do you know what time it is?" we saw that an imperative can masquerade as an interrogative. Conversely, declarative statements sometimes should be interpreted as commands or questions, for example, "I forgot how to tie this" or "I thought you were going to have left by now." The conditional interrogative can be misleading. "Would you pass the pie?" is a request, whereas "Would you like some pie?" is an offer.
Modern approaches to natural language processing have emphasized semantics and pragmatics at the expense of syntax. First the concept of syntactic case was broadened to encompass semantics. Case grammars capture the distinction between the syntactically identical "Mother made the pie with a new apple" and "Mother made the pie with a new recipe" by assigning the instrumental case to "recipe" and the material case to "apple." They also explain the puzzle of "Mother and the apple pie were baking"; its ungrammaticality is due to the conjoining of two different semantic cases.
Conceptual dependency theory practically eliminated syntactic considerations and used a small set of semantic primitives that describe relationships to represent meanings. It led to a trend of incorporating world knowledge into increasingly complex data structures based on frames. A frame is a cluster of properties associated with an object or an event.
When generalized to a sequence of events or an involved situation, frames are known as scripts. Scripts for common occurrences get filled in with the standard details unless given contrary information. Thus a restaurant script would have a default recording of this typical chain of events: being seated, getting a menu, ordering, being served, eating, getting a bill, and paying.
If a system is told that John went to Friendly's and ordered a hamburger and is then asked, "What did John eat?", it would demonstrate the inference that he had eaten the hamburger he'd ordered. But if told that John went to Friendly's and ordered a hamburger and then left, it would say he hadn't eaten, and it may also be able to answer the question "Why was John arrested?" provided it had other scripts that relate arrests to money. Gauging the significance of an omission to determine whether it should be filled in requires both domain knowledge and language knowledge.
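A hypothetical rendering of that behavior: the script is a default chain of events, each assumed to have occurred unless the story interrupts the chain (the event names and parameters below are invented):

    # Invented restaurant script: the default chain of events.
    RESTAURANT = ["be seated", "get menu", "order", "be served",
                  "eat", "get bill", "pay"]

    def fill_in(script, interrupted_at=None):
        """Assume each default event happened, up to any interruption."""
        happened = []
        for event in script:
            if event == interrupted_at:
                break
            happened.append(event)
        return happened

    # "John went to Friendly's and ordered a hamburger."
    print(fill_in(RESTAURANT))            # full chain: John ate and paid
    # "... ordered a hamburger, then left."
    print(fill_in(RESTAURANT, interrupted_at="be served"))
    # ['be seated', 'get menu', 'order']: no eating, and no paying either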
The frame devices effectively endow the computer system with a background of human experiences, providing it with default contexts for resolving ambiguity and referents as well as encoding expectations. However, they do not capture generalizations across situations. For example, completely separate scripts are needed for different types of purchasing situations.
Since meaning does not just depend on a shared knowledge base of objective descriptions of the world but also on subjective aspects of the response, such as belief systems and current cognitive processing, a natural language system also needs a model of the user. User modeling is harder than representing any quantity of world knowledge because it's a matter of representing mental processes that aren't understood. Ultimately a dynamic user model, capable of readjusting its expectations, is needed to model interpersonal aspects of communication.
It is not clear that user models are representable and, even if they are, the representations still may not model human understanding. Even the necessary objective knowledge may not be representable by a formal system, let alone one that can be computerized. Representing language by pieces of formal structures is akin to representing images by dots, and it's well known how difficult it is to recognize an image from a close-up view of the visual patterns. Until cognitive processes are better understood, the approach to incorporating pragmatics into natural language systems must be pragmatic itself.
BEYOND LOGIC
There is obviously not much point in relying on predicate logic as an inferencing mechanism, since even truth assignment alone is context dependent for almost all sentences. Other kinds of logic, such as many-worlds approaches, attempt to circumvent this problem, but incorporating them into computer systems is extremely difficult.
And logic cannot capture certain nuances. The words "but" and "and" are supposedly logically equivalent, but "but" conveys an element of surprise, as in "Jane is blind but not dumb." Technically, "most" means a majority, but its usage often seems to carry the connotation of an overwhelming plurality.
Very little of the inferencing people do in interpreting language is logical.
Some of it is even illogical. For example, it is typical for people to infer
the converse of a statement. When someone says "If it doesn't rain tomorrow
I will finish the project," this is interpreted to imply that the speaker
won't finish it if it does rain. There is also the implicit assumption that no other relevant conditions will change. That is, in the example, the project will not get finished if it snows or if the speaker gets sick.
People also tend to assume that statements they are processing are not vacuous.
The hearer of "I have never met a linguist I didn't like" assumes
that the speaker has met some linguists. Another assumption people continually make is that consecutive statements are connected. This was used, along with culinary knowledge, in the deduction about "Mother is going to make a pie. She is washing the apples now." On the other hand, when a teacher hears "I can't take the final
exam tomorrow; my great aunt died," the teacher may know he or she should
not assume the events are connected, but rather that it is likely the death
is unrelated to the student's unpreparedness and preceded it by years.
Some of these assumptions are the domain of conversational implicature, so named to clearly distinguish it from logical implication. Rules of conversation dictate that people tell the whole truth and nothing but the truth, and do so relevantly and clearly (except lawyers, of course). So, for example, in:
John: Who just walked by?
Jane: A tall blonde man.
John would infer that the man Jane mentioned is not someone they both know by name.
The ability to correct misconceptions is a desirable feature in a natural language system, for a person feels gratifyingly understood when he or she says something wrong and the computer responds to what was meant rather than what was said. Incorrect input to natural language interfaces can arise from misconceptions about either the data base content (extensional) or its structure (intensional). The previous insurance data base example illustrated an intensional misconception.
An example of an extensional one is "List the employees who did not fill out W2 forms" addressed to a company personnel data base. Although an empty list would be a correct response, a more helpful one would start with "Don't you mean W4 forms?"
Design decisions regarding the trade-off between precision and flexibility are very difficult to make. It is undesirable for a natural language system to rigidly reject sentences people easily understand, even if they're unacceptable to grammarians. But the looser and more accepting we make our systems, the more they misinterpret.
One of the earliest and most popular natural language programs, ELIZA the Rogerian therapist, had no grammatical structure requirements, but it was never intended to have any deep understanding. It would make impressive connections, such as responding to "My older sister hit me" with "Tell me more about your family," but "My friend's sister hit me" would elicit the same response.
Various ad hoc techniques and gimmicks, such as prefabricated responses, keyword matching, and parroting, can give the semblance of understanding natural language. Such simulated understanding systems have perhaps stimulated unrealistic expectations.
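The keyword trick can be suggested in a few lines of Python; the rules below are invented for illustration and are not Weizenbaum's actual ones:

    import re

    # Invented ELIZA-style rules: a keyword pattern and a canned response.
    RULES = [
        (re.compile(r"\b(mother|father|sister|brother)\b", re.IGNORECASE),
         "Tell me more about your family."),
        (re.compile(r"\bI am (.+)", re.IGNORECASE),
         "How long have you been {0}?"),
    ]

    def respond(line):
        for pattern, template in RULES:
            match = pattern.search(line)
            if match:
                return template.format(*match.groups())
        return "Please go on."                 # catch-all fallback

    print(respond("My older sister hit me"))     # Tell me more about your family.
    print(respond("My friend's sister hit me"))  # the very same response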
The issues involved in understanding natural language are as complex as the situations that natural language can describe. The requisite knowledge is vast, including knowledge of word forms and meanings, sentential structure, conversational conventions, and the world at large.
Natural language processing spans virtually every field of AI—knowledge representation, machine learning, perception, reasoning—and the ability to use language is required for all kinds of intelligent behavior. Thus, solving the problems in natural language understanding is tantamount to solving all the problems in AI. And if that weren't so hard, you probably wouldn't be reading this magazine.
Skona Brittain has a degree in computational linguistics and is co-owner of Microcomputer Systems Consultants in Santa Barbara, Calif.