Question Categorization
A question category represents the type of answer: a person, place, animal, organization, time, currency, dimension, and so on. Fine-grained question classifiers may have 30 or more different categories. The accuracy of the classifier is critical because this is the first step in the processing pipeline (Figure 1). Any errors introduced in this step are propagated to the following steps and likely lead to the extraction of a wrong answer.
Classifying questions is somewhat harder than classifying longer texts because a question contains so little text. For this reason, pattern matching tends to classify questions more accurately than standard classification algorithms. One of the biggest clues to a question category is the first noun following the question word. For example, in "What is the wingspan of a condor?", the noun "wingspan" following the question word "what" indicates that the answer should contain a dimension. A fine-grained categorizer uses a higher number of question categories. Some Q&A systems also use a hierarchy of question categories. The type of categorization (fine-grained or coarse-grained) is linked to the extraction of entities from the text. The likelihood of an answer is partially based on matching the question category with extracted entities; see Table 1.
Word/Phrase | Answer Entity |
Who/Whose | Person |
Who is | Organization, Person |
What/Which | Person, Organization, Company, Place |
When | Time |
How many | Number |
How much | Currency |
How far/How long | Dimension |
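
The mapping in Table 1 can be sketched as a small pattern-matching categorizer. The following Python example is only an illustration under stated assumptions, not the implementation of any particular Q&A system: the names QUESTION_PATTERNS, NOUN_HINTS, and categorize are hypothetical, the noun-hint dictionary holds just the "wingspan" example from the text, and a simple token lookup stands in for real part-of-speech tagging when looking for the noun after the question word.

```python
import re

# Question word/phrase -> candidate answer entities, transcribed from Table 1.
# Longer phrases come first so that "who is" is tried before "who".
QUESTION_PATTERNS = [
    ("how many", ["Number"]),
    ("how much", ["Currency"]),
    ("how far", ["Dimension"]),
    ("how long", ["Dimension"]),
    ("who is", ["Organization", "Person"]),
    ("whose", ["Person"]),
    ("who", ["Person"]),
    ("what", ["Person", "Organization", "Company", "Place"]),
    ("which", ["Person", "Organization", "Company", "Place"]),
    ("when", ["Time"]),
]

# Hypothetical noun hints (assumption): only the "wingspan" example from the
# text is included; a real system would need a much larger list or a tagger.
NOUN_HINTS = {"wingspan": "Dimension"}


def categorize(question):
    """Return the candidate answer entities for a question, or [] if unknown."""
    text = question.lower().strip()
    for phrase, entities in QUESTION_PATTERNS:
        # The phrase must start the question and end on a word boundary.
        if re.match(re.escape(phrase) + r"\b", text):
            rest = text[len(phrase):]
            # A known noun after the question word overrides the default,
            # approximating the "first noun following a question word" clue.
            for token in re.findall(r"[a-z]+", rest):
                if token in NOUN_HINTS:
                    return [NOUN_HINTS[token]]
            return entities
    return []


if __name__ == "__main__":
    print(categorize("What is the wingspan of a condor?"))
    print(categorize("Who is the largest employer in Ohio?"))
```

Ordering the patterns from longest to shortest is a design choice that keeps the lookup a plain linear scan; a hierarchy of categories, as some Q&A systems use, would need a richer structure than this flat list.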