Multilingual Search Engine

Researchers from the Validation and Business Applications Group (VAI) at the School of Computing have developed a multilingual search engine that queries a content repository written in Interlingua using questions formulated in any language. The search engine returns a precise answer in the language in which the question was formulated.

"Interlingua" is a language-independent content representation. The United Nations' Universal Networking Language (UNL) is the only general-purpose Interlingua specified by standards, handbooks, and governing organizations. UNL was created to break Internet language barriers, and the VAI is the UNL support group for the Spanish language. The multilingual search engine is a question-answering system that aims to return precise answers to factual questions formulated in the user's mother tongue.

The novelty of this system is that the question can be formulated in English, French, Spanish, or any other language, and the system will return an answer in that same language without any translation between source and target languages, because the information base that the system searches is written in UNL.

Supposing that the answer is implicit in the question, the system exploits the features of the UNL representation of the user's question to find the answer. The search engine works by deducing the answer from the question rather than "finding" the answer to the question.

How It Works

The search engine is responsible for searching the text corpus written in UNL to find the answer as follows. First, it searches the text corpus for statements that could contain the answer. Second, it determines which of this set of statements contains the answer, and what the answer is. It then generates the answer in the same language that the question was formulated in.
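The two search steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the VAI system: each statement is modeled as a list of (relation, source, target) triples, the question is a triple list with one unknown node marked `?`, and the names `Triple`, `find_candidates`, `extract_answer`, and `answer`, along with the toy corpus, are all hypothetical. Real UNL expressions carry richer universal words and attributes, and the final step of generating the answer in the user's language is not shown.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    relation: str  # label of the edge between two concepts
    source: str
    target: str


UNKNOWN = "?"  # placeholder for the node the question asks about


def concepts(triples):
    """Collect every known concept mentioned in a list of triples."""
    return {n for t in triples for n in (t.source, t.target) if n != UNKNOWN}


def find_candidates(question, corpus):
    """Step 1: keep statements that mention all known concepts of the question."""
    wanted = concepts(question)
    return [s for s in corpus if wanted <= concepts(s)]


def extract_answer(question, statement) -> Optional[str]:
    """Step 2: match the question triple holding the unknown and read off its value."""
    for q in question:
        if q.target != UNKNOWN:
            continue
        for t in statement:
            if t.relation == q.relation and t.source == q.source:
                return t.target
    return None


def answer(question, corpus) -> Optional[str]:
    """Run both steps; return None when the corpus holds no answer."""
    for statement in find_candidates(question, corpus):
        found = extract_answer(question, statement)
        if found is not None:
            return found
    return None


# Toy corpus (hypothetical): "an engineer designs a dam", "an author writes a book".
corpus = [
    [Triple("agent", "design", "engineer"), Triple("object", "design", "dam")],
    [Triple("agent", "write", "author"), Triple("object", "write", "book")],
]

# "Who designs the dam?" expressed in the same representation.
question = [Triple("object", "design", "dam"), Triple("agent", "design", UNKNOWN)]

print(answer(question, corpus))  # prints: engineer
```

Because the question and the corpus share one representation, the sketch never translates between natural languages; it only compares graph structure. A question whose concepts appear in no statement returns `None`.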

In response to the question "Why was Aubert awarded the Camere prize?", for example, the search engine searches the repository and locates the UNL graph of the matching statement. From this graph, it deduces the answer to the question, i.e., "For a new type of movable dam."
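The Aubert question can be worked through on a toy graph. The original figure is not available in this text, so the edges below are a hypothetical reconstruction of the statement "Aubert was awarded the Camere prize for a new type of movable dam." The relation labels (`obj`, `gol`, `rsn`, `tim`, `agt`, `man`) are borrowed from UNL's relation inventory, but how they are assigned here, and the mapping from question words to relations, are assumptions made for illustration.

```python
# Hypothetical graph: each edge is (relation, from_node, to_node).
GRAPH = [
    ("obj", "award", "Camere prize"),
    ("gol", "award", "Aubert"),
    ("rsn", "award", "new type of movable dam"),
]

# Assumed mapping from a question word to the relation that holds the answer.
QUESTION_RELATION = {"why": "rsn", "when": "tim", "who": "agt", "how": "man"}


def deduce(question_word, head, known_nodes, graph):
    """Return the node linked to `head` by the relation the question asks for,
    provided the graph also contains every node the question mentions."""
    nodes = {n for _, a, b in graph for n in (a, b)}
    if not set(known_nodes) <= nodes:
        return None
    wanted = QUESTION_RELATION[question_word]
    for relation, source, target in graph:
        if relation == wanted and source == head:
            return target
    return None


result = deduce("why", "award", ["Aubert", "Camere prize"], GRAPH)
print(f"For a {result}.")  # prints: For a new type of movable dam.
```

The question supplies everything except one node: the event ("award"), its participants, and, through the word "why", the kind of relation being asked about. In that sense the answer is read off the graph rather than searched for as free text.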

Promising Results

As the information base for the exercise, the researchers used the UNESCO biographical encyclopedia concerning the French engineer Jean Aubert (1894-1984). This encyclopedia has 25 articles, which have been translated into UNL and contain 101 UNL expressions and 2,534 universal words.

The results of this research -- 82% precise answers -- are promising. A total of 75 different questions (when, how, who), for which the right answer was known beforehand, were formulated. Further questions, for which the repository contained no answer, were formulated to examine how the system behaves in such cases. The results confirm the validity of this search engine for developing multilingual question-answering systems.

The complete findings of this research, conducted by Jesus Cardenosa (VAI director), Carolina Gallardo and Miguel A. de la Villa, were presented at the 8th International Conference FQAS 2009 in Denmark, in October 2009, and are available in Springer's Lecture Notes in Artificial Intelligence 5822.
