The semantic Web refers to a movement to make the meaning of online documents and data more accessible to computers. "Computers are very dumb unless you tell them," said Sheng-Chuan Wu, VP of corporate development at Franz. "That's where semantic technology comes into play."
Semantic technology helps computers understand data better. For businesses and other large organizations, this becomes particularly useful when merging large data sets. For example, merging two personnel databases when one defines only full-time employees and the other includes part-time or temporary workers can cause problems. Semantic technology can help resolve that.
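To make the personnel example concrete, here is a minimal sketch in plain Python. The records, the field names, and the `to_shared` helper are all hypothetical. It assumes one system implicitly means "full-time" while the other records a status per worker, and it shows only the idea of making that hidden definition explicit before merging, not how any particular semantic product does it.

```python
# Hypothetical records from two personnel systems that define "employee" differently.
hr_a = [  # system A lists only full-time staff, so the status is implicit
    {"id": 1, "name": "Ana"},
    {"id": 2, "name": "Ben"},
]
hr_b = [  # system B mixes full-time, part-time, and temporary workers
    {"id": 7, "name": "Cho", "status": "part-time"},
    {"id": 8, "name": "Dev", "status": "full-time"},
    {"id": 9, "name": "Eli", "status": "temporary"},
]


def to_shared(record, source, default_status=None):
    """Map a source record onto a shared vocabulary with an explicit employment type."""
    return {
        "source": source,
        "id": record["id"],
        "name": record["name"],
        "employment_type": record.get("status", default_status),
    }


# State system A's implicit definition explicitly, then merge both sources.
merged = [to_shared(r, "A", default_status="full-time") for r in hr_a]
merged += [to_shared(r, "B") for r in hr_b]

# A query against the merged data now means the same thing for both sources.
full_time = sorted(r["name"] for r in merged if r["employment_type"] == "full-time")
print(full_time)
```

Once the definition travels with the data, a count of full-time staff no longer depends on which system a record came from.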
Semantic technology is also useful for search applications because it lets computers infer relationships among data elements that aren't explicitly defined. A keyword search, for instance, generally returns only those documents that contain the queried keyword. A semantic search would return documents related to the specific meaning of the search term (e.g., military tanks but not water tanks), as well as those related to synonyms (e.g., armored vehicle).
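The difference between the two kinds of search can be sketched in a few lines of Python. Everything here is an illustrative assumption: the three documents, the hand-written concept annotations, and the `SENSES` table stand in for what a real system would derive from an ontology or a language-processing pipeline.

```python
# Three hypothetical documents.
DOCS = {
    "d1": "The army deployed a new tank near the border.",
    "d2": "Clean the water tank every six months.",
    "d3": "An armored vehicle crossed the bridge at dawn.",
}

# Hypothetical hand-made annotations: the concept each document is about.
CONCEPTS = {
    "d1": {"military_tank"},
    "d2": {"storage_tank"},
    "d3": {"military_tank"},  # "armored vehicle" is treated as the same concept
}

# Hypothetical mapping from (search term, intended sense) to a concept.
SENSES = {
    ("tank", "military"): "military_tank",
    ("tank", "storage"): "storage_tank",
}


def keyword_search(term):
    """Return documents whose text contains the literal term."""
    return sorted(d for d, text in DOCS.items() if term.lower() in text.lower())


def semantic_search(term, sense):
    """Return documents annotated with the concept the term refers to."""
    concept = SENSES[(term, sense)]
    return sorted(d for d, concepts in CONCEPTS.items() if concept in concepts)


print(keyword_search("tank"))               # matches on the word alone
print(semantic_search("tank", "military"))  # matches on the meaning
```

The keyword search returns the water-tank document and misses the armored-vehicle one; the semantic search does the reverse, because it matches on the annotated concept and not on the string.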
Generally considered a subset of the next-generation Web technologies referred to as Web 3.0, the semantic Web doesn't really exist yet, largely because the available tools aren't up to the task. "We had to build the tools to make all this semantic Web stuff fly," said Ralph Hodgson, co-founder and executive partner at TopQuadrant. "The public domain tools [Protege and Swoop, for example] aren't the tools that enterprises are going to want to use."
That's not to say large organizations aren't already making use of semantic applications. It's just that they don't scale well using standard databases and development environments.
"You can program your way out of any corner," said Hodgson. "It just takes a lot of effort."
In the defense intelligence community, where semantic technology is useful for identifying connections between people for anti-terrorism applications, at least one agency tried to buy its way out of a corner instead.
Wu tells of a government agency that put all its terrorist information into a triple-store Resource Description Framework (RDF) system built on a standard database. The system required 22 terabytes of random access memory to handle the complex data processing, and it was still unstable. When it crashed, he said, it took a week to boot up.
"That is just not a practical solution," said Wu. "You need a persistent triples database."
There are already a variety of semantic Web specifications, protocols, and languages, including RDF, Web Ontology Language (OWL), and SPARQL, as well as related technologies like XML. This alphabet jumble gives developers the ability to organize data in a semantic framework. What the TopQuadrant/Franz combination adds to the mix is an Eclipse-based, graphical development environment for semantic Web applications and a database designed to scale with massive amounts of RDF data. And that can be important if you want to boot up in less than a week.
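For readers unfamiliar with the triple model behind RDF, the following is a toy, in-memory sketch in plain Python. It is not RDF, SPARQL, or the API of any product named here; the people, predicates, and the `match` and `reachable` helpers are hypothetical. It shows only the core idea: facts stored as subject-predicate-object triples, queried by pattern, with a simple inferred relationship that no single stored fact states.

```python
# Hypothetical facts as (subject, predicate, object) triples.
TRIPLES = {
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "memberOf", "groupX"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return {
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }


def reachable(triples, start, predicate):
    """Follow one predicate transitively from start to find indirect links."""
    seen, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for _, _, obj in match(triples, s=node, p=predicate):
            if obj not in seen:
                seen.add(obj)
                frontier.append(obj)
    return seen


# Direct lookup: whom does alice know according to the stored facts?
print(match(TRIPLES, s="alice", p="knows"))

# Inferred: everyone alice is connected to through a chain of "knows" links.
print(reachable(TRIPLES, "alice", "knows"))
```

No triple says that alice is connected to carol, yet the traversal finds the link. Holding every triple in a Python set is exactly the kind of in-memory approach that stops scaling, which is the gap a persistent triple database is meant to fill.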
GlaxoSmithKline is testing AllegroGraph because of the advantages semantic technology can theoretically provide: a more flexible IT infrastructure and increased productivity through automation.
The drug development pipeline -- traditionally a 10- to 15-year process from research through clinical trials to market -- is changing, said Robin McEntire, director of knowledge-based systems at GSK. "When those changes take place, they're changes at the decision points" of a drug's development cycle, he said. "In order to support that, you need to make changes to the IT infrastructure. Right now, those are extremely difficult to make."
"We have a lot of systems within GSK, a lot of software systems," McEntire explained. "Like every large company, there's not always a coordinated effort to make sure that they can all integrate and migrate appropriately as things change in IT."
So GlaxoSmithKline is experimenting with what amounts to an abstraction layer of semantic data. The wet lab work in which most pharmaceutical companies engage produces too much data, said McEntire. "So we want to aggregate it and present it at a higher level," he said. "Having semantics is a great help for us in that."
The goal is to apply computer-based reasoning to evaluate and filter massive amounts of experimental data. "We think that low-level reasoning is a good place for us to start, where tasks that our scientists do, that aren't really rocket science but are time-intensive tasks, can be automated with this technology," said McEntire.
"Reasoning is going to become more important in the next five years," said McEntire. "We think reasoning is where text mining was six or seven years ago."
The Eastman Kodak Company, meanwhile, is developing semantic technology using AllegroGraph to help its customers manage their increasingly unmanageable collections of digital images.
"You have this ever-growing body of digital assets of all sorts of types that consumers are accumulating," said Mark D. Wood, a Kodak Research Laboratories scientist. "You've got this massive collection of unstructured facts. A product such as AllegroGraph is really designed for helping to sort a huge collection of unstructured facts and making use of it."
"What's interesting here is being able to infer meaning from visual data," explained Bruce Graham, the director of communications in the office of the chief technology officer for Kodak.
Kodak has been working on this for a while now. At the Consumer Electronics Show in January 2006, Kodak chairman and CEO Antonio M. Perez described how his company planned to make use of semantic technology. "Semantic understanding will remove the obstacles and provide the tools that have stood between consumers and their goal of telling their personal stories with the maximum impact," he said in a speech. "With this technology, the pictures begin to recognize each other -- so, without human instruction, a picture will use its metadata to find another picture with related metadata, so that all the pictures keep assembling in new groups, depending on how they relate to each other."
That future remains a long way off. But the tools to build tomorrow's semantic Web are here today.