Semantic web not very semantic
Recently, I've had the chance to learn about the current status of Web 3.0, aka the "Semantic" Web. For those of you still working in Web 2.0 or, heaven forbid, Web 1.0, the Semantic Web is a vision of what the World Wide Web should be, put forward by Tim Berners-Lee, the originator of many of the ideas behind our modern-day web technology. I won't go into a deep exposé on the Semantic Web; you can easily Google that yourself.
Instead, I thought I'd give some impressions of what I've learned so far:
RDF and its cousin RDF Schema are an attempt to build (conceptually) on XML and XML Schema to create a means by which knowledge (ontologies) is specified in the Semantic Web. RDF, like XML, is verbose and at times very hard to read, which is odd for a supposedly human-readable format. It is so bad, in fact, that other more human-readable forms like Notation3 (N3) have been created. RDF is essentially a mechanical means to specify ontology "graphs" or relations (if you like database terminology). Imagine a hierarchy of animals with more specific creatures farther down the tree. The entire hierarchy, with all of its rules, behaviors and assumptions, forms the ontology of animals.
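To make the animal hierarchy a little more concrete, here is a minimal Python sketch of an ontology held as plain (subject, predicate, object) triples. The class names, the "has" and "can" predicates, and both helper functions are made up for illustration; "subClassOf" is only borrowed from the RDF Schema term of the same name, and none of this is real RDF tooling.

```python
# Hypothetical animal ontology as (subject, predicate, object) triples.
# All names here are illustrative assumptions, not a published vocabulary.
ANIMAL_TRIPLES = [
    ("mammal", "subClassOf", "animal"),
    ("bird", "subClassOf", "animal"),
    ("dog", "subClassOf", "mammal"),
    ("cat", "subClassOf", "mammal"),
    ("mammal", "has", "fur"),
    ("animal", "can", "move"),
]


def ancestors(cls, triples):
    """Return every class reachable from cls by following subClassOf edges."""
    found = []
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "subClassOf" and o not in found:
                found.append(o)
                frontier.append(o)
    return found


def inherited_facts(cls, triples):
    """Collect non-hierarchy facts stated about cls or any of its ancestors."""
    lineage = [cls] + ancestors(cls, triples)
    return [(p, o) for s, p, o in triples if s in lineage and p != "subClassOf"]


dog_ancestors = ancestors("dog", ANIMAL_TRIPLES)
dog_facts = inherited_facts("dog", ANIMAL_TRIPLES)
print(dog_ancestors)
print(dog_facts)
```

Even in this toy form you can see the idea: nothing is ever said about dogs directly, yet walking up the tree tells you a dog has fur and can move. That inherited knowledge is what the ontology is supposed to capture.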
The problem is that 1) RDF for non-trivial ontologies is bulky and nearly impossible to create without tools, and 2) there are no high-quality tools available. There are a few research projects and marginally maintained open-source efforts, but I was surprised that, given how long the Semantic Web has been around, there are no good tools for working with it.
SPARQL is essentially the SQL of the Semantic Web. It's an attempt to make querying graphs similar to querying tables, except that in SPARQL you are querying RDF triples, not tables and columns. Instead of comparing values using a predesigned structure, you have to generate the RDF triples and query them. Imagine a highly complex series of RDF graphs, maybe with hundreds or thousands of entities. You may end up with tens of thousands of triples or more. Now imagine trying to construct a "JOIN" across the graph to get what you want.
For the uninitiated, in SPARQL the goal is to establish a series of triple "rules" that when applied to the graph cause certain triples to "fall out" as the results. From those triples you can then extract whatever information you wish in addition to any information you extracted from the intermediate steps. So imagine these triples:
fruit, color, red / fruit, type, apple / basket, owner, bill / basket, contains, fruit
To query these triples and find the name of the owner who has an apple, I'd do this:
?item type "apple" .
?container contains ?item .
?container owner ?owner .
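Run that query and it pops out the name "bill". (Strictly speaking, these three lines are only the pattern part; a full SPARQL query wraps them in a SELECT and WHERE clause and uses URIs or prefixed names for the predicates.) Stare at the syntax for a while. Notice how ?item (a variable) is used in the first position in triple #1 and the third position in triple #2. This is essentially the "JOIN". Now imagine staring at a graph with thousands of triples and writing a query like this.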
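If it helps to see the "fall out" behavior in slow motion, here is a rough Python sketch that runs the three patterns above against the four example triples. It is a toy matcher written for this post, not a SPARQL engine and not how Jena works; the function names (is_var, match_pattern, query) are my own inventions.

```python
# Toy triple store: the four example triples from above.
TRIPLES = [
    ("fruit", "color", "red"),
    ("fruit", "type", "apple"),
    ("basket", "owner", "bill"),
    ("basket", "contains", "fruit"),
]

# The three query patterns; terms starting with "?" are variables.
PATTERNS = [
    ("?item", "type", "apple"),
    ("?container", "contains", "?item"),
    ("?container", "owner", "?owner"),
]


def is_var(term):
    return term.startswith("?")


def match_pattern(pattern, triple, bindings):
    """Return bindings extended so pattern matches triple, or None on conflict."""
    extended = dict(bindings)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable must keep whatever value it was bound to earlier.
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return extended


def query(patterns, triples):
    """Apply each pattern in turn, keeping only bindings that still match."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match_pattern(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


results = query(PATTERNS, TRIPLES)
owners = [b["?owner"] for b in results]
print(owners)
```

The "JOIN" lives in match_pattern: once ?item is bound to "fruit" by the first pattern, the second pattern can only match triples whose third position is also "fruit". Reusing a variable name is the whole join mechanism.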
There is a bright spot in the Semantic Web - the Jena framework. This nice, robust and functional toolkit for Java allows a developer to load and query a graph by using either SPARQL directly, or an API that provides for querying using a series of Jena objects.
Another high note is the W3C RDF/S validator. This handy little site will not only validate your RDF/S but can also list the RDF triples it contains and draw a graph showing the relationships.
The Semantic Web is highly complex and a vast shift from Web 1.0 and 2.0. The effort to create, maintain and update ontologies for domains is proving daunting and will likely be one of the roadblocks to pervasive use of the Semantic Web. Having said that, a "micro-Semantic Web" applied within a company to corporate knowledge bases, where the scope of the ontologies and the effort needed to maintain them are manageable, would seem to be an obvious candidate.
Keep your eye on this technology, but take any hype with a dose of caution, mixed well.