Dr. Dobb's Bloggers

Semantic web not very semantic

August 03, 2008

Recently, I've had the chance to learn about the current status of Web 3.0, aka the "Semantic" Web.  For those of you still working in Web 2.0 or, heaven forbid, Web 1.0, the Semantic Web is Tim Berners-Lee's vision of what the World Wide Web should be; he originated many of the ideas behind our modern-day web technology.  I won't go into a deep exposé on the Semantic Web - you can easily Google that yourself.

 Instead, I thought I'd give some impressions of what I've learned so far:

RDF/S

RDF and its cousin RDF Schema are an attempt to build on XML and XML Schema (conceptually) to create a means by which knowledge (ontologies) is specified in the Semantic Web.  RDF, like XML, is verbose and at times very hard to read - odd for a supposedly human-readable format.  So bad, in fact, that other, more human-readable forms such as Notation3 (N3) have appeared.  RDF is essentially a mechanical means to specify ontology "graphs" or relations (if you like database terminology).  Imagine a hierarchy of animals with more specific creatures farther down the tree.  The entire hierarchy, with all of its rules, behaviors and assumptions, forms the ontology of animals.
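To make the animal hierarchy concrete, here is a minimal Python sketch. Everything specific in it is an assumption made for illustration: the `example.org` URIs, the three class names, and the helper names `parse_subclass_triples` and `ancestors`. It reads a tiny RDF/XML document with the standard library and reduces it to subject/predicate/object triples, which also shows how much markup it takes to state just two relations. It understands only this one RDF/XML shape and is no substitute for a real RDF parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Hypothetical three-class animal hierarchy written as RDF/XML.
RDF_XML = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdfs:Class rdf:about="http://example.org/Animal"/>
  <rdfs:Class rdf:about="http://example.org/Mammal">
    <rdfs:subClassOf rdf:resource="http://example.org/Animal"/>
  </rdfs:Class>
  <rdfs:Class rdf:about="http://example.org/Dog">
    <rdfs:subClassOf rdf:resource="http://example.org/Mammal"/>
  </rdfs:Class>
</rdf:RDF>
""".strip()


def local_name(uri):
    """Return the last path segment of a URI, e.g. 'Dog'."""
    return uri.rsplit("/", 1)[-1]


def parse_subclass_triples(xml_text):
    """Extract (subject, 'subClassOf', object) triples from the simple shape above."""
    root = ET.fromstring(xml_text)
    triples = []
    for cls in root.findall(f"{{{RDFS}}}Class"):
        subject = local_name(cls.get(f"{{{RDF}}}about"))
        for sub in cls.findall(f"{{{RDFS}}}subClassOf"):
            parent = local_name(sub.get(f"{{{RDF}}}resource"))
            triples.append((subject, "subClassOf", parent))
    return triples


def ancestors(triples, name):
    """Walk subClassOf edges upward, nearest ancestor first.

    Assumes each class has at most one parent and the hierarchy has no cycles.
    """
    parents = {s: o for s, p, o in triples if p == "subClassOf"}
    chain = []
    while name in parents:
        name = parents[name]
        chain.append(name)
    return chain


triples = parse_subclass_triples(RDF_XML)
print(triples)
print(ancestors(triples, "Dog"))
```

The nine lines of XML boil down to two triples, and walking them upward from `Dog` gives the more general creatures above it in the tree.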

The problem is twofold: 1) RDF for non-trivial ontologies is bulky and nearly impossible to create without tools, and 2) there are no high-quality tools available.  There are a few research projects and marginally maintained open-source efforts, but I was surprised that, given how long the Semantic Web has been around, there are no good tools for working with it.

SPARQL

SPARQL is essentially the SQL of the Semantic Web.  It's an attempt to make querying graphs similar to querying tables, except that in SPARQL you are querying RDF triples, not tables and columns.  Instead of comparing values using a predesigned structure, you have to generate the RDF triples and query them.  Imagine a highly complex series of RDF graphs, maybe with hundreds or thousands of entities.  You may end up with tens of thousands of triples or more.  Now imagine trying to construct a "JOIN" across the graph to get what you want.
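As a rough illustration of why triple counts grow so quickly, the Python sketch below flattens table-style rows into triples: every non-key column of every row becomes one triple. The `baskets` rows, the `id` key column and the `rows_to_triples` helper are invented for this example and come from no particular tool.

```python
def rows_to_triples(rows, key):
    """Turn each non-key column value of each row into a (subject, predicate, object) triple."""
    triples = []
    for row in rows:
        subject = row[key]
        for column, value in row.items():
            if column != key:
                triples.append((subject, column, value))
    return triples


# Hypothetical relational rows: one table with a key and two other columns.
baskets = [
    {"id": "basket1", "owner": "bill", "contains": "fruit1"},
    {"id": "basket2", "owner": "ann", "contains": "fruit2"},
]

triples = rows_to_triples(baskets, key="id")
for triple in triples:
    print(triple)
print(len(triples))  # 2 rows x 2 non-key columns = 4 triples
```

A table with R rows and C non-key columns yields roughly R x C triples under this scheme, so even a modest database can turn into tens of thousands of them.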

 For the uninitiated, in SPARQL the goal is to establish a series of triple "rules" that when applied to the graph cause certain triples to "fall out" as the results.  From those triples you can then extract whatever information you wish in addition to any information you extracted from the intermediate steps.  So imagine these triples:

fruit, color, red / fruit, type, apple / basket, owner, bill / basket, contains, fruit

To query these triples and find the name of the owner who has an apple, I'd do this:

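# example.org is a placeholder namespace for the predicate names
PREFIX : <http://example.org/>

SELECT ?owner
WHERE {
  ?item :type "apple" .
  ?container :contains ?item .
  ?container :owner ?owner .
}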

Run that query and it pops out the name "bill".  Stare at the syntax for a while.  Notice how ?item (a variable) is used in the first position in triple #1 and the third position in triple #2.  This is essentially the "JOIN".  Now imagine staring at a graph with thousands of triples and writing a query like this.
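To see why a shared variable acts as the "JOIN", here is a small Python sketch of triple-pattern matching over the four triples above. It is a toy under stated assumptions, not how a real SPARQL engine works: the `match_pattern` and `query` helpers are hypothetical names, terms are plain strings, and every pattern is checked against every triple with no indexing.

```python
# The four example triples, as (subject, predicate, object) tuples.
TRIPLES = [
    ("fruit", "color", "red"),
    ("fruit", "type", "apple"),
    ("basket", "owner", "bill"),
    ("basket", "contains", "fruit"),
]


def is_var(term):
    """Variables are written with a leading '?', as in SPARQL."""
    return term.startswith("?")


def match_pattern(pattern, triple, bindings):
    """Unify one pattern with one triple; return extended bindings, or None on mismatch."""
    new = dict(bindings)
    for term, value in zip(pattern, triple):
        if is_var(term):
            if new.setdefault(term, value) != value:
                return None  # variable is already bound to a different value
        elif term != value:
            return None  # constant in the pattern does not match the triple
    return new


def query(patterns, triples):
    """Return every set of variable bindings that satisfies all patterns."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match_pattern(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# The same three patterns as the WHERE clause of the query.
WHERE = [
    ("?item", "type", "apple"),
    ("?container", "contains", "?item"),
    ("?container", "owner", "?owner"),
]

owners = [solution["?owner"] for solution in query(WHERE, TRIPLES)]
print(owners)  # ['bill']
```

Once the first pattern binds `?item` to `fruit`, the second pattern can only match triples whose third position is also `fruit`. That carried-over binding is the join.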

Technologies

There is a bright spot in the Semantic Web - the Jena framework.  This nice, robust and functional toolkit for Java allows a developer to load and query a graph by using either SPARQL directly, or an API that provides for querying using a series of Jena objects. 

http://jena.sourceforge.net/

Another high note is the W3C RDF/S validator.  This handy little site will not only validate your RDF/S but also, optionally, list the triples derived from the RDF and draw a graph showing the relationships.

http://www.w3.org/RDF/Validator/

The Semantic Web is highly complex and a vast shift from Web 1.0 and 2.0.  The effort to create, maintain and update ontologies for domains is proving daunting and will likely be one of the roadblocks to pervasive use of the Semantic Web.  Having said that, a "micro-Semantic Web" applied within a company to corporate knowledge bases, where the scope of the ontologies and the effort needed to maintain them are manageable, would seem to be an obvious candidate.

 Keep your eye on this technology, but take any hype with a dose of caution, mixed well.
