Dr. Dobb's | Graphs Versus Objects

Graphs Versus Objects

Graph-based development can help in all areas of knowledge management, including Web 2.0 and beyond.

July 01, 2008
URL:http://www.drdobbs.com/architecture-and-design/graphs-versus-objects/architecture-and-design/graphs-versus-objects/208801968

John is a Division Scientist at BBN Technologies and Matt is a Principal Systems Engineer at Progeny Systems.

Software objects are the de facto programming paradigm for engineering intelligence into modern computer systems. Objects' labyrinth of inheritance, polymorphism, and encapsulated data, intermeshed with ifs, whiles, and for loops, is the basis for flying airplanes, producing health diagnoses, and surfing the Web. Sometimes we escape this rigid paradigm and place the program intelligence elsewhere, such as in databases and files. In most cases, knowledge solutions are a hybrid of approaches. Each method has its advantages and disadvantages. An alternative approach, graphs, offers a contrast to these traditional holders of programming intelligence. Graphs have improved significantly with the coming of the Semantic Web, where graphs are a key tenet. In this article, we introduce graphs through a comparison with objects. This approach illustrates some key advantages while stirring up a little controversy. Some would say we should start by comparing graphs to databases and other similar approaches, but this would constrain graphs to a more traditional role. Graphs, as you will see, can help in all areas of knowledge management, including Web 2.0 and beyond.

What do we mean by "programming intelligence" and what are its key attributes? Programming intelligence is not of the sentient, human kind. We mean the intelligence that represents sequences, relationships, algorithms, and the like. As the developer, you must constantly choose between the trade-offs of the various methods, such as programming steps themselves, databases, and files. Here are some key concepts to consider:

Expressiveness. Represents the degree of complexity captured by the chosen method. Complexity includes types of facts such as numbers and strings; relationships such as inheritance, containership, aggregation, and peer; and constraints such as "less than 10".
Integration. This drives the useful expansion of knowledge. Impedance mismatches caused by a different syntax and/or different semantics force you to create cumbersome translation routines or just not integrate at all.
Resource Use. Programming complex knowledge runs within hardware and hardware has real-world constraints. Resources consist of network bandwidth, processor cycles, memory locations, and/or disk locations. The approach must balance the knowledge requirements with the available resources.
Scalability. The approach must be able to expand and contract in several dimensions such as size, complexity, and performance.
Interrogation. The ability to ask the right question and receive the right answer significantly increases the value of the programming intelligence. This extends to answering ad-hoc questions not originally planned.
Flexibility. Escaping the speciousness of the waterfall process, modern development processes, especially for complex, intelligence-based systems, must absorb change throughout many incremental, interactive deployments.
Integrity. The intelligence must maintain its consistency and correctness. This is especially important as an application moves data around. If there is minimal integrity, many programming steps are repeated to ensure that a float remains a float, a string remains a string, a specific integer never exceeds 100, and so on.

How does Web 2.0 impact these attributes? Web 2.0 represents three significant trends: scale, change, and integration. It shifts what we expect of systems: they must scale rapidly, adapt quickly to new possibilities, and integrate easily with others. Thus, to be Web 2.0 enabled, you must carefully consider how your development choices incorporate these trends. Additionally, the intelligence of your program becomes even more of a key asset, because you no longer must do everything from moving data bits to building a fancy GUI. If you can incorporate the trends into your solution, many Web 2.0 possibilities are already there for your integration. You can then focus on what you do best: your smarts.

Now let's examine some code examples that highlight the differences between objects and graphs. The complete code example is available online at SemWebCentral (www.SemWebCentral.org) and from DDJ; see www.ddj.com/code/. We selected a well-understood and completely original programming application—the digital marketplace. We also introduce changes as we build out the application. Two complete solutions are presented: object based and graph based. We start with the object based.

The code highlights five major aspects, namely how to:

Represent knowledge
Create instances of knowledge
Integrate knowledge
Interrogate knowledge
Change the representation

The better the representation, the better the semantics, but it must be able to work efficiently within the real-world constraints.

Objects Of My Affection

2008 seems a good time to experiment with a truly futuristic idea—an online store! First, we'll need sellers and buyers (using a Person class with a name field) with something to sell and purchase. Figure 1, the object-oriented UML class diagram for our example, depicts the knowledge representation.

Here we declare the representation in Java. Note the mixture of representation with actual processing. As the complexity grows, this becomes difficult to separate and improve. We use line numbers to denote that the following is an abridged code version:

3 public class PurchasableItem {
5 private float cost;
6 private String manufacturer, label;
8 public PurchasableItem (String label, float cost, String manufacturer) {
// Call appropriate setters
13 }
// Typical getters and setters follow ...
35 }

Figure 1: UML class diagram.

Add a Transaction class and we're ready to open our doors:

3 public class Transaction {
7 private Person buyer, seller;
8 private PurchasableItem containsItem;
9 private Date sellDate, shipDate;
10 private String label;
12 public Transaction(Person buyer, Person seller, PurchasableItem containsItem, Date sellDate, Date shipDate) {
13    super();
14    setBuyer(buyer);    // Call appropriate setters
                          // with remaining parameters ...
19  }
59 }
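
The Person class that Transaction refers to is not part of the abridged listings above. A minimal sketch, assuming nothing beyond the single name field mentioned earlier (the accessor names are our assumption, chosen to match the getter/setter style of the other classes), could look like this:

```java
// Hypothetical sketch of the Person class; only the name field comes from the article.
class Person {
    private String name;

    public Person(String name) {
        setName(name);
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    // Tiny demo: create a seller and read the name back.
    public static void main(String[] args) {
        Person seller = new Person("Matt Fisher");
        System.out.println(seller.getName());
    }
}
```

Note that this sketch does not override equals or hashCode, so two Person objects with the same name are still distinct buyers or sellers.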

Let's watch our first sale happen as we create instances provided by the class definitions. (The full code is online.)


//Matt ...
Person seller = new Person("Matt Fisher");
// ... is selling his special toaster ...
PurchasableItem toaster = new PurchasableItem("High-wolf shiny toaster", (float)49.95, "Dualit");
pItems.add(toaster); 
// pItems is a HashSet of PurchasableItems
// John ...
Person buyer = new Person("John Hebeler");
// ... is buying the toaster very soon!
Calendar sellDateCalendar = Calendar.getInstance();
// Calendar months are zero-based, so a literal 3 would mean April
sellDateCalendar.set(2008, Calendar.MARCH, 24, 10, 0, 0);
Date sellDate = sellDateCalendar.getTime(); 
// similar code follows for shipDate
Transaction tran = new Transaction(buyer, seller, toaster, sellDate, shipDate);
transactions.add(tran);

Keeping track of our marketplace requires some custom queries that interrogate our objects:


// Now, how many items has John bought?
int count = 0;
for (Transaction t : transactions) {
    // Same Person instance as above, so a reference comparison suffices here
    if (t.getBuyer() == buyer) {
        count++;
    }
}
System.out.println("John has purchased " + count + " item(s)");

// Now, what has John bought?
System.out.println("John has bought:");
for (Transaction t : transactions) {
    if (t.getBuyer() == buyer) {
        System.out.println("   " + t.getContainsItem().getLabel());
    }
}

The resulting output is:


John has purchased 1 item(s)
John has bought:
   High-wolf shiny toaster

Our objects quickly created a basic solution. Additional queries require additional coding, which constrains the variety and power of the questions we can ask. As we continue to code, we realize we have created a proprietary solution. Integration becomes difficult in two ways: more of the same instances, and different types or classes of instances. The former requires custom integration code and is subject to our chosen storage method (for this example, merely in-memory collections). The latter, different types of classes, creates an N² problem as we write custom code to combine objects from each differing class. (Imagine that a similar solution did not create a Transaction class but rather a user purchase class.)
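
To make the "additional queries require additional coding" point concrete, here is a self-contained sketch. The nested Person, Item, and Sale classes are simplified stand-ins for the article's Person, PurchasableItem, and Transaction, and both helper methods are hypothetical. Each new question gets its own hand-written traversal:

```java
import java.util.ArrayList;
import java.util.List;

class ObjectQueries {
    // Simplified stand-ins for the article's classes (assumptions, not the original code).
    static class Person {
        final String name;
        Person(String name) { this.name = name; }
    }

    static class Item {
        final String label;
        Item(String label) { this.label = label; }
    }

    static class Sale {
        final Person buyer, seller;
        final Item item;
        Sale(Person buyer, Person seller, Item item) {
            this.buyer = buyer;
            this.seller = seller;
            this.item = item;
        }
    }

    // Question 1: what has this person bought?
    static List<String> boughtBy(List<Sale> sales, Person p) {
        List<String> labels = new ArrayList<>();
        for (Sale s : sales) {
            if (s.buyer == p) labels.add(s.item.label);
        }
        return labels;
    }

    // Question 2: what has this person sold? A new question needs a new loop.
    static List<String> soldBy(List<Sale> sales, Person p) {
        List<String> labels = new ArrayList<>();
        for (Sale s : sales) {
            if (s.seller == p) labels.add(s.item.label);
        }
        return labels;
    }

    public static void main(String[] args) {
        Person matt = new Person("Matt Fisher");
        Person john = new Person("John Hebeler");
        List<Sale> sales = new ArrayList<>();
        sales.add(new Sale(john, matt, new Item("High-wolf shiny toaster")));
        System.out.println("John bought: " + boughtBy(sales, john));
        System.out.println("Matt sold: " + soldBy(sales, matt));
    }
}
```

The two methods differ only in which field they test, yet each had to be written, compiled, and deployed before the question could be asked.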

Cooking with Graphs

Now let's address the same problem with graphs. We implemented the graphs using the Web Ontology Language (OWL), an expressive knowledge representation language based on the Resource Description Framework (RDF). RDF, which is commonly serialized as XML, connects information using a "triple": a subject, predicate, and object. This basic approach can represent all kinds of knowledge constructs such as the class structure ("Transaction hasBuyer Person"), instance data ("toaster hasCost $12"), and constraints ("PurchasableItem contains 1 Manufacturer"). Usually, the main data model, when expressed in OWL, is called an "ontology".
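
To make the triple idea concrete without any Semantic Web library, here is a toy sketch in plain Java. The Triple class and the match helper are hypothetical illustrations, not part of RDF or Jena; a null argument acts as a wildcard, much like a variable in a query.

```java
import java.util.ArrayList;
import java.util.List;

class TripleSketch {
    // One statement in the graph: subject, predicate, object.
    static class Triple {
        final String subject, predicate, object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }

        @Override
        public String toString() {
            return subject + " " + predicate + " " + object;
        }
    }

    // Returns every triple matching the pattern; null means "anything".
    static List<Triple> match(List<Triple> graph, String s, String p, String o) {
        List<Triple> hits = new ArrayList<>();
        for (Triple t : graph) {
            if ((s == null || s.equals(t.subject))
                    && (p == null || p.equals(t.predicate))
                    && (o == null || o.equals(t.object))) {
                hits.add(t);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Triple> graph = new ArrayList<>();
        // Class structure, typing, and instance data all share one shape.
        graph.add(new Triple("Transaction", "hasBuyer", "Person"));
        graph.add(new Triple("toaster", "type", "PurchasableItem"));
        graph.add(new Triple("toaster", "hasCost", "12"));
        // An ad hoc question: what do we know about the toaster?
        for (Triple t : match(graph, "toaster", null, null)) {
            System.out.println(t);
        }
    }
}
```

Because every fact has the same three-part shape, one generic match routine can answer questions that were never planned for, which is the property the rest of this section builds on.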

Figure 2 illustrates the graph. There is no drawing standard for graphs but the diagram adheres to common practices. The ovals represent classes (similar to OO classes), the thin named lines represent relationships, and the rectangles represent actual data. The numbers (1) and types (string) indicate restrictions placed on a relationship or type. Relationships can be represented in two ways—object properties that link two objects (classes) and datatype properties that link a data item with an object.

Figure 2: Graph model.

Here is an extract of the ontology in abbreviated RDF/XML format. We've used TopBraid Composer (www.topbraidcomposer.com) but Protege (protege.stanford.edu) or other editors would work just as well:

 
<?xml version="1.0"?>
<rdf:RDF
    xmlns="http://www.example.com/storeOnt#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xml:base="http://www.example.com/storeOnt">
  <owl:Ontology rdf:about="">
    <owl:versionInfo rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Graphs and Objects</owl:versionInfo>
  </owl:Ontology>
  <owl:Class rdf:ID="PurchasableItem">
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty>
          <owl:DatatypeProperty rdf:ID="hasCost"/>
        </owl:onProperty>
        <owl:allValuesFrom rdf:resource="http://www.w3.org/2001/XMLSchema#decimal"/>
      </owl:Restriction>
    </rdfs:subClassOf>
    ....
  </owl:Class>
</rdf:RDF>

Switching to graphs, let's apply our business acumen and create an online store where the knowledge representation is in the data instead of the code. There is no need to create the representation in the code itself. The Java code below relies on Jena (jena.sourceforge.net), the most popular open-source package for creating Semantic Web solutions currently available.

First, we'll load our graph model, which defines the classes and properties in Figure 2. To view the full ontology, go to graphsvsobjects.blogspot.com.

 
OntModelSpec s = new OntModelSpec(OntModelSpec.OWL_DL_MEM_RULE_INF);
OntModel m = ModelFactory.createOntologyModel(s);

Second, based on this graph, we can begin to add the items and related facts. Here we create instances; they could have been contained in the original ontology. This approach allows you to see both methods. Again, this is just a subset of the code. The full suite is available online:


// Matt ...
Resource seller = m.createResource(defaultNS + "mattFisher");
m.add(seller, RDF.type, m.getResource(defaultNS + "Person"));
m.add(seller, RDFS.label, m.createTypedLiteral("Matt Fisher", XSDDatatype.XSDstring));
// ... is selling his special toaster ...
Resource toaster = m.createResource(defaultNS + "shinyToaster");
m.add(toaster, RDFS.label, m.createTypedLiteral("High-wolf shiny toaster", XSDDatatype.XSDstring));
m.add(toaster, RDF.type, m.getResource(defaultNS + "PurchasableItem"));
Literal manufacturer = m.createTypedLiteral("Dualit", XSDDatatype.XSDstring);
m.add(toaster, m.getProperty(defaultNS, "hasManufacturer"), manufacturer);
// similar code follows for hasCost

// John ...
Resource john = m.createResource(defaultNS + "johnHebeler");
m.add(john, RDF.type, m.getResource(defaultNS + "Person"));
m.add(john, RDFS.label, m.createTypedLiteral("John Hebeler", XSDDatatype.XSDstring));
// ... is buying the toaster ...
Resource sale = m.createResource(defaultNS + "toasterSale");
m.add(sale, RDF.type, m.getResource(defaultNS + "Transaction"));
m.add(sale, m.getProperty(defaultNS, "containsItem"), toaster); // similar code follows for hasBuyer, hasSeller
// ... very soon!
Literal sellingDate =  m.createTypedLiteral("2008-03-24T10:00:00", XSDDatatype.XSDdateTime);
m.add(sale, m.getProperty(defaultNS, "hasSellDate"), sellingDate);

At this point, we're ready for real queries:


// Now, how many items has John bought?
//   (without using a special-purpose query language)
ResIterator purchases = m.listSubjectsWithProperty(m.getProperty(defaultNS, "hasBuyer"), john);
int count = 0;
while (purchases.hasNext()) {
    count++;
    purchases.next();
}
System.out.println("John has purchased " + count + " item(s)");

// Now, what has John bought? (using SPARQL)
// The comment beside each fragment shows the query text it contributes.
String queryString =
    "PREFIX rdf: <" + m.getNsPrefixURI("rdf") + "> " +    // PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    "PREFIX rdfs: <" + m.getNsPrefixURI("rdfs") + "> " +  // PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    "PREFIX store: <" + defaultNS + "> " +                // PREFIX store: <http://www.example.com/storeOnt#>
    "SELECT ?label " +                                    // SELECT ?label
    "WHERE { " +                                          // WHERE {
    "  ?trans rdf:type store:Transaction . " +            //   ?trans rdf:type store:Transaction .
    // Restrict to John's purchases; without this pattern the query
    // would list the items from every transaction
    "  ?trans store:hasBuyer store:johnHebeler . " +      //   ?trans store:hasBuyer store:johnHebeler .
    "  ?trans store:containsItem ?item . " +              //   ?trans store:containsItem ?item .
    "  ?item rdfs:label ?label . " +                      //   ?item rdfs:label ?label .
    "} ";                                                 // }

// Take the SPARQL query and, using Jena's ARQ library for SPARQL,
// build and execute the query
Query query = QueryFactory.create(queryString);
QueryExecution qexec = QueryExecutionFactory.create(query, m);
try {
    ResultSet results = qexec.execSelect();
    System.out.println("John has bought:");
    while (results.hasNext()) {
        // Print out each item's label, stripping off
        // the XSD type information (if any is present)
        QuerySolution soln = results.nextSolution();
        String labelString = soln.getLiteral("label").toString();
        int index = labelString.lastIndexOf("^^");
        System.out.println("   " + (index >= 0 ? labelString.substring(0, index) : labelString));
    }
} finally {
    qexec.close();
}

The resulting output is:

John has purchased 1 item(s)
John has bought:
    High-wolf shiny toaster

Our queries are basic and don't exploit all the graph properties, but they easily could. Integrating other graphs would be straightforward. If the graphs were based on a similar representation, we would not need to make any changes to our program. If the graphs were based on a different representation, we have two choices. We could make program changes similar to those needed for objects or, better yet, use a rule language such as the Semantic Web Rule Language (SWRL) to align the differences. For example, a rule could take advantage of OWL's equivalentClass construct. equivalentClass equates classes ("Automobile equivalentClass Car") while OWL's sameAs construct equates instances ("James sameAs Jim"). A rule representation maintains separation between the knowledge representation and its various translations for other representations. And with all this fun, we are not even touching on advanced constructs such as inference and advanced queries. This is only the beginning.
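
As a rough illustration of what such an alignment buys you, the toy sketch below treats equivalentClass as data and consults it at query time. It is plain Java rather than OWL or SWRL, it reuses the Automobile/Car example from above, and every class, method, and instance name in it is hypothetical. A real reasoner also handles transitivity and far more.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AlignmentSketch {
    // Collects the class itself plus any class declared equivalent to it,
    // in either direction (one hop only; no transitive closure).
    static Set<String> equivalents(List<String[]> graph, String cls) {
        Set<String> names = new HashSet<>();
        names.add(cls);
        for (String[] t : graph) {
            if (t[1].equals("equivalentClass")) {
                if (t[0].equals(cls)) names.add(t[2]);
                if (t[2].equals(cls)) names.add(t[0]);
            }
        }
        return names;
    }

    // Finds instances typed as cls or as any class equivalent to it.
    static List<String> instancesOf(List<String[]> graph, String cls) {
        Set<String> classes = equivalents(graph, cls);
        List<String> found = new ArrayList<>();
        for (String[] t : graph) {
            if (t[1].equals("type") && classes.contains(t[2])) found.add(t[0]);
        }
        return found;
    }

    public static void main(String[] args) {
        // Each triple is {subject, predicate, object}.
        List<String[]> graph = new ArrayList<>();
        graph.add(new String[] {"beetle", "type", "Car"});                 // from our graph
        graph.add(new String[] {"mustang", "type", "Automobile"});         // from a partner's graph
        graph.add(new String[] {"Automobile", "equivalentClass", "Car"});  // the alignment
        System.out.println(instancesOf(graph, "Car"));
    }
}
```

The alignment is a single added statement; the query code that asks for Car instances does not change when the second graph arrives.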

"Hey, that's a lot of code", you say and, well, you are right. Much of this is structural and could be contained in an encapsulated programming class. Alternatively, all this data creation could be serialized in a file and merely read into the application, but this example provides a clearer handle as to what is happening behind the mirror. What you paid for in keystrokes pays off as we enter the agile part—the evolution.

Expanding and Evolving

Now that our store is a booming success, we want to improve our post-sales tracking. Instead of just knowing that a transaction has occurred and something has been sold, we want to know which items are perishable and which are not. With objects, the obvious first step is to extend PurchasableItem with a new subclass, such as PurchasablePerishableItem. However, if we need perishable items elsewhere in our system that have nothing to do with purchasing (handling returned perishable items, archiving previously sold perishable items, and so on), then we would need to duplicate this class under another superclass (not all perishable items are always considered "purchased"). With graphs, we extend the containsItem property with a subproperty, containsPerishableItem, and create a new PerishableItem class, which is not a subclass of PurchasableItem.

It's a little strange at first thinking about having subproperties, since a property is visually interpreted as a link between two nodes. Returning to the concept of triples, it becomes more manageable. The following is a declaration of our new subproperty and class in abbreviated RDF/XML:


<rdf:Description rdf:about="#containsPerishableItem">
   <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#ObjectProperty"/>
   <rdfs:subPropertyOf rdf:resource="#containsItem"/>
</rdf:Description>
   ...
<owl:Class rdf:ID="PerishableItem">
   <rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
</owl:Class>

We can now sell that rotten lemon that has sat in the refrigerator for so long:

// lemon is a Resource created in the same way as toaster earlier (creation not shown here)
Resource lemonSale = m.createResource(defaultNS + "lemonSale");
m.add(lemonSale, RDF.type, m.getResource(defaultNS + "Transaction"));
m.add(lemonSale, m.getProperty(defaultNS, "containsPerishableItem"), lemon);
m.add(lemonSale, m.getProperty(defaultNS, "hasBuyer"), john);
m.add(lemonSale, m.getProperty(defaultNS, "hasSeller"), seller);

Now we can query for all perishable items sold (looking for all triples with property containsPerishableItem) or simply all items sold:


String queryString =
      "PREFIX store: <" + defaultNS + "> " +         // PREFIX store: <http://www.example.com/storeOntInference#>
      "SELECT ?item " +                              // SELECT ?item
      "WHERE { " +                                   // WHERE {
      "       ?trans store:containsItem ?item . " +  //   ?trans store:containsItem ?item .
      "      } ";                                    // }

which returns:


<http://www.example.com/storeOntInference#shinyToaster>
<http://www.example.com/storeOntInference#lemon>

Several points should be clarified. Lemon's rdf:type value is inconsequential here; it is only perishable because it is part of the containsPerishableItem relationship. It is important to note that graphs won't raise any compile-time or runtime errors if, for example, I sell my gorilla suit online and incorrectly add it through a containsPerishableItem property. Our lemon instance automatically becomes an item involved in a transaction because it is linked through containsItem or any of its subproperties. This is the spirit of Web 2.0: Our store will forever sell items of all shapes and sizes, including those we never anticipated, always an agent of change. We can easily create new ways to query this knowledge to better understand and grow our business. Refactoring object code on a frequent basis to support such dynamic activity becomes burdensome. Realistically, we can never plan for all the different items our clients will buy and sell; graphs let us better deal with such uncertainty. Scalability, flexibility, and ease of integration are easily met using the graph paradigm, for the intelligence is in the data and not the code.
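
The behavior described above, where a query on containsItem also returns items linked through containsPerishableItem, comes from subproperty inference. The toy sketch below mimics that single rule in plain Java. It is not how Jena implements inference, the class and method names are hypothetical, and it assumes each property has at most one parent and that the hierarchy has no cycles.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SubPropertySketch {
    // True if p is the target property or reaches it by walking up the parent chain.
    // Assumes a single parent per property and no cycles.
    static boolean isSubPropertyOf(Map<String, String> parentOf, String p, String target) {
        for (String cur = p; cur != null; cur = parentOf.get(cur)) {
            if (cur.equals(target)) return true;
        }
        return false;
    }

    // Returns the objects of all triples whose predicate is prop or one of its subproperties.
    static List<String> objectsOf(List<String[]> graph, Map<String, String> parentOf, String prop) {
        List<String> items = new ArrayList<>();
        for (String[] t : graph) {
            if (isSubPropertyOf(parentOf, t[1], prop)) items.add(t[2]);
        }
        return items;
    }

    public static void main(String[] args) {
        // Stand-in for the rdfs:subPropertyOf declaration in the ontology.
        Map<String, String> parentOf = new HashMap<>();
        parentOf.put("containsPerishableItem", "containsItem");

        // Each triple is {subject, predicate, object}.
        List<String[]> graph = new ArrayList<>();
        graph.add(new String[] {"toasterSale", "containsItem", "shinyToaster"});
        graph.add(new String[] {"lemonSale", "containsPerishableItem", "lemon"});

        System.out.println("All items sold: " + objectsOf(graph, parentOf, "containsItem"));
        System.out.println("Perishable only: " + objectsOf(graph, parentOf, "containsPerishableItem"));
    }
}
```

Nothing in the sketch checks that the object of containsPerishableItem really is perishable, which mirrors the gorilla suit caveat: the graph accepts the statement and simply draws the corresponding conclusions.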

Conclusion

Now that you have a basic foundation of graphs, we hope you'll join us in expanding on graph possibilities including graph design, powerful queries, inference, alignment between graphs, distributed graphs, mapping to web services, and much more.