
Ken North

Dr. Dobb's Bloggers

Movement on the Big Data Front

April 08, 2010

The enthusiasm for Big Data applications has us putting persistent data solutions under a microscope these days. Although Big Data applications involve operations on large data sets, their function can vary from online transaction processing to analytics to semantics-driven information retrieval. An application might use a distributed key-value store, a row- or column-order store, a set store, a triples store or some other technology.

My previous blog posting about David Childs' extended set theoretical model caught the attention of Dr. Hasan Sayani, the Graduate School Program Director of Software Engineering for UMUC. Dr. Sayani has long been acquainted with the extended set theoretical model and has read Childs' most recent paper about set-store architectures. I sensed frustration when Dr. Sayani wrote:

I have been following Dave since the 60's when I was at Michigan and though I see the value in what he has accomplished I have failed to ignite any attention among those who might benefit from it!


Perhaps Dr. Sayani will be cheered by recent developments on the product front and in the blogosphere. In a few years, we might be able to point to Big Data as the spark for adoption of extended set theory (XST). The blogosphere has certainly shown interest in XST, including Jerome Pineau's "Big Honking Databases" blog and Ron Jeffries' XProgramming blog.

Pineau has written about business intelligence and his success with extended set theory as implemented by the XSPRADA engine. In October 2009, XSPRADA Corporation became Algebraix Data Corporation, with an analytic database product named A2DB based on new patented technology:

Algebraix Data Corporation today announced that it has been granted U.S. Patent No. 7,613,734, for its systems and methods for providing data sets using a store of algebraic relations.

Algebraix is one of the more recent entrants in an analytic database race fueled in part by venture capitalists, by established companies offering new products such as Oracle Exadata, and by advocates of open source software such as HBase, Hadoop and HadoopDB.

The BI community represents only one slice of the Big Data user pie. The slice that represents the Linked Data / Web 3.0 / Semantic Web community isn't as large, but that community is growing. In March 2010, Oxford University and the University of Southampton announced that a new Institute for Web Science will lead the way in Web 3.0 development with £30 million in funding from the UK government:

Web 3.0 will take the web to a whole new level by publishing data in a linkable format so that users and developers can see and exploit the relationships between different sets of information.

Cassandra, Hadoop Map/Reduce, Greenplum and other engines come up frequently in discussions about Big Data. But if Sir Tim Berners-Lee has his way, we'll be having more discussions about solutions for Really Big Data.

The W3C Resource Description Framework (RDF) defines a triples data model that's gained acceptance for Semantic Web applications, Linked Data and building out Web 3.0. There are a variety of data stores capable of handling billions of RDF triples, including OpenLink Virtuoso, Ontotext BigOWLIM, AllegroGraph, YARS2, and Garlik 4store.
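To make the triples data model concrete, here is a toy in-memory sketch: every fact is a (subject, predicate, object) triple, and a query is a pattern in which any position may be a wildcard. This is only an illustration of the model, not how Virtuoso, BigOWLIM, AllegroGraph, YARS2 or 4store work internally (those systems add indexing, persistence and SPARQL). The class name `TripleStore`, its methods and the `ex:` identifiers are hypothetical.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TripleStore:
    """Hypothetical toy store: an unindexed set of RDF-style triples."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Yield every triple that fits the pattern; None acts as a wildcard."""
        for triple in self.triples:
            if all(want is None or want == have
                   for want, have in zip((s, p, o), triple)):
                yield triple


if __name__ == "__main__":
    store = TripleStore()
    store.add("ex:alice", "ex:knows", "ex:bob")
    store.add("ex:bob", "ex:knows", "ex:carol")
    store.add("ex:alice", "ex:worksFor", "ex:acme")
    # Everything known about ex:alice (sorted, because set order is arbitrary).
    print(sorted(store.match(s="ex:alice")))
```

A real triple store would keep several indexes (for example subject-predicate-object and predicate-object-subject orderings) so that a pattern with a bound predicate or object does not require a full scan, as the linear loop in `match` does here.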

Raytheon BBN Technologies has approached the triples store problem with a cloud-based technology known as SHARD (Scalable, High-Performance, Robust and Distributed). SHARD is a distributed data store for RDF triples that supports SPARQL queries. It's based on Cloudera Hadoop Map/Reduce, and it has been deployed in the cloud on Amazon EC2. SHARD uses an iterative process, executing one MapReduce operation for each clause of a SPARQL query. According to Kurt Rohloff, a researcher at Raytheon BBN, SHARD "performs better than current industry-standard triple-stores for datasets on the order of a billion triples."
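The clause-at-a-time strategy can be sketched in a single process. The following is a minimal simulation under stated assumptions, not SHARD's code and not the Hadoop API: a "map" step matches one SPARQL-style clause against all triples and emits variable bindings, and a "reduce" step joins those bindings with the results accumulated from earlier clauses. Grouping by the values of shared variables stands in for Hadoop's shuffle phase. The function names, the `?var` convention for variables and the `ex:` data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]  # variable name (e.g. "?x") -> value


def is_var(term: str) -> bool:
    return term.startswith("?")


def map_clause(triples: List[Triple], clause: Triple) -> List[Binding]:
    """Map step (hypothetical): emit one binding per triple matching the clause."""
    out: List[Binding] = []
    for triple in triples:
        binding: Binding = {}
        ok = True
        for pattern, value in zip(clause, triple):
            if is_var(pattern):
                # Reject a triple that binds the same variable to two values.
                if binding.setdefault(pattern, value) != value:
                    ok = False
                    break
            elif pattern != value:
                ok = False
                break
        if ok:
            out.append(binding)
    return out


def reduce_join(left: List[Binding], right: List[Binding]) -> List[Binding]:
    """Reduce step (hypothetical): group by shared-variable values, then join."""
    if not left or not right:
        return []
    shared = sorted(set(left[0]) & set(right[0]))
    groups = defaultdict(lambda: ([], []))  # join key -> (left rows, right rows)
    for b in left:
        groups[tuple(b[v] for v in shared)][0].append(b)
    for b in right:
        groups[tuple(b[v] for v in shared)][1].append(b)
    return [{**lb, **rb}
            for lefts, rights in groups.values()
            for lb in lefts for rb in rights]


def evaluate(triples: List[Triple], clauses: List[Triple]) -> List[Binding]:
    """Iterate: one map/reduce round per clause, as the SHARD description suggests."""
    results = map_clause(triples, clauses[0])
    for clause in clauses[1:]:
        results = reduce_join(results, map_clause(triples, clause))
    return results


if __name__ == "__main__":
    data = [("ex:alice", "ex:knows", "ex:bob"),
            ("ex:bob", "ex:knows", "ex:carol"),
            ("ex:carol", "ex:worksFor", "ex:acme"),
            ("ex:bob", "ex:worksFor", "ex:initech")]
    # Roughly: SELECT * WHERE { ?x ex:knows ?y . ?y ex:worksFor ?c }
    query = [("?x", "ex:knows", "?y"), ("?y", "ex:worksFor", "?c")]
    for row in evaluate(data, query):
        print(row)
```

Each pass over the clause list corresponds to one MapReduce job, so a query with many clauses means many sequential jobs over the data. That is one reason per-clause evaluation trades latency for the ability to scale out across a cluster.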


Part 1: "Sets, Data Models and Data Independence"

Part 2: "Laying the Foundation: Revolution, Math for Databases and Big Data"

Part 3: "Information Density, Mathematical Identity, Set Stores and Big Data"
