Ken North

Dr. Dobb's Bloggers

Supercomputing, The Cloud, Big Data, and NoSQL

January 13, 2012

If you count yourself among the informed members of the software and computing community, you're undoubtedly aware of NoSQL, "Big Data", cloud computing, and supercomputing. Sometimes technology that has become trendy is a branch on an evolutionary tree; other times it's a revolutionary departure from the long-established status quo.

The arrival of new technology often rekindles the pervasive debate over the merits of "tried-and-true" versus "new and improved". The latter often introduces new words into our lexicon, with recent examples being Big Data, NoSQL, and cloud computing. Supercomputing has been with us for a while, but there were significant strides in 2011, including IBM Watson, Tianhe-1A, and an Amazon virtual supercomputer.

IBM Watson can process 200 million pages of text in 3 seconds. (How's that for having enough capacity for big data workloads?) China claimed the supercomputer crown with Tianhe-1A and its capacity to perform 2.5 thousand trillion calculations per second (roughly 2.5 petaflops). Tianhe-1A is 50% faster than the XT5 Jaguar at Oak Ridge National Laboratory. One of the more interesting approaches to solving large-scale computing problems is the Amazon virtual supercomputer. This was an ad hoc solution for an Amazon EC2 user, a pharmaceutical company that spent $1,279 per hour to rent 30,000 cores. That virtual supercomputer had enough capacity to rank 42nd on the list of the top 500 supercomputers.
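To put those figures in perspective, here is a back-of-the-envelope calculation in Python. It uses only the numbers quoted above; the variable names are illustrative, and the Jaguar figure is merely what the "50% faster" claim implies, not a measured benchmark.

```python
# Figures quoted in the post (not independently verified here).
HOURLY_COST_USD = 1279      # rental cost of the EC2 virtual supercomputer
CORES = 30_000              # cores rented
TIANHE_PFLOPS = 2.5         # "2.5 thousand trillion calculations per second"
SPEEDUP_OVER_JAGUAR = 1.5   # "50% faster", taken at face value

# Cost of one core for one hour.
cost_per_core_hour = HOURLY_COST_USD / CORES

# Jaguar throughput implied by the 50% claim (an inference, not a benchmark).
implied_jaguar_pflops = TIANHE_PFLOPS / SPEEDUP_OVER_JAGUAR

print(f"Cost per core-hour: ${cost_per_core_hour:.4f}")
print(f"Implied Jaguar throughput: {implied_jaguar_pflops:.2f} petaflops")
```

At a bit more than four cents per core-hour, the rental model looks very different from the capital outlay of building a machine in that class.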

My previous Dr. Dobb's blog post discussed the surge of interest in the cloud and Big Data (Terabytes to Petabytes: Reflections on 1999-2009).

Having enormous computing and storage capabilities is undoubtedly a prime factor in the growing importance of Big Data. We have capacity for analytics and data visualization that was unheard of a decade ago, including the ability to process large data volumes from disparate sources. These data sources include SQL and other structured data (click streams, web logs, RFID and sensor data, high-speed, low-latency data feeds), and a host of unstructured data, such as Tweets.

The desire to build social networks and web-scale applications has created a need to support millions of users, and to store and process information about hundreds of millions. The availability of seemingly unlimited capacity has generated enthusiasm for Hadoop and other solutions for processing large data sets. The major players in the SQL database space, for example, are integrating Hadoop with their database product lines.

These new computing and storage requirements have revived, in some circles, a debate over whether to supplant tried-and-true languages, architectures, and database solutions. Recent debates have centered on the attributes and capabilities of different database solutions, including horizontal scalability and sharding, ACID versus BASE properties (consistency), schemas and type support, granularity of encryption, and query methods.

One of the more interesting debates is about types, schemas, type-less programming, and schema-less databases. I'll take a closer look at these issues in an upcoming blog post.





