Cloudera Impala: Processing Petabytes at The Speed Of Thought


Apache Hadoop enterprise player Cloudera has launched a real-time query engine for Hadoop, dubbed Impala. Cloudera Enterprise will now effectively be the first big-data management solution that allows batch and real-time operations to be performed on unstructured or structured data all within the same scalable system.

The suggestion is that organizations will now be able to process data at petabyte scale and, on the same system, interact with that data in real time to deliver what Cloudera likes to call "speed-of-thought" insights.

NOTE: Cloudera Impala is an Apache-licensed, real-time query engine for data stored in HDFS (Hadoop Distributed File System) and HBase. Cloudera Enterprise RTQ (Real-time Query) provides the management and support needed to operate Cloudera Impala in production environments.

"Mainstream enterprise adoption of Hadoop will inevitably raise expectations," said Tony Baer, principal analyst for Ovum. "Enterprises have grown accustomed to interactive querying and on-the-spot analytics with their existing data warehousing and BI infrastructures and will expect no less of Hadoop. With a real-time query capability powered by its new Impala engine, Cloudera is striving to level the playing field in performance and accessibility with massively parallel SQL platforms."

The implication is that developers are going to have to learn new tricks to take advantage of the real-time big data crunching made possible by Impala. Impala queries are generally short-lived, with smaller, focused result sets, and they can operate on data sets of any size in HDFS.

Impala is "especially well suited" to use cases where real-time queries and speed are essential. Many developers will be familiar with Hive and Pig, but Impala works differently: it runs its own daemons, spread across the cluster, to execute queries. Furthermore, Impala does not use MapReduce, which allows it to return results in real time.
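
To make the architecture concrete, here is a minimal Python sketch of the general scatter-gather pattern that a query engine with per-node daemons can follow: each node aggregates its local rows, and a coordinator merges the partial results. It is not Impala's code or API. The names `NODE_PARTITIONS`, `local_aggregate`, and `run_query` are hypothetical, threads stand in for daemons, and in-memory lists stand in for data stored in HDFS.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list stands in for the rows held on one node.
NODE_PARTITIONS = [
    [("us", 3), ("de", 1)],
    [("us", 2), ("fr", 5)],
    [("de", 4)],
]


def local_aggregate(rows):
    """Work a per-node daemon might do: sum values by key over local rows."""
    partial = Counter()
    for key, value in rows:
        partial[key] += value
    return partial


def run_query(partitions):
    """Coordinator: send the same fragment to every node, then merge partials."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(local_aggregate, partitions))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts key by key
    return dict(total)


result = run_query(NODE_PARTITIONS)
print(result)
```

The sketch is roughly analogous to a `SELECT key, SUM(value) ... GROUP BY key` query. The point of the design is that no batch job is scheduled between the request and the answer, so only small partial results travel back to the coordinator.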

"We have already seen high levels of interest in, and adoption of, Hadoop by enterprises for low-cost storage and transformational processing of large volumes of data, but have argued that for Hadoop to gain more adoption for analytic workloads we need to see analytic tools taking full advantage of Hadoop’s scalable parallel processing architecture," said Matt Aslett, research manager, data management and analytics, 451 Research. "Cloudera Enterprise RTQ and Cloudera Impala look to be a significant step in enabling enterprises to take advantage of existing SQL skills and tools to realize the potential of real-time analytics against large volumes of structured and unstructured data stored in Hadoop."

NOTE: Apache Hadoop started as an offline, batch processing system. Subsequently, Hadoop was extended to service more interactive online workloads. First among these was HBase, the distributed, tabular data store.

Cloudera Impala introduces what is essentially a scalable, distributed query engine to the Hadoop ecosystem. The technology was developed by the lead architect of the Impala project, Marcel Kornacker, who previously helped build the query engine for the F1 project at Google.

"Apache Hadoop has already transformed the industry, unlocking value from Big Data for enterprises around the world," said Mike Olson, CEO of Cloudera. "Until now, enterprises had to limit the work they did with Hadoop because batch-mode processing using MapReduce was just too slow for some business problems. With today's release of Cloudera Enterprise Real-Time Query powered by Impala, we solve that problem. Cloudera Impala complements MapReduce and is the latest addition to our one hundred percent open source Big Data platform."

"You can now store all your data in Hadoop and use the same hardware to do both powerful analytics and run real-time queries using industry-standard tools and the SQL language," added Olson.

NOTE: Cloudera remains a leader in open-source contribution across Hadoop and supplemental projects such as Hive, Flume, Search, and Impala, and is the single biggest contributor to Hadoop-related projects with over 50 project committers, PMC members, and code contributors to Apache.

