Pentaho has expanded its data integration software portfolio with what it calls a "major expansion" of native big data sources, including the latest Hadoop distributions and NoSQL sources, as well as native support for several analytic databases and traditional OLTP databases.
The company says that its native connection to big data platforms makes it easier and faster than ever to analyze the enormous data volumes generated by today's organizations.
Speaking exclusively to Dr Dobb's Journal, Jake Cornelius, VP of product management at Pentaho, said, "Pentaho's goal with big data is to provide appropriate tooling that makes it easier for developers and application architects to build and manage data integration and Business Intelligence solutions with big data technologies like Hadoop, NoSQL variants, and high performance/scalable data warehousing platforms."
Claiming to have "recognized early" the complexity, diversity, and growing volume of big data, Pentaho now openly declares that it offers deeper and more comprehensive support for big data sources than any other BI vendor.
To back up those claims, Pentaho's Cornelius says that his company's current integration points provide a number of benefits to developers, including:
- The ability to orchestrate execution of Hadoop-related tasks (e.g., executing a Hive query, Pig script, or M/R job) as part of a broader IT workflow.
- The ability to set up dependencies, so if a step fails the job can branch down a recovery path or send a notification, and on success it proceeds to subsequent dependent tasks. Likewise, it supports initiating several tasks in parallel.
- New integration for Pig — so that developers can execute a Pig job from a PDI Job flow, integrate the execution of Pig jobs into broader IT workflows through PDI Jobs, take advantage of the out-of-the-box scheduler, and so on.
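The dependency behavior described in the second point — branch to recovery on failure, fan out dependent tasks on success — can be sketched in plain Python. This is an illustrative model, not PDI's actual API; all task functions here are hypothetical stand-ins:

```python
from concurrent.futures import ThreadPoolExecutor

def run_hive_query():
    # Stand-in for a Hive query step; returns True on success.
    return True

def send_notification(msg):
    # Hypothetical notification step (e.g., email or logging).
    return f"notified: {msg}"

def run_recovery():
    # Hypothetical recovery path taken when the upstream step fails.
    return "recovered"

def run_dependent_tasks():
    # Independent follow-up tasks can start in parallel once the
    # upstream step succeeds.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(lambda n=n: f"task-{n} done") for n in range(3)]
        return [f.result() for f in futures]

def run_job():
    if run_hive_query():
        return run_dependent_tasks()       # success branch
    send_notification("Hive step failed")  # failure branch
    return run_recovery()

print(run_job())  # → ['task-0 done', 'task-1 done', 'task-2 done']
```

The same pattern generalizes: each step reports success or failure, and the orchestrator decides whether to continue, branch, or notify.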
Technical Note taken from http://pig.apache.org/: Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets.
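To make that note concrete, the classic word-count example is often used to show Pig's high-level style. The Pig Latin script below (held in a string) is illustrative only, and the short Python snippet that follows mimics its GROUP/COUNT semantics on a tiny in-memory sample:

```python
from collections import Counter

# Illustrative Pig Latin word-count script: each statement defines a
# relation, and Pig parallelizes the underlying MapReduce work.
PIG_WORD_COUNT = """
lines  = LOAD 'input.txt' AS (line:chararray);
words  = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
groups = GROUP words BY word;
counts = FOREACH groups GENERATE group, COUNT(words);
STORE counts INTO 'output';
"""

# The same dataflow expressed directly in Python for a small sample:
lines = ["big data big", "data platform"]
words = [w for line in lines for w in line.split()]
counts = Counter(words)
print(dict(counts))  # → {'big': 2, 'data': 2, 'platform': 1}
```

The point of the comparison is the level of abstraction: the Pig script describes *what* to compute, while the engine decides how to parallelize it across a cluster.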
"Hortonworks and Pentaho share a vision whereby Apache Hadoop becomes the de facto platform for storing, managing, and analyzing big data. We are focused on accelerating the development and adoption of Apache Hadoop and are excited to be working with Pentaho to further simplify the development and deployment of Big Data projects," said Eric Baldeschwieler, CEO, Hortonworks.
While traditional OLTP databases are typically not considered "big data" platforms, Pentaho says it maximizes their performance and scalability through native SQL dialect generation for fast analytics, or native bulk loader integration for fast data integration. OLTP databases with native Pentaho support include: Apache Derby, Firebird, HyperSQL, IBM DB2, IBM Informix, Ingres, Interbase, Microsoft Access, Microsoft SQL Server, MySQL, Oracle, and PostgreSQL.
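As a rough illustration of what dialect-aware SQL generation involves (a simplified sketch, not Pentaho's actual generator), even something as basic as limiting a result set is phrased differently across several of the databases listed above:

```python
def limit_query(table, n, dialect):
    """Generate a dialect-specific query returning the first n rows.

    Simplified sketch: a real generator also handles identifier quoting,
    pagination offsets, type mapping, and many more dialects.
    """
    base = f"SELECT * FROM {table}"
    if dialect in ("mysql", "postgresql", "hypersql"):
        return f"{base} LIMIT {n}"
    if dialect == "sqlserver":
        return f"SELECT TOP {n} * FROM {table}"
    if dialect == "oracle":
        return f"{base} WHERE ROWNUM <= {n}"   # classic pre-12c style
    if dialect == "db2":
        return f"{base} FETCH FIRST {n} ROWS ONLY"
    raise ValueError(f"unknown dialect: {dialect}")

print(limit_query("orders", 10, "sqlserver"))  # → SELECT TOP 10 * FROM orders
```

Generating the native dialect rather than lowest-common-denominator SQL is what lets each engine use its own optimizer and fast paths.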