Apache Hadoop software, support, and services company Cloudera has released the Cloudera Development Kit (CDK), its first developer kit for CDH, the firm's own distribution of Hadoop.
The Cloudera Development Kit includes APIs, tools, and documentation to simplify the most common tasks in open source Hadoop.
NOTE: The Palo Alto-headquartered firm, which recently opened offices in London's TechCity, has said in the past that more than 50% of its engineering output is donated upstream to the various Apache-licensed open source projects, including Apache Hive, Apache Avro, and Apache HBase.
The firm points out that building and managing Hadoop environments remains complicated and time-consuming for many developers who lack specialized training.
CDK includes a collection of API libraries, tools, example code, and documentation that aim to help simplify the most common tasks when working with Apache Hadoop.
NOTE: Like CDH, CDK is 100% free, open source, and licensed under the same Apache Software License, allowing developers to use the code in any way they choose across existing commercial code bases or in any open source project.
"At Cloudera we are not just Hadoop providers; we're also consumers who know first-hand the challenges developers can face when working with Hadoop," said Eric Sammer, engineering manager, Cloudera. "The new Cloudera Development Kit is one of the many ways we're sharing our expertise with the community. First-time Hadoop programmers will find that CDK walks them through each step of the process, enabling them to get up and running on the platform quickly, while more-experienced developers will appreciate the flexibility of CDK to swap out different components for a completely customized experience. By making Hadoop more accessible, we are excited to help an even broader range of organizations get more value out of their data."
CDK takes a modular approach, letting developers pick and choose the pieces they want to use while freely substituting their own code. For Java developers using build tools such as Maven, CDK artifacts are available from the Cloudera Maven Repository for integration into their projects.
The first module included in CDK is the CDK Data module, a set of APIs that simplifies working with datasets in Hadoop filesystems, such as HDFS and the local filesystem. Cloudera will continue to add new modules to CDK to extend its functionality and flexibility for developers. As CDK is a fully open source project, community contributions are also welcome.
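The article does not show the CDK Data module's actual API. As a rough illustration of the underlying idea (callers work with a named dataset, while the storage backend stays swappable), here is a minimal Java sketch. Every name in it (`DatasetSketch`, `Dataset`, `Storage`, `LocalStorage`, `TextDataset`) is hypothetical and is not part of CDK. Only a local-filesystem backend is implemented. An HDFS-backed `Storage` would be written against Hadoop's `FileSystem` API and is omitted here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a dataset abstraction in the spirit of the CDK Data module.
 * NOT the CDK API: all types and methods here are illustrative assumptions.
 */
public class DatasetSketch {

    /** A named collection of records, independent of where it is physically stored. */
    public interface Dataset {
        void append(String record) throws IOException;
        List<String> readAll() throws IOException;
    }

    /** Pluggable storage backend (local disk here; an HDFS version could be swapped in). */
    public interface Storage {
        void appendLine(String relativePath, String line) throws IOException;
        List<String> readLines(String relativePath) throws IOException;
    }

    /** Backend that stores files under a root directory on the local filesystem. */
    public static final class LocalStorage implements Storage {
        private final Path root;

        public LocalStorage(Path root) {
            this.root = root;
        }

        @Override
        public void appendLine(String relativePath, String line) throws IOException {
            Path file = root.resolve(relativePath);
            Files.createDirectories(file.getParent());
            // Writes the line plus a line separator, creating the file if needed.
            Files.write(file, List.of(line), StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }

        @Override
        public List<String> readLines(String relativePath) throws IOException {
            Path file = root.resolve(relativePath);
            if (!Files.exists(file)) {
                return new ArrayList<>(); // an empty dataset has no file yet
            }
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        }
    }

    /** Dataset that keeps one text record per line in <name>/data.txt. */
    public static final class TextDataset implements Dataset {
        private final Storage storage;
        private final String path;

        public TextDataset(Storage storage, String name) {
            this.storage = storage;
            this.path = name + "/data.txt";
        }

        @Override
        public void append(String record) throws IOException {
            if (record.contains("\n") || record.contains("\r")) {
                throw new IllegalArgumentException("records must fit on a single line");
            }
            storage.appendLine(path, record);
        }

        @Override
        public List<String> readAll() throws IOException {
            return storage.readLines(path);
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("datasets");
        Dataset events = new TextDataset(new LocalStorage(root), "events");
        events.append("user=alice action=login");
        events.append("user=bob action=logout");
        System.out.println(events.readAll());
    }
}
```

Code that depends only on the `Dataset` interface would not need to change if `LocalStorage` were replaced by another backend. This mirrors, in miniature, the kind of component substitution the article attributes to CDK's modular design.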