VMware has announced Spring Hadoop, an integration of the Spring Framework and the Apache Hadoop platform designed to give developers a way to build distributed processing solutions with Apache Hadoop.
VMware has a habit of talking about the "new wave of data-driven applications" as if earlier breeds of application were somehow bereft of data-centricity. But if we take the firm's focus on so-called Big Data issues at face value, there is merit in VMware delivering a streamlined programming model that could make Spring a natural way to integrate Hadoop systems into the enterprise application landscape. "Spring Hadoop brings the benefits of Spring — simplicity, ease-of-use — to Hadoop by providing a comprehensive, lightweight framework that will allow developers to easily build solutions around the Hadoop platform," said the company.
The backdrop here is that, as you know, data volumes have undeniably grown. At the same time, the ways enterprise applications access data have multiplied: secure access to the corporate data center is now widespread from smartphones, tablets, laptops, and dedicated mobile devices of all kinds. In many senses, this sums up the challenge brought about by Big Data as we know it today.
VMware reminds us that in answer to these new data challenges, Spring continues to focus on enabling enterprise Java developers to incorporate new data access patterns into their applications through the Spring Data projects.
Key aspects of Spring Hadoop include:
- Support for configuration, creation, and execution of MapReduce, Streaming, Hive, Pig, and Cascading jobs via the Spring container
- Comprehensive HDFS data access support through JVM scripting languages (Groovy, JRuby, Jython, Rhino, etc.)
- Declarative configuration support for HBase
- Dedicated Spring Batch support for developing powerful workflow solutions incorporating HDFS operations and all types of Hadoop jobs
- Declarative and programmatic support for Hadoop Tools, including FsShell and DistCp


