In-memory data grid (IMDG) specialist ScaleOut Software has released its hServer IMDG product to enable Hadoop analysis of "grid-based" data.
NOTE: By way of definition, in-memory data grids combine distributed caching with in-memory analysis and management tools to manage fast-changing data in a server farm, in a compute grid, or in the cloud. The technology typically features APIs for data access, query, and analysis, along with supporting management tools.
ScaleOut hServer includes a specialized version of ScaleOut's IMDG plus open-source API libraries to give Hadoop programs access to fast-changing, live data held in hServer's IMDG.
ScaleOut hServer enables the storage of live data in an IMDG, where it can be updated directly by applications using hServer's APIs while simultaneously being accessed by Hadoop programs for analysis. Unlike conventional Hadoop usage, which analyzes static data sets, the ability to continuously perform Hadoop MapReduce analysis on live data enables important trends to be spotted as they occur.
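The idea of analyzing data while applications keep updating it can be illustrated with a minimal sketch. This is plain Python, not hServer's API: `LiveStore`, `map_reduce`, and the order records are hypothetical names invented for illustration, and the "grid" is a single in-process dictionary.

```python
import threading
from collections import defaultdict


class LiveStore:
    """Toy in-memory key/value store (hypothetical; not hServer's API)."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def put(self, key, value):
        # Applications update live data directly.
        with self._lock:
            self._data[key] = value

    def snapshot(self):
        # Copy the current contents so an analysis sees a consistent view.
        with self._lock:
            return dict(self._data)


def map_reduce(store, mapper, reducer):
    """Run a map phase and a reduce phase over the store's current contents."""
    groups = defaultdict(list)
    for key, value in store.snapshot().items():
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    return {k: reducer(k, values) for k, values in groups.items()}


def by_symbol(key, order):
    # Map: emit (symbol, quantity) for each stored order.
    yield order["symbol"], order["qty"]


def total(symbol, quantities):
    # Reduce: sum the quantities per symbol.
    return sum(quantities)


store = LiveStore()
store.put("order:1", {"symbol": "ABC", "qty": 10})
store.put("order:2", {"symbol": "XYZ", "qty": 5})
first = map_reduce(store, by_symbol, total)

# The data changes, and the same analysis is simply run again.
store.put("order:3", {"symbol": "ABC", "qty": 7})
second = map_reduce(store, by_symbol, total)
```

Rerunning the same job after each update is what lets a trend show up as it happens; a real grid would distribute both the data and the map and reduce phases across servers.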
"While it's a powerful platform for analyzing large, static data sets, Hadoop has always been limited by its inability to perform analytics on live data," said Bill Bain, ScaleOut Software CEO.
Hosting data in hServer's in-memory data grid reduces access latency compared with holding data sets for Hadoop analysis in file systems or database servers.
ScaleOut hServer also provides a transparent distributed cache for HDFS data, using memory-based storage to eliminate file I/O and accelerate data access for Hadoop's MapReduce. Tests have demonstrated an 11X reduction in access latency for benchmark data sets. ScaleOut hServer automatically retrieves and stores HDFS data as key/value pairs in its IMDG, enabling subsequent analyses to bypass HDFS and access data directly from the distributed cache. Only a two-line code change is required for a Hadoop program to use hServer as an HDFS cache.
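The caching behavior described here resembles a read-through cache: the first read goes to the slow store and later reads are served from memory. The sketch below shows that general pattern only. It is not hServer's implementation or its two-line integration, and `ReadThroughCache`, `slow_loader`, and the sample path are hypothetical.

```python
class ReadThroughCache:
    """Toy read-through cache keyed by block name (hypothetical sketch)."""

    def __init__(self, loader):
        self._loader = loader  # function that reads from the slow store
        self._cache = {}       # in-memory key/value pairs
        self.misses = 0

    def get(self, key):
        if key not in self._cache:
            # First access: fetch from the backing store and keep a copy.
            self.misses += 1
            self._cache[key] = self._loader(key)
        # Subsequent accesses bypass the backing store entirely.
        return self._cache[key]


loader_calls = []


def slow_loader(key):
    # Stand-in for a file-system read; records each call for illustration.
    loader_calls.append(key)
    return "contents of " + key


cache = ReadThroughCache(slow_loader)
first_read = cache.get("/data/part-00000")
second_read = cache.get("/data/part-00000")
```

Because 83 percent of surveyed users rerun analyses on the same data set, the second and later runs are where such a cache would pay off.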
A recent survey commissioned by ScaleOut Software found that 93 percent of respondents felt their organizations required, or would benefit from, real-time data analytics on the Hadoop platform. In addition, 83 percent of Hadoop users run analyses multiple times on the same data set, and more than 61 percent of the data sets being analyzed are smaller than 10TB. ScaleOut cites these figures as evidence of the need for products like hServer that help Hadoop perform real-time analytics.
ScaleOut hServer will be available in a free community edition and in several commercial editions. The community edition supports a combined Hadoop/hServer grid of up to four servers for analyzing memory-based data sets of up to 256GB.
NOTE: Another player in the grid-based data game is GridGain. The company says that its In-Memory Data Grid is the core technology behind its ability to process large data sets with low latency in a real-time context.
"Scaling from a single computer to terabytes of data and thousands of nodes, GridGain In-Memory Data Grid technology provides capability to parallelize the data storage by storing partitioned data in in-process memory — the closest location the data can theoretically reside in relation to the application using it," says the company.
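Partitioned storage of this kind is commonly done by hashing each key to an owning node. The following is a minimal sketch of that general technique, assuming simple modulo placement; it is not GridGain's actual partitioning scheme, and `owner_node` and `partition` are hypothetical helper names.

```python
import hashlib


def owner_node(key, node_count):
    """Map a key to a node index using a stable hash (assumed scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def partition(keys, node_count):
    """Group keys by the node that would hold them in memory."""
    nodes = [[] for _ in range(node_count)]
    for key in keys:
        nodes[owner_node(key, node_count)].append(key)
    return nodes


keys = ["user:%d" % i for i in range(100)]
placement = partition(keys, 4)
```

Each key lives on exactly one node, so storage and lookups spread across the grid. Production grids usually prefer schemes such as consistent hashing, which move fewer keys when a node joins or leaves.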