Big data application company Concurrent has introduced Pattern, a free and open source "scoring engine" that data professionals can use to deploy machine-learning applications on Apache Hadoop.
The software is designed to let users such as data scientists and systems analysts draw on existing intellectual property (IP) in predictive models, existing investments in software tooling, and the core competencies of their current analytics staff. With it, they can run big data applications built from machine-learning models, either by using Predictive Model Markup Language (PMML) or through a simple programming interface.
While Hadoop is (arguably) becoming the "tool of choice" for big data analytics, many commentators have stressed that it is not an easy technology to use. It also needs to integrate with existing data management and analytics systems, and together these hurdles have created a real barrier to comprehensive Hadoop adoption.
NOTE: PMML is a standard export format for tools such as R, MicroStrategy, and SAS.
Using Pattern, analysts and data scientists familiar with these tools can run predictive models at scale and combine ETL, data preparation, and predictive analytics in a single application, potentially reducing development time and making large Hadoop data sets more accessible.
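"Scoring" here means applying an already-trained model to new records. As a rough illustration of what a PMML scoring engine does, the sketch below parses a hypothetical linear regression model written in PMML using only Python's standard library, then scores two records. It is not Pattern's API (Pattern itself runs on the Java-based Cascading framework), the model and its coefficients are invented for the example, and it supports only the `RegressionTable`/`NumericPredictor` subset of PMML's `RegressionModel`.

```python
"""Minimal PMML RegressionModel scorer: an illustration only, not Pattern's API."""
import xml.etree.ElementTree as ET

# Hypothetical model: y = 1.5 + 2.0*x1 - 0.5*x2. Real PMML exported from
# tools such as R or SAS carries more metadata (Header, transformations, ...).
PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_1" version="4.1">
  <DataDictionary numberOfFields="3">
    <DataField name="x1" optype="continuous" dataType="double"/>
    <DataField name="x2" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x1"/>
      <MiningField name="x2"/>
      <MiningField name="y" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x1" coefficient="2.0"/>
      <NumericPredictor name="x2" coefficient="-0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

# PMML elements live in a versioned XML namespace.
NS = {"pmml": "http://www.dmg.org/PMML-4_1"}


def load_regression_model(pmml_text):
    """Return (intercept, [(field, coefficient, exponent), ...])."""
    root = ET.fromstring(pmml_text)
    model = root.find("pmml:RegressionModel", NS)
    if model is None:
        raise ValueError("no RegressionModel element found")
    table = model.find("pmml:RegressionTable", NS)
    intercept = float(table.get("intercept", "0"))
    terms = [
        (p.get("name"), float(p.get("coefficient")), int(p.get("exponent", "1")))
        for p in table.findall("pmml:NumericPredictor", NS)
    ]
    return intercept, terms


def score(model, record):
    """Apply the regression equation to one record (a dict of field values)."""
    intercept, terms = model
    return intercept + sum(coef * record[name] ** exp for name, coef, exp in terms)


# Usage: load the model once, then score each incoming record.
model = load_regression_model(PMML_DOC)
records = [{"x1": 1.0, "x2": 2.0}, {"x1": 0.0, "x2": 4.0}]
scores = [score(model, r) for r in records]
print(scores)  # [2.5, -0.5]
```

Because scoring each record is independent of every other record, the same function can be applied to a very large data set in parallel, which is the kind of work a scoring engine on Hadoop distributes across a cluster.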
"By leveraging the Cascading framework, enterprises can apply Java, SQL, and predictive modeling investments, and combine the respective outputs of multiple departments into a single application on Hadoop. When combined, Cascading, Lingual, and Pattern close the modeling, development, and production loop for all data-oriented applications. The combination of the three is the application ensemble for further enabling enterprises to drive differentiation through data," said Chris Wensel, CTO and founder, Concurrent, Inc.