UPDATED 10:00 EDT / JUNE 05 2012

NEWS

Automation and Easier Aggregation in Hadoop Clusters Signal Data as a Service Trend

Yesterday I wrote about Cascading 2.0, an alternative to MapReduce. The application framework, managed by Concurrent, lets developers build “Cascading” big data apps using high-level scripting languages. The apps are then scheduled to run across a Hadoop cluster.

Also yesterday, HP executives presented their case for integrating Hadoop with Autonomy and HP Vertica, its impressive analytics technology.

In the news from both HP and Concurrent, executives often referred to “aggregation” as a priority in developing big data systems. It’s becoming clear why. Aggregation represents the next phase on the road to data as a service.

HP executives described how customers now talk about “data lakes,” where all data flows for analysis. With Autonomy, the data feeds into its analysis for filtering and is then distributed to a Hadoop cluster.

I asked Autonomy Promote’s chief executive Rafiq Mohammadi how the integration might fit with Cascading 2.0. He said it’s not an either-or situation. It’s simply an aggregation that could be executed through a REST-based API.

“Our entire strategy is to aggregate logic,” he said.

AWS: The Mega Aggregator

The integration of the Autonomy Intelligent Data Operating Layer (IDOL) into Hadoop is similar to the way Amazon Web Services (AWS) aggregates data for customers to shape into apps. That aggregation serves as the value behind any number of data services.

It accounts for AWS’s success with customers in the business of data. Customers can program apps through platform-as-a-service (PaaS) and run them on AWS Hadoop clusters. Flightcaster did this and made its name with its accurate flight forecasting. Today, Cascading 2.0 makes it easier to develop apps with aggregated data. Thousands more data services will emerge as automation speeds up access to aggregated data.

Advances in automation and app development for deployment on Hadoop clusters signal the coming trend in data-as-a-service. PaaS environments and big data frameworks will serve as the foundation for automating the application process to access aggregated data resources.

It’s inevitable. The analytics tools are getting better and the frameworks are far simpler to set up.

But the next step is aggregation. Once that is achieved, data can be shaped and used for competitive advantage.

 

