UPDATED 10:00 EDT / JUNE 05 2012

NEWS

Automation and Easier Aggregation in Hadoop Clusters Signal Data-as-a-Service Trend

Yesterday I wrote about Cascading 2.0, an alternative to MapReduce. The application framework, managed by Concurrent, lets developers build “Cascading” big data apps using high-level scripting languages. The apps then get scheduled to run across a Hadoop cluster.

Also yesterday, HP executives presented their case for integrating Hadoop with Autonomy and HP Vertica, its impressive analytics technology.

In the news from both HP and Concurrent, executives often referred to “aggregation” as a priority in developing big data systems. It’s becoming clear why. Aggregation represents the next phase on the road to data as a service.

HP executives described how customers now talk about “data lakes,” where all data flows for analysis. With Autonomy, the data feeds into its analysis for filtering and is then distributed to a Hadoop cluster.

I asked Autonomy Promote’s chief executive Rafiq Mohammadi how the integration might fit with Cascading 2.0. He said it’s not an either-or situation. It’s simply an aggregation that could be executed through a REST-based API.

“Our entire strategy is to aggregate logic,” he said.

AWS: The Mega Aggregator

The integration of the Autonomy Intelligent Data Operating Layer (IDOL) into Hadoop is similar to the way Amazon Web Services (AWS) aggregates data for customers to shape into apps. That aggregation serves as the value for any number of data services.

Aggregation does account for AWS’s success with customers in the business of data. Customers can program apps through platform-as-a-service (PaaS) and run them on AWS Hadoop clusters. Flightcaster did this and made its name with accurate flight forecasting. Today, Cascading 2.0 makes it easier to develop apps with aggregated data. Thousands more data services will emerge as automation speeds access to aggregated data.

Advances in automation and app development for deployment on Hadoop clusters signal the coming trend in data-as-a-service. PaaS environments and big data frameworks will serve as the foundation for automating the application process to access aggregated data resources.

It’s inevitable. The analytics tools are getting better and the frameworks are far simpler to set up.

But the next step is aggregation. Once that is achieved, data can be shaped and used for competitive advantage.

 

