UPDATED 15:45 EDT / JUNE 06 2011

NEWS

Informatica and Composite Software Help Hadoop Get Connected

Hadoop, the open source framework for processing Big Data, needs to get connected. That is, most Hadoop installations are isolated from the larger IT infrastructure. Think data scientists working behind the curtain. To gain mainstream adoption, Hadoop must be integrated with existing data management platforms and enterprise applications.

That means data integration, and two recent releases bode well for connecting Hadoop to the larger enterprise.

First, data integration specialist Informatica has added Hadoop data connectors to its core platform that allow users to move data and analytics results stored in the Hadoop Distributed File System, or HDFS, to internal systems. The new connectors, part of Informatica 9.1, make it easier, for example, to move data from HDFS into an on-premises data mart for further analysis.
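To see what such connectors abstract away, consider what pulling a result file out of HDFS by hand looks like. The minimal Java sketch below uses Hadoop's standard FileSystem API; the namenode URI and file paths are hypothetical placeholders, and this illustrates the manual plumbing, not Informatica's interface:

```java
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Minimal sketch: copy an analytics result file out of HDFS to the local
// filesystem. The namenode URI and paths are hypothetical placeholders.
public class HdfsExport {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://namenode.example.com:8020");

        FileSystem fs = FileSystem.get(conf);
        InputStream in = fs.open(new Path("/results/part-00000"));
        OutputStream out = new FileOutputStream("/tmp/results.csv");
        try {
            IOUtils.copyBytes(in, out, 4096, false); // stream HDFS -> local
        } finally {
            IOUtils.closeStream(in);
            IOUtils.closeStream(out);
        }
    }
}
```

A packaged connector handles this movement, plus scheduling and transformation, without anyone writing Hadoop client code.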

Second, Composite Software announced it now supports SQL-based integration with Cloudera’s Distribution including Apache Hadoop. Composite takes a different approach to integration than Informatica does, one called data virtualization. Data virtualization, also called data federation, integrates data from multiple source systems into a temporary, virtual data layer for analysis. The technique is often used to complement less flexible enterprise data warehouse deployments.
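From the consumer’s side, the appeal of SQL-based integration is that Hadoop-resident data looks like any other relational source. As a rough illustration of the idea, the sketch below queries a Hive table over JDBC; Composite’s own virtualization endpoints and semantics will differ, and the host, database, and table names here are hypothetical:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Hedged sketch: querying Hadoop-resident data with plain SQL through the
// Hive JDBC driver. Host, port, database, and table are hypothetical.
public class HiveSqlSketch {
    public static void main(String[] args) throws Exception {
        // Register the Hive JDBC driver (HiveServer-era class name).
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection(
                "jdbc:hive://hive.example.com:10000/default", "", "");
        Statement stmt = conn.createStatement();

        // Ordinary SQL; Hive translates it into MapReduce jobs under the
        // covers, so the client never writes MapReduce code itself.
        ResultSet rs = stmt.executeQuery(
                "SELECT page, COUNT(1) FROM weblogs GROUP BY page");
        while (rs.next()) {
            System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
        }
        rs.close();
        stmt.close();
        conn.close();
    }
}
```

The point of the example is the shape of the interaction: the client issues ordinary SQL against a virtual layer and never touches MapReduce.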

“Composite Software’s SQL-based integration with Cloudera’s Distribution including Apache Hadoop (CDH) enables customers to benefit from using the most comprehensive and widely adopted distribution of Hadoop with a proven, enterprise data virtualization platform,” said Ed Albanese, head of business development at Cloudera.

Currently, integrating data into or out of HDFS requires significant expertise in MapReduce and Hadoop programming. As we’ve written before on SiliconANGLE and Wikibon.org, there simply aren’t enough highly trained data scientists and other Hadoop specialists to go around. For Hadoop to take off in the mainstream enterprise, new open source and commercial add-ons are needed to make implementing and running Hadoop installations accessible to less specialized IT staff.
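To make the skills gap concrete, here is roughly what the canonical word-count job, the “hello world” of Hadoop, looks like as a Java MapReduce program. Even this trivial task is boilerplate-heavy, which is precisely the barrier the new connectors and SQL layers aim to lower:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// The canonical word-count MapReduce job: already more ceremony than most
// SQL-literate analysts want to take on for a simple aggregation.
public class WordCount {

    public static class TokenMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    ctx.write(word, ONE); // emit (word, 1) per occurrence
                }
            }
        }
    }

    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            ctx.write(key, new IntWritable(sum)); // total count per word
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```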

Of the two approaches, Composite’s in particular intrigues me. One of the foundations of Hadoop is distributed computing and doing away with the idea of a single, permanent ‘data temple’ inside the enterprise that houses all of a company’s most critical data. To me, this meshes well with data virtualization, which brings together disparate data for analysis as needed in a temporary, virtualized layer. With data virtualization, as with Hadoop, there is sometimes no need for a single data temple.

Overall, though, the new HDFS data connectors from Informatica and Composite Software are two more small, incremental steps toward making Hadoop fit for mainstream adoption. It’s good news for users, but also for Cloudera, which offers the most mature commercial Hadoop distribution currently available. The easier it is to move data into and out of Cloudera’s Hadoop distribution, the more appealing Hadoop becomes to potential new customers.

It’s going to be fascinating to watch how more “conventional” data integration vendors approach Big Data technologies like Hadoop. For independent data integration vendors, Hadoop is a great opportunity to develop a new line of business. In addition to Informatica and Composite, SnapLogic also recently released Hadoop connectors.

But for mega-vendors IBM and Oracle, for whom data integration is just one product among many, Hadoop could be viewed as a threat. So I wonder whether Oracle and IBM will add HDFS data connectors to their respective data integration offerings and sell them as stand-alone products, rather than bundling them into larger DBMS and data analytics appliances. That raises the larger question: will the two mega-vendors embrace open source approaches to Big Data, as EMC Greenplum is in the process of doing, or will they develop or repurpose their own proprietary systems for Big Data jobs?

I think the former approach is the way to go. Sure, IBM and Oracle may not make as much money in the short run, but building Big Data processing technology internally will take time. Depending on how quickly Hadoop matures, that’s time IBM and Oracle may not be able to afford.
