Strive for Tight Integration Between Hadoop, Complementary Big Data Approaches


There was a ton of great content at Hadoop World 2011 this week, but one point that struck me was the number of vendors present that were not, strictly speaking, part of the core Hadoop ecosystem.

They included Teradata Aster Data, Informatica, HP Vertica, Attivio and, inexplicably, Oracle.

The main takeaway is that Hadoop and other Big Data technologies, namely MPP-style data warehouses, analytic development platforms and data integration platforms, are complementary to one another. In fact, working together, each makes the others stronger and more valuable to the business in many use-case scenarios.

Consider the case of JPMorgan Chase. In his keynote address on Monday morning, Larry Feinsmith, the company’s Managing Director in the CIO Office, dropped this little nugget on attendees: JPMorgan Chase has over 30,000 databases and 15,000 applications spread across the globe. The company is in the process of building what it calls a common data platform based on Hadoop to bring all its customer data together in one place.

That’s a lot of data and a lot of data movement. The company is leveraging Informatica’s data integration platform to transform and load data into its Hadoop cluster. There’s no way JPMorgan Chase could move that much data around by hand-coding custom integration jobs. It just wouldn’t scale.

In a video interview, Informatica CTO James Markarian talks about the intersection of Hadoop and data integration. Specifically, Markarian discusses Informatica’s work with JPMorgan Chase and the vendor’s experience with unstructured data integration.

Over dinner during the show, I also spoke with a developer at a New York City-based social gaming company. He explained to me that the company is using Hadoop for large-scale data processing and storage, but it needed a faster way to analyze high-velocity user data. The gaming company recently purchased Vertica, the first MPP data warehouse vendor to develop a Hadoop connector, to help in that effort.

Colin Mahony, Vertica’s Vice President of Product Management and Business Development, went live inside theCUBE and gave a good explanation of how MPP data warehouses complement Hadoop.


The bottom line is that Hadoop is not an island. There are numerous use cases where complementary technologies make Hadoop an even more useful and valuable platform. The business should consider the entire Big Data landscape when devising Big Data strategies, while shared-services IT organizations should strive for tight integration between Hadoop and complementary Big Data technologies where possible.