Big (Data) Impressions from Strata NYC


There were more than two dozen vendors at Strata in New York City this week, and I managed to sit down and talk with executives from nearly every one. While I’m still digesting all the data they provided me for a Wikibon piece next week, I wanted to share some of my impressions with you now while they’re fresh in my mind.

First off, there’s no consensus, at least among the vendor community, about how mature the Big Data market is.

Most of the younger players in the market, particularly the smallish, open source start-ups focused specifically around Hadoop, tend to want to discuss the technical aspects of their products. Companies like Datameer and MapR are focused on how their products improve the performance, speed and manageability of the Big Data stack.

Others, particularly the more well-established vendors like EMC Greenplum and LexisNexis HPCC Systems, believe we’ve moved past the “What is Big Data?” stage and on to the “How can Big Data help my business?” stage. These vendors are less interested in talking about the ins-and-outs of the technology and more interested in explaining to business people how Big Data can address specific business problems and bring immediate business value.

Then there are a few vendors that are straddling the line. Cloudera, in particular, is slowly transitioning its messaging away from highlighting the technical features that set its Hadoop distribution apart and toward identifying and communicating real-world Big Data use cases for traditional enterprises (i.e., enterprises other than the Facebooks, Googles and LinkedIns of the world).

It’s also fascinating to watch how the different Big Data vendors relate to one another. There’s a complex web of relationships among all these vendors, and as the ecosystem becomes more crowded, alliances are forming and battle lines are being drawn around the various approaches to Big Data.

Roughly speaking, within the Hadoop community, there’s the fiercely open source camp led by Cloudera versus the (more-or-less) proprietary model being pursued by EMC Greenplum and MapR. Then there’s a myriad of smaller players that focus on particular modules of the Hadoop stack vying for position among one another and trying to determine which of the two Hadoop approaches (open v. proprietary) to align themselves with.

In the Hadoop delivery layer, more established vendors like Tableau, which has been in the data visualization game since before the term Big Data was even coined, are positioning themselves as natural complements to Hadoop and other Big Data approaches. They are facing off against start-ups like Platfora and Datameer that believe new approaches to data visualization and business intelligence are needed for Big Data.

But Hadoop isn’t the only Big Data game in town. LexisNexis HPCC Systems’ eponymous product, while sharing some characteristics with Hadoop, is a proprietarily developed alternative that the company just made open source. HPCC’s CTO Armando Escalante told me HPCC, which has been in development for over ten years, is a more mature technology and a complete Big Data stack.


One thing that virtually all of the vendors agreed on is that there’s a Big Data skills shortage and that services will play an important role in filling the void. While there were dozens of extremely talented data scientists and Hadoop developers in attendance at Strata, the reality is that most traditional enterprises don’t have internal teams of such people dedicated to operationalizing Big Data. As far as I can tell, EMC Greenplum has the most robust Big Data services offerings, incorporating vertical experts with EMC’s own team of data scientists. There was also some talk of Big Data as a Service. Delivering Big Data processing and analytics from the cloud packaged with services could prove a compelling way to make Big Data accessible to SMBs.

I’ll have more thoughts on Strata coming soon. If you were at the show, what did you make of the various vendors and approaches on display? Who do you think has the most compelling Big Data approach at this point in the market’s development?