You Say You Want a (Big Data) Revolution?


Perhaps calling it a revolution is a bit hyperbolic, but Revolution Analytics is definitely making waves in the predictive analytics and data mining market.

The Palo Alto, Calif.-based start-up provides commercial add-ons and support services for R, the open source language for writing and executing advanced data analytics jobs. Its core platform, called Revolution R Enterprise, includes an intuitive graphical user interface for writing algorithms in R, a language that can be quite complex on its own (and that differs significantly from writing similar analytics code for SAS or SPSS).

The goal, according to Revolution CEO (and co-founder of SPSS) Norman Nie, is to bring the power of R to everyday business users. That’s no small feat. R has been the language of choice for many statisticians and other experienced data miners for years, but an easy-to-use language it is not.

The company also has to compete with two well-entrenched, deep-pocketed analytics stalwarts, the aforementioned SAS and SPSS. Combined, the two vendors control more than half of the advanced data analytics market (which includes predictive analytics and statistical data mining). Both vendors have also invested heavily in making at least some of their core analytic functions easier to use, with SAS in particular having added certain analytics functions to its mainstream business intelligence offering.

Betting on Big Data

Revolution, as a commercial open source software company, has the edge on SAS and SPSS when it comes to cost of deployment (see Doug Henschen’s recent article comparing the three on price). But in addition to cost and ease-of-use, Revolution is also counting on its ability to play in the Big Data world as a key competitive differentiator.

RevoScaleR is an add-on to Revolution’s enterprise platform that enables it to scale to “terabyte-class” deployments. It includes pre-packaged analytic algorithms common in Big Data analytics scenarios, including summary statistics, linear regression, binomial logistic regression and crosstabs. It is also compatible with HDFS, the Apache Hadoop storage layer, and NoSQL databases common in Big Data deployments.
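
To make the “terabyte-class” idea a bit more concrete, here is a minimal sketch of how a linear regression can be fit without ever holding the full data set in memory. It is written in plain Python rather than R, it is not RevoScaleR’s API, and the function names (chunked, fit_line_in_chunks) are hypothetical ones invented for illustration. The assumption is simply that the data arrives in chunks and that only a handful of running sums need to be kept between chunks.

```python
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[float, float]
Chunk = List[Row]


def chunked(rows: Iterable[Row], size: int) -> Iterator[Chunk]:
    """Yield successive chunks of at most `size` rows (hypothetical helper)."""
    chunk: Chunk = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def fit_line_in_chunks(chunks: Iterable[Chunk]) -> Tuple[float, float]:
    """Fit y = intercept + slope * x, keeping only running sums in memory."""
    n = 0
    sx = sy = sxx = sxy = 0.0
    for chunk in chunks:
        # Each chunk is folded into the sums and can then be discarded.
        for x, y in chunk:
            n += 1
            sx += x
            sy += y
            sxx += x * x
            sxy += x * y
    denom = n * sxx - sx * sx
    if n < 2 or denom == 0:
        raise ValueError("need at least two rows with distinct x values")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return intercept, slope


# Illustrative data only: 1,000 points on the line y = 2x + 1,
# generated lazily and consumed 100 rows at a time.
rows = ((float(x), 2.0 * x + 1.0) for x in range(1000))
intercept, slope = fit_line_in_chunks(chunked(rows, size=100))
```

Memory use here depends on the chunk size, not on the number of rows, which is the property that matters once data outgrows RAM. A production implementation would likely use a numerically more stable update than raw sums of squares; this version favors readability.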

Nie and company are also working to leverage the growing trend of in-database analytics to extend Revolution’s reach. Revolution has partnered with IBM to deliver R inside the IBM Netezza data warehouse. The IBM Netezza High Capacity Appliance can scale up to ten petabytes, according to IBM. Rather than transferring all that data to the Revolution platform for advanced analytics (which isn’t feasible even if one were so inclined), Revolution’s platform essentially lives inside the appliance, significantly speeding up analysis.
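
The general principle behind in-database analytics can be sketched in a few lines. The example below uses Python’s built-in sqlite3 module as a stand-in for a data warehouse; it says nothing about how the Revolution and Netezza integration is actually built, and the sales table and both function names are hypothetical. It contrasts asking the database to compute a summary with pulling every row across to the client first.

```python
import sqlite3

# An in-memory SQLite database stands in for the warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("east", 30.0), ("west", 5.0)],
)


def summary_in_database(conn: sqlite3.Connection) -> list:
    """Let the database aggregate; only one row per region leaves it."""
    query = (
        "SELECT region, COUNT(*), AVG(amount) "
        "FROM sales GROUP BY region ORDER BY region"
    )
    return conn.execute(query).fetchall()


def summary_client_side(conn: sqlite3.Connection) -> list:
    """Pull every row to the client and aggregate there."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        count, total = totals.get(region, (0, 0.0))
        totals[region] = (count + 1, total + amount)
    return [(r, c, t / c) for r, (c, t) in sorted(totals.items())]


in_db = summary_in_database(conn)
client = summary_client_side(conn)
```

Both functions return the same per-region counts and averages, but the first moves a few summary rows while the second moves the whole table. At petabyte scale that difference in data movement is the argument for running the analytics where the data lives.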

R and Hadoop

When it comes to Hadoop, Revolution may be in an advantageous position compared to its competition. As a commercial provider of open source software, Revolution is at home in the open source world of Hadoop and its myriad sub-projects. Just as Cloudera relies in part on the contributions of the community to improve its Hadoop distribution, Revolution likewise takes advantage of the R open source community and its contributions. As the two communities begin to overlap more, it’s only a matter of time before more well-integrated, powerful R-based analytics functions for Hadoop emerge.

The in-database analytics picture is a little cloudier, however. As mentioned, Revolution partners with IBM to integrate its platform with the Netezza family of appliances. As far as I can tell, IBM is to date Revolution’s only partner on this front. But IBM has advanced analytics technology of its own in the form of SPSS, which it acquired in 2009. I suspected that this partnership could be on shaky ground, but a Revolution spokesperson assured me that IBM is committed to the deal because R provides certain functionality that SPSS does not. Still, Revolution needs to expand its in-database partnerships to other MPP data warehousing vendors (perhaps EMC Greenplum and/or HP Vertica) to solidify this part of its Big Data strategy.

On the whole, I like Revolution’s approach. Data volumes are growing like never before, and the popularity of, and need for, Big Data approaches like Hadoop and MPP data warehousing are likewise on the rise. If Revolution can use its value-add technology and support services to become the de facto Big Data advanced analytics platform, it has a real chance to give SAS and SPSS a run for their money.