4 Big Data Stories You May Have Missed This Week


Between Strata, Mobile World Congress and IBM PartnerWorld, it was a busy news week. SiliconAngle’s show theCube broadcast live from StrataConf and our distributed team covered much of the news from the event, but here are a few big data stories that we haven’t covered yet.

ReportGrid Rebrands as Precog, Wants to Be Heroku for Big Data

Precog is a private company that offers a big data analytics platform for developers. Formerly known as ReportGrid, the company wants to be Heroku for data-driven applications. Precog offers:

  • A data storage API for capturing structured and unstructured data from relational databases, Apache Hadoop and other sources.
  • A marketplace of add-ons for enriching existing data with additional data such as geolocation, demographics, sentiment analysis and social profiles.
  • A data analysis API for using this data to generate real-time intelligence reports, recommendations, probability calculations and more.

RainStor Partners with Real-Time Hadoop Company HStreaming and Telecom Company Anritsu

Earlier this year RainStor announced a version of its database that runs natively on Hadoop, enabling its users to query data stored in Hadoop with both SQL and MapReduce. I noted at the time that one limitation of both RainStor and Hadoop is that neither works as a streaming/complex event processing (CEP) solution. So the partnership between RainStor and HStreaming, a company that adds real-time data streaming to Hadoop, makes perfect sense. The two companies will offer a joint solution that enables users to efficiently transfer data between HStreaming and RainStor.

RainStor also announced a partnership with Anritsu, a telecommunications test and measurement company, to bring RainStor’s database into Anritsu’s monitoring system MasterClaw.

HStreaming Partners with Microsoft to Bring Streaming to Azure

Speaking of HStreaming, the company also announced a partnership with Microsoft this week to bring HStreaming’s service to Hadoop on Azure. The service is available now to current participants in Microsoft’s Technology Adoption Program (TAP) and Microsoft’s Community Technology Preview (CTP) program. HStreaming is also available for Hadoop implementations from Apache, Cloudera, MapR, Amazon EMR, Hortonworks, EMC Greenplum and IBM.

Hortonworks Partners with MarkLogic and Talend

Hortonworks, the Hadoop services company spun out from Yahoo last year (and a Microsoft partner helping bring Hadoop to Windows), has another strategy for adding real-time analytics to Hadoop: a partnership with MarkLogic. Via a Hadoop connector for MarkLogic, customers will be able to use MarkLogic for real-time analysis of data stored in Hadoop. According to MarkLogic’s announcement, “The two companies will also develop reference architectures for MarkLogic-Hadoop solutions and further align their product roadmaps.”

Hortonworks also announced a partnership with data integration company Talend, one of our open source companies to watch in 2012. Talend just launched its Talend Platform for Big Data, which will be bundled with Hortonworks Data Platform. The new Platform for Big Data provides a graphical interface for managing data integration between sources such as HDFS, HBase, Sqoop and Hive, plus a set of data quality and management tools.


Real-time analytics continues to be a big theme, as does the usability of Hadoop. In his interview on theCube, Nathan Marz, the Twitter engineer in charge of the open source data stream processing system Storm, explained the difference between streaming and Hadoop’s batch processing clearly: Hadoop is for analyzing all your data. Stream processing systems, like Storm, are for analyzing data as it comes in.
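To make Marz’s distinction concrete, here is a minimal sketch in plain Python. It does not use Storm’s or Hadoop’s APIs; the function names `batch_counts` and `streaming_counts` are hypothetical and only illustrate the two styles. The batch version waits for the complete dataset and analyzes it in one pass, while the streaming version updates running state as each event arrives and can report a result after every event.

```python
from collections import Counter
from typing import Iterable, Iterator


def batch_counts(events: Iterable[str]) -> Counter:
    """Batch style (hypothetical helper): analyze the whole dataset at once,
    after all of it has been collected."""
    return Counter(events)


def streaming_counts(events: Iterable[str]) -> Iterator[Counter]:
    """Streaming style (hypothetical helper): update running state per event
    and emit an up-to-date result immediately after each one."""
    running: Counter = Counter()
    for event in events:
        running[event] += 1
        # Yield a copy so earlier snapshots are not mutated by later events.
        yield Counter(running)


events = ["click", "view", "click"]
print(batch_counts(events))
for snapshot in streaming_counts(events):
    print(snapshot)
```

Once the stream has been fully consumed, the last streaming snapshot matches the batch result. The difference is that the streaming version had an answer available at every step rather than only at the end.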

“Delivering real-time analysis on large volumes of unstructured, streaming data is the elusive white whale of the Big Data industry,” says Wikibon analyst Jeffrey Kelly. “Such capabilities, once developed, will provide developers the opportunity to build smarter, highly reactive applications and systems.”

One of the big questions that remains is whether these real-time systems will be delivered as (relatively) usable products or as systems that require extensive professional services to install, configure and even maintain. The tension between delivering a usable product and monetizing through professional services was discussed by Zettaset CEO Jim Vogt in his interview on theCube.