RainStor Runs Its Database Natively on Hadoop

Today RainStor announced that it is releasing a version of its database that runs natively on Apache Hadoop, called RainStor Big Data Analytics on Hadoop. It will enable users to query data stored in Hadoop using both SQL and MapReduce. And thanks to RainStor's compression techniques, users will be able to store more data in less space, reducing server overhead as well as the time and complexity of backups, replication and other activities.

RainStor’s main product is a relational database, and the company has focused mostly on providing cloud backups. It has a big client list, including household names like AT&T, Bank of America, Merck and Pfizer.

RainStor Big Data Analytics on Hadoop is not a connector – the database runs natively on top of Hadoop. Because it runs on the Hadoop stack, there’s no need to pipe data in and out of the Hadoop Distributed File System in order to run SQL or even MapReduce queries on it.

Here’s an illustration of how it works:

RainStor illustration

RainStor claims to be able to compress data 40-fold, and also claims huge performance boosts when querying large datasets, thanks to its query optimization and filtering techniques.

One of the big advantages of this approach, other than the speed and compression, is that it lets users choose between SQL and MapReduce. That means business analysts or data scientists who aren't familiar with MapReduce can analyze data using a language they already understand, while serious Hadoop developers can still use MapReduce if they want.
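To make the SQL-versus-MapReduce choice concrete, here is a minimal sketch of the same aggregation written both ways. It does not use RainStor or Hadoop: SQLite and plain Python stand in for them, and the table name, sample records and function names are all hypothetical, chosen only for illustration.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample records: (region, call_minutes). Not real data.
RECORDS = [("east", 12), ("west", 7), ("east", 30), ("north", 5), ("west", 21)]


def total_by_region_sql(records):
    """SQL style: declare the result you want and let the engine plan it."""
    conn = sqlite3.connect(":memory:")  # SQLite is only a stand-in engine here
    conn.execute("CREATE TABLE calls (region TEXT, minutes INTEGER)")
    conn.executemany("INSERT INTO calls VALUES (?, ?)", records)
    rows = conn.execute(
        "SELECT region, SUM(minutes) FROM calls GROUP BY region"
    ).fetchall()
    conn.close()
    return dict(rows)


def map_phase(record):
    """MapReduce style, step 1: emit (key, value) pairs for each record."""
    region, minutes = record
    yield region, minutes


def reduce_phase(key, values):
    """MapReduce style, step 2: combine all values that share a key."""
    return key, sum(values)


def total_by_region_mapreduce(records):
    """Single-process imitation of map, shuffle (group by key) and reduce."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_phase(record):
            grouped[key].append(value)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(total_by_region_sql(RECORDS))
    print(total_by_region_mapreduce(RECORDS))
```

Both functions return the same totals per region. The SQL version is a single declarative statement, while the MapReduce version spells out the map and reduce steps by hand, which is roughly the trade-off between the two audiences described above.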

One thing this doesn’t really do is streaming/complex event processing (CEP). RainStor CEO John Bantleman describes the product as a complement to CEP rather than a replacement.

RainStor Big Data Analytics on Hadoop addresses some of the big problems with Hadoop, namely the complexity of interacting with it, the time it takes to run MapReduce jobs on large datasets and the complexity of running large clusters. In many ways it reminds me of HPCC, an open source alternative to Hadoop that uses its own SQL-like language. RainStor also shows the strength of the Hadoop ecosystem: companies ranging from HStreaming to Tresata are building on Hadoop rather than trying to replace it with something else.