UPDATED 15:26 EDT / JUNE 30 2014

DataStax becomes latest vendor to jump on the Apache Spark bandwagon

The limitations of historical data analysis as a forecasting tool are spawning a rush among analytics providers to add real-time processing functionality to their offerings. DataStax, a company that sells an analytics suite based on the Apache Cassandra NoSQL engine, is the latest to join the pack.

Cassandra is an open-source database used to store vast amounts of unstructured information across commodity servers, and it is especially well suited to high-volume, fast-moving workloads such as sensor data. Santa Clara-based DataStax's premium distribution adds several features on top of Cassandra, including backup and recovery, monitoring and in-memory processing. The last of these was introduced with version 4.0 of the platform, hot on the heels of a similar update from Oracle. Both vendors are touting performance up to 100 times faster than that of traditional analytics engines.

Now DataStax is upping the ante with release 4.5 of its namesake offering, which includes support for Apache Spark, an engine described by SiliconANGLE founder John Furrier as the "next big thing in Big Data." The integration is the result of a May partnership with Databricks, a startup founded by Spark's original creators that offers professional support and certification for the technology.

Analytics at the speed of lightning

Spark is an analytics engine that is said to be up to 100 times faster than MapReduce, the most widely used data-crunching engine for Hadoop, when running in memory, and between five and ten times faster when accessing data on disk.

"NoSQL databases are popular because they allow developers to incorporate data of any structure and don't bind them to a particular data model. But NoSQL databases, in general, are less mature when it comes to analytic capabilities," explains Wikibon principal research contributor Jeff Kelly. "Adding Spark to DataStax's enterprise platform should provide significantly better analytic performance against operational data."

In addition to providing considerably more horsepower than alternatives, Spark is fully interoperable with the core components of Hadoop, which means users don't have to make any major modifications to their existing environments before adopting it. Spark also has a companion project called Shark, a Hive-compatible SQL query engine, which is also expected to be supported by the DataStax-Databricks alliance.

DataStax is just the latest in a fast-growing list of vendors to throw its weight behind Spark. Last week, Hortonworks announced that it has added support for the framework through its own partnership with Databricks, thereby becoming the third major Hadoop distributor to endorse it.

The big picture

Spark support is not the only thing DataStax Enterprise 4.5 has going for it. The release also integrates with the Hadoop distributions from Hortonworks and Cloudera, opening up use cases that mix historical insights with real-time data. It also adds a Performance Service that DataStax says provides detailed diagnostic information all the way down to the efficiency of individual database statements.

That increased visibility is complemented by a revamped version of the built-in OpsCenter dashboard that lets admins configure custom monitoring metrics. The upgrade focuses on visualizing complex tasks: it includes a graphical best-practice enforcement mechanism, a much-needed set of security features and a point-and-click remote management tool that can be accessed from multiple devices. Additionally, DataStax says a single OpsCenter installation can now support more than 1,000 nodes, an increase meant to reduce licensing costs and complexity in the kind of large-scale deployments operated by its biggest customers.
