DataStax becomes latest vendor to jump on the Apache Spark bandwagon
The limitation of historical data analysis as a forecasting tool is spawning a rush by analytics providers to add real-time processing functionality to their offerings. DataStax, a company that sells an analytics suite based upon the Apache Cassandra NoSQL engine, is the latest to join the pack.
Cassandra is an open-source database that’s used to store vast amounts of unstructured information across commodity servers and that is especially useful for processing high-volume, fast-moving workloads such as sensor data. The Santa Clara-based DataStax’s premium distribution provides several value-added features on top of Cassandra, including backup and recovery, monitoring and in-memory processing. The last of these was introduced with version 4.0 of the platform, hot on the heels of a similar update from Oracle. Both vendors are touting up to 100 times faster performance compared to traditional analytics engines.
Now DataStax is upping the ante with release 4.5 of its namesake offering, which includes support for Apache Spark, an engine described by SiliconANGLE founder John Furrier as the “next big thing in Big Data.” The integration is the result of a May partnership with Databricks, a startup founded by the original creators of the technology that offers professional support and certification.
Analytics at the speed of lightning
Spark is an analytics engine that is said to be up to 100 times faster than MapReduce – the most widely used data crunching engine for Hadoop – when running in memory, and between five and ten times faster when accessing data on disk.
“NoSQL databases are popular because they allow developers to incorporate data of any structure and don’t bind them to a particular data model. But NoSQL databases, in general, are less mature when it comes to analytic capabilities,” explains Wikibon principal research contributor Jeff Kelly. “Adding Spark to the DataStax’s enterprise platform should provide significantly better analytic performance against operational data.”
In addition to providing considerably more horsepower than alternatives, Spark is fully interoperable with the core components of Hadoop, which means users don’t have to make any major modifications to their existing environments before implementing it. Spark also includes Shark, a query language for the Hive data warehouse, which is also expected to be supported by the DataStax-Databricks alliance.
DataStax is just the latest in a fast-growing list of vendors to throw its weight behind Spark. Last week, Hortonworks announced that it has added support for the framework through its own partnership with Databricks, thereby becoming the third major Hadoop distributor to endorse it.
The big picture
Spark support is not the only thing that DataStax Enterprise 4.5 has going for it. The release also sports integration with the Hadoop distributions of Hortonworks and Cloudera to open up use cases that involve mixing historical insights with real-time data. It also adds a Performance Service that DataStax says provides detailed diagnostic information all the way down to the efficiency of individual database statements.
That increased visibility is complemented by a revamped version of the built-in OpsCenter dashboard that allows admins to configure custom monitoring metrics. The focus of the upgrade is on visualizing complex tasks: it includes a graphical best practice enforcement mechanism, a much-needed set of security features and a point-and-click remote management tool that enables access from multiple devices. Additionally, DataStax says that a single installation of OpsCenter can now support over 1,000 nodes, an increase meant to reduce licensing costs and complexity in the kind of large-scale deployments operated by its biggest customers.
photo credit: Eiimeon (wandering off in London) via photopin cc