Sparkling Water 2.0 enables machine learning with Apache Spark
Enterprises have a tough time gathering insights from the vast oceans of data they accumulate, but a new tool for Apache Spark hopes to change that by allowing them to combine machine learning algorithms with the popular data processing engine.
Announced last week, Sparkling Water 2.0 is a newly updated tool from H2O.ai, a startup formerly known as 0xdata Inc., which offers an open-source machine learning platform of the same name. The tool is designed to make it simpler for companies to use machine learning algorithms in their data analysis. In effect, Sparkling Water 2.0 works somewhat like an API that lets Spark users tap into H2O’s open-source AI platform instead of, or alongside, Spark’s own MLlib machine-learning library.
In a statement, the company explained that Sparkling Water was designed to let users enjoy the best features of Spark alongside H2O’s speed, columnar compression and fully featured machine learning algorithms. The tool also gives companies more flexibility in finding the best algorithms for specific use cases, simply by bringing more options to the table.
“Apache Spark’s MLlib offers a library of efficient implementations of popular algorithms directly built using Spark,” the company noted. But with Sparkling Water, companies can also “use H2O algorithms in conjunction with, or instead of, MLlib algorithms on Apache Spark.”
As such, the tool is likely to appeal to both Spark and H2O users, said one analyst.
“Enterprises are looking to take advantage of a variety of machine learning algorithms to address an increasingly complex set of use cases when determining how to best serve their customers,” said Matt Aslett, Research Director, Data Platforms and Analytics at 451 Research. “Sparkling Water is likely to be attractive to H2O and Spark users alike, enabling them to mix and match algorithms as required.”
Sparkling Water 2.0’s headline feature is that it allows users to run Spark and Scala code through H2O’s Flow user interface. It also brings a new visualization component to Spark’s MLlib, allowing users to view the results of analysis powered by machine-learning algorithms in a format that’s easier to digest.
The software supports the Apache Zeppelin notebook as well as Spark 2.0 and earlier editions, and offers production support for machine-learning pipelines.
H2O.ai is also working on a project known as “Steam,” which it describes as a data science hub that allows data scientists and developers to collaboratively build, deploy and refine predictive applications across large-scale data sets, eliminating much of the heavy lifting involved in DevOps. With Steam, developers and data scientists will be able to compare models across teams and move them into production without having to do any of the back-end engineering work.
Image credit: ColiNOOB via Pixabay.com