Sparkling Water 2.0 enables machine learning with Apache Spark
Enterprises have a tough time gathering insights from the vast oceans of data they accumulate, but a new tool for Apache Spark aims to change that by letting them combine machine learning algorithms with the popular data processing engine.
Announced last week, Sparkling Water 2.0 is a newly updated tool created by a startup called H2O.ai, formerly known as 0xdata Inc., which offers an open-source algorithm development platform called H2O. The tool is designed to make it simpler for companies to use machine learning algorithms in their data analysis. In practice, Sparkling Water 2.0 works much like an API that lets users tap into H2O’s open-source AI platform instead of using Spark’s own MLlib machine-learning library.
In a statement, the company explained that Sparkling Water was designed to let users enjoy the best features of Spark alongside H2O’s own speed, columnar compression and fully featured machine learning algorithms. The tool also provides more flexibility for companies looking to find the best algorithms for specific use cases, simply by bringing more options to the table.
“Apache Spark’s MLlib offers a library of efficient implementations of popular algorithms directly built using Spark,” the company noted. But with Sparkling Water, companies can also “use H2O algorithms in conjunction with, or instead of, MLlib algorithms on Apache Spark.”
As such, the tool is likely to appeal to both Spark and H2O users, according to one analyst.
“Enterprises are looking to take advantage of a variety of machine learning algorithms to address an increasingly complex set of use cases when determining how to best serve their customers,” said Matt Aslett, Research Director, Data Platforms and Analytics at 451 Research. “Sparkling Water is likely to be attractive to H2O and Spark users alike, enabling them to mix and match algorithms as required.”
Sparkling Water 2.0’s headline feature is that it allows users to run both Spark and Scala through H2O’s Flow user interface. It also brings a new visualization component to Spark’s MLlib, allowing users to see the results of their machine learning analysis in a format that’s easier to digest.
The software supports the Apache Zeppelin notebook as well as Spark 2.0 and earlier editions, and offers production support for machine-learning pipelines.
H2O.ai is also working on a project known as “Steam”, which it describes as a data science hub that allows data scientists and developers to collaboratively build, deploy and refine predictive applications across large-scale data sets, eliminating much of the heavy lifting involved in DevOps. With Steam, developers and data scientists will be able to compare models across teams and move them into production without having to do any of the back-end engineering work themselves.
Image credit: ColiNOOB via Pixabay.com