UPDATED 02:31 EDT / OCTOBER 28 2016

Databricks brings deep learning capabilities to Apache Spark

Databricks Inc. is deepening its machine learning push by adding support for deep learning on its cloud-based Apache Spark platform.

The gist of the announcement is that Databricks is adding graphics chip support and integrating popular deep learning libraries to accelerate these kinds of workloads on the Spark platform.

By adding support for deep learning on graphics processing units (GPUs), along with pre-installed libraries and examples, Databricks will let users harness GPUs for an array of machine learning tasks on Spark, including image processing and text analysis. The company says users can expect roughly a 10x speedup in deep learning workloads, automated configuration of GPU machines, and smoother integration with Spark clusters.

Today’s announcement follows Databricks’ earlier release of TensorFrames, a software library that enables the popular deep learning framework TensorFlow to run on Spark.

“The enhancements announced today simplify deep learning on Spark by adding out-of-the-box support for using TensorFrames with GPUs — specialized hardware that can perform an impressive amount of deep learning-specific computations in parallel,” the company said in a statement. “With Databricks, data teams can easily conduct deep learning on highly optimized hardware with a few clicks or API calls.”

The addition of GPU support matters because deep learning, while an extremely powerful way to model data, is computationally expensive on conventional CPU architectures. GPUs can dramatically reduce that cost through efficient parallel computation.

In a blog post, Databricks highlights the cost benefits of using GPUs in a benchmark test of a simple numerical task:

“We compared optimized code written in Scala and run on top-of-the-line compute intensive machines in AWS (c3.8xlarge) against standard GPU hardware (g2.2xlarge),” wrote Databricks engineers Tim Hunter, Joseph Bradley and Yandong Mao. “Using TensorFlow as the underlying compute library, the code is 3X shorter, and about 4X less expensive (in $ cost) to run on a GPU cluster.”

[Chart: CPU vs. GPU deep learning comparison]

Databricks says support for deep learning will allow organizations to perform data wrangling, interactive exploration, stream data processing and other advanced analytics alongside deep learning in a single platform. With these capabilities, organizations can avoid unneeded system complexity and simplify the development of deep learning applications such as more accurate cancer detection for healthcare providers, faster drug discovery for pharmaceutical companies and more capable AI for uses such as language translation.

Image credit: Unsplash via pixabay.com
