UPDATED 12:00 EST / JUNE 05 2018

BIG DATA

Databricks platform updates speed up AI and machine learning workloads

Big-data analytics company Databricks Inc. is sharpening its focus on artificial intelligence workloads, announcing today a major update to its Unified Analytics Platform that should help to unlock the siloed data needed to power these workloads and simplify machine learning processes.

Databricks said it has identified some of the major problems that prevent organizations from successfully innovating with AI technologies.

“We’re at the cusp of the machine learning revolution, and we’re seeing a lot more companies getting down to doing projects,” Databricks co-founder and Chief Executive Ali Ghodsi said in an interview with SiliconANGLE. As a result, he said, they need tools that make it much easier to do machine learning at large scale without having to hire scarce and expensive data scientists.

Ghodsi said the problem is that enterprises are forced to use numerous disconnected tools to accomplish that, and those disparate tools create both organizational and technology silos, resulting in friction that slows the progress of AI projects.

“Organizations are being told to leverage AI, machine learning, and deep learning, but the current level of complexity in the space has never been higher,” said Mike Leone, senior analyst at Enterprise Strategy Group Inc. “Instead of churning out more one-off toolkits and frameworks to satisfy every specific use case, there is a growing need to simplify what is already available. This is especially important for those organizations just getting started or early in the adoption process.”

One of the main problems with training machine learning algorithms is that the development process remains rather ad hoc, with few tools available to reproduce results, track experiments and manage models, the company said. To fix this, Databricks is introducing a new machine learning toolkit called “MLflow,” which is designed to help companies better package machine learning code, execute it and test it, and finally deploy it into production.

“There is no toolkit for machine learning, which is forcing organizations to piece together point solutions and secure highly specialized skills to achieve AI,” said Databricks Chief Technologist Matei Zaharia. “MLflow is a unified toolkit for developing machine learning applications in a repeatable manner while having the flexibility to deploy reliably in production across multiple cloud environments.”

Ghodsi said the toolkit, which Zaharia has been developing for the past year, has already garnered interest from large enterprises. “We’re hoping MLflow will become the standard lingua franca” for machine learning, he said.

“With MLflow, organizations will be able to manage the ML lifecycle from end-to-end, including model building and deployment into production by, for lack of a better word, standardizing on existing toolkits/frameworks across preferred deployment options,” Leone said. “You’ll be able to jump right into the ML testing phase as opposed to dealing with interoperability issues.”

A second problem Databricks has identified has to do with deep learning, a subset of AI that’s used in applications such as natural language processing, image classification and object detection. The only way to improve these models to a point where they can be useful is by feeding them ever-increasing amounts of data, and this takes a considerable amount of time. Databricks said enterprises have resorted to using a variety of deep learning frameworks, including TensorFlow, Keras and Horovod, to help speed things up, only to find themselves lumbered with more complexity than they can handle.

To help organizations get a better handle on this, Databricks has come up with a new feature called “Runtime for ML” that provides preconfigured environments for deep learning that are integrated with these popular frameworks. The company is also adding support for graphics processing unit chips on the Amazon Web Services and Microsoft Azure clouds, enabling data scientists to train, evaluate and deploy their deep learning models on a single, unified engine.

“Runtime for ML will help with tracking and reproducing experiments making model building faster,” Leone said. “For deployment, MLflow will give organizations the ability to easily deploy ML models the way they want, whether on-prem or across clouds, as well as providing the integration and monitoring once deployed. This directly addresses the difficulties organizations face when not only moving ML models into production, but while maintaining them.”

Ghodsi also highlighted the capabilities of Databricks’ Delta data warehouse, which he said can be used to clean and prepare data so it’s ready to be used to train AI models. Apple Inc., for instance, has used it for all its internal threat and anomaly detection, collecting almost a petabyte of new data per week. Ghodsi said he views Delta, which will be generally available at the end of this month, as the biggest technological improvement toward faster machine learning and deep learning in more than six years.

Analyst James Kobielus of Wikibon, owned by the same company as SiliconANGLE, said the updates mean that Databricks is now one of the very few companies that can help developers take machine learning, deep learning and other artificial intelligence projects all the way from preparation through modeling and training to operationalization across complex hybrid clouds. He also praised the new capabilities for the agility they provide developers.

“Developers can use their choice of leading modeling frameworks within the Databricks environment, allowing them to efficiently scale data engineering with transactional integrity,” Kobielus said. “They can execute and compare hundreds of parallel AI modeling/training ‘experiments’ in parallel, and leverage any hardware or software platform in this pipeline. And they can deploy the trained models to diverse production server platforms and clouds. With these announcements, Databricks has addressed the need for today’s developers for a simple, robust and industrial-grade pipeline to power the most demanding AI projects.”

The new capabilities are available in the latest version of Databricks’ Unified Analytics Platform.

Databricks, which raised a massive $140 million late-stage round of funding in August, has been busy lately. In March, it made its flagship analytics platform available as an integrated service within Azure called Microsoft Azure Databricks to help customers better process massive amounts of data stored there.

With reporting from Robert Hof

Image: SiliconANGLE
