UPDATED 12:00 EDT / OCTOBER 03 2018

BIG DATA

Databricks’ MLflow machine learning toolkit adds support for R language

Big-data company Databricks Inc. today updated its MLflow machine learning toolkit with support for the R programming language and other new features aimed at boosting its utility.

Databricks unveiled MLflow in June to help standardize the process of developing machine learning applications and moving them into production. The company argues that the process of training machine learning algorithms is largely inconsistent, and that there are few tools available to reproduce results, track experiments and manage models.

MLflow is designed to help companies better package their machine learning code, execute it, test it and deploy it in production. It gives developers full control over the machine learning training lifecycle, end to end, by working with existing ML toolkits and frameworks across common deployment methods.

“MLflow is a unified toolkit for developing machine learning applications in a repeatable manner while having the flexibility to deploy reliably in production across multiple cloud environments,” Databricks Chief Technologist Matei Zaharia said when MLflow was launched.

With today’s update, Databricks has teamed up with RStudio Inc., which provides an open-source integrated development environment for R, to help integrate the programming language. MLflow is now available to the large community of data scientists who use RStudio and R to build new applications.

“Integration of R with MLflow will significantly broaden the reach of the project by allowing a broader community to use and contribute to MLflow,” JJ Allaire, chief executive officer at RStudio, said in a statement.

In addition to R, MLflow supports programming languages including Python, Java and Scala, and it offers a REST server interface that allows it to be used with other languages.
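To give a sense of what using a REST interface from an arbitrary language involves, here is a minimal Python sketch that builds, but does not send, an HTTP request to log a single metric. The endpoint path and JSON field names follow the MLflow REST API as currently documented and should be treated as assumptions, since they may differ from the release described here. The server address, run ID and metric name are hypothetical.

```python
import json
import time
import urllib.request


def build_log_metric_request(tracking_uri, run_id, key, value):
    """Build (without sending) a POST request that would log one metric."""
    payload = {
        "run_id": run_id,  # hypothetical run identifier
        "key": key,
        "value": value,
        "timestamp": int(time.time() * 1000),  # milliseconds since the epoch
    }
    return urllib.request.Request(
        # Assumed endpoint path; check the docs for the MLflow version in use.
        url=tracking_uri.rstrip("/") + "/api/2.0/mlflow/runs/log-metric",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical local tracking server and run ID, for illustration only.
request = build_log_metric_request(
    "http://localhost:5000", "hypothetical-run-id", "rmse", 0.72
)
print(request.get_method(), request.full_url)
```

Because the request is plain JSON over HTTP, the same call could in principle be made from any language with an HTTP client, which is the point of exposing a REST interface alongside the language-specific APIs.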

MLflow also gains integrations with popular machine learning libraries and frameworks such as scikit-learn, TensorFlow, Keras, PyTorch, H2O and Apache Spark MLlib, Databricks said.

Finally, Databricks is adding cross-cloud support for the MLflow toolkit, which means models built with it can be deployed on cloud services such as Microsoft Corp.’s Azure ML platform, Amazon Web Services Inc.’s SageMaker and Databricks’ own Unified Data Analytics platform.

“MLflow leverages AWS S3, Google Cloud Storage and Azure Blob Storage, allowing teams to easily track and share artifacts from their code,” company officials said.

“With MLflow, data science teams can systematically package and reuse models across frameworks, track and share experiments locally or in the cloud, and deploy models virtually anywhere,” Zaharia added in a new statement. “The flurry of interest and contributions we’ve seen from the data science community validates the need for an open-source framework to streamline the machine learning lifecycle.”
