UPDATED 20:28 EDT / APRIL 04 2019


LinkedIn’s newly open-sourced Avro2TF preps data for TensorFlow

LinkedIn Corp. Thursday donated yet another internally built tool to the open-source community: a conversion tool that transforms data from Apache Spark into a format that can easily be consumed by TensorFlow for machine learning purposes.

TensorFlow is one of the most popular and widely used frameworks for running machine learning, deep learning and other statistical and predictive analytics workloads. Apache Spark is an open-source big-data processing engine that’s designed to execute streaming, machine learning or SQL workloads that require fast and constant access to datasets.

LinkedIn’s new tool, called Avro2TF, enables data scientists and other users to convert datasets stored in the Apache Avro format commonly used by LinkedIn’s engineers into a pattern that can be easily consumed by TensorFlow. The benefit is a simple but useful one: It frees up engineers and developers to focus on their machine learning models.

Avro2TF is just the latest in a series of machine learning-based tools LinkedIn has donated to the open-source community, in line with its stated mission to “democratize machine learning.”

“One of the important lessons we have learned from this journey is the importance of providing good deep learning platforms that help our modeling engineers become more efficient and productive,” LinkedIn engineers Xuhong Zhang, Chenya Zhang and Yiming Ma wrote in a blog post. “Avro2TF is part of this effort to reduce the complexity of data processing and improve the velocity of advanced modeling.”

LinkedIn’s engineers explained that they built Avro2TF to address their need for a solution focused on “scalable data conversion.” The tool is said to support all kinds of Spark-readable data formats, including optimized row columnar, sparse vector and dense vector data.

Here’s where Avro2TF fits into the TensorFlow stack:

[Image: diagram of Avro2TF within the TensorFlow stack]

LinkedIn said it believes many organizations will be able to benefit from Avro2TF because the Microsoft Corp.-owned company isn’t the only one that has been grappling with the challenge of converting data for machine learning purposes.

“We believe that this is not only a LinkedIn problem, many companies have vast amount of ML data in similar sparse vector format, and Tensor format is still relatively new to many companies,” the engineers said. “Avro2TF bridges this gap by providing scalable Spark based transformation and extensions mechanism to efficiently convert the data into TF records that can be readily consumed by TensorFlow.”
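To make the gap the engineers describe more concrete, here is a minimal Python sketch of the general idea: expanding a record that stores features as a sparse vector into the fixed-length, dense layout that tensor-based frameworks typically expect. It is an illustration only and does not use Avro2TF's actual API. The record layout and the names `sparse_to_dense` and `record_to_features` are hypothetical, and the sketch stops short of writing real TF records.

```python
from typing import Dict, List


def sparse_to_dense(indices: List[int], values: List[float], size: int) -> List[float]:
    """Expand a sparse vector (parallel index/value lists) into a dense list."""
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    dense = [0.0] * size
    for i, v in zip(indices, values):
        if not 0 <= i < size:
            raise IndexError(f"index {i} out of range for size {size}")
        dense[i] = v
    return dense


def record_to_features(record: Dict, size: int) -> Dict:
    """Turn one record (hypothetical layout) into a flat, tensor-friendly dict."""
    feats = record["features"]
    return {
        "label": float(record["label"]),
        "features": sparse_to_dense(feats["indices"], feats["values"], size),
    }


# Hypothetical record: only positions 0 and 3 of a 5-wide feature vector are set.
record = {"label": 1, "features": {"indices": [0, 3], "values": [0.5, 2.0]}}
example = record_to_features(record, size=5)
print(example)
```

At the scale LinkedIn describes, this kind of per-record conversion would run as a distributed Spark job rather than in a single Python process, which is the part Avro2TF is meant to handle.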

Analyst Holger Mueller of Constellation Research Inc. told SiliconANGLE there should be many organizations that are eager to use Avro2TF, since it provides a vital link between two popular open-source technologies.

“These ‘bridge’ open-source projects are vital for enterprises to build next-generation apps because they don’t have the resources that LinkedIn has to build them,” Mueller said.

LinkedIn said Avro2TF is available to download on GitHub along with a tutorial on how to get it up and running.

Photo: Howard Narvaez/Flickr
