UPDATED 20:28 EDT / APRIL 04 2019


LinkedIn’s newly open-sourced Avro2TF preps data for TensorFlow

LinkedIn Corp. on Thursday donated yet another internally built tool to the open-source community: a converter that uses Apache Spark to transform data into a format that TensorFlow can easily consume for machine learning purposes.

TensorFlow is one of the most popular and widely used frameworks for running machine learning, deep learning and other statistical and predictive analytics workloads. Apache Spark is an open-source big-data processing engine that’s designed to execute streaming, machine learning or SQL workloads that require fast and constant access to datasets.

LinkedIn’s new tool, called Avro2TF, enables data scientists and other users to convert datasets stored in the Apache Avro format, which is commonly used by LinkedIn’s engineers, into a format that TensorFlow can easily consume. The benefit is a simple but useful one: It frees up engineers and developers to focus on their machine learning models.

Avro2TF is just the latest in a series of machine learning-based tools LinkedIn has donated to the open-source community, in line with its stated mission to “democratize machine learning.”

“One of the important lessons we have learned from this journey is the importance of providing good deep learning platforms that help our modeling engineers become more efficient and productive,” LinkedIn engineers Xuhong Zhang, Chenya Zhang and Yiming Ma wrote in a blog post. “Avro2TF is part of this effort to reduce the complexity of data processing and improve the velocity of advanced modeling.”

LinkedIn’s engineers explained that they built Avro2TF to address their need for a solution focused on “scalable data conversion.” The tool is said to support all kinds of Spark-readable data formats, including optimized row columnar, sparse vector and dense vector data.

Here’s where Avro2TF fits into the TensorFlow stack:

[Image: where Avro2TF fits into the TensorFlow stack]

LinkedIn said it believes that many organizations will be able to benefit from Avro2TF because the Microsoft Corp.-owned company isn’t the only one that has been grappling with the challenge of converting data for machine learning purposes.

“We believe that this is not only a LinkedIn problem, many companies have vast amount of ML data in similar sparse vector format, and Tensor format is still relatively new to many companies,” the engineers said. “Avro2TF bridges this gap by providing scalable Spark based transformation and extensions mechanism to efficiently convert the data into TF records that can be readily consumed by TensorFlow.”
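To make the gap the engineers describe more concrete, here is a minimal Python sketch of the general idea: turning records stored as name-value "sparse vectors" into index-based sparse and dense tensors. It does not use Avro2TF or its API. The record layout, the feature names and the functions `build_vocabulary`, `to_sparse_tensor` and `to_dense_tensor` are all hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical record layout (an assumption, not Avro2TF's schema):
# each record is a list of (feature_name, value) pairs.
SparseRecord = Sequence[Tuple[str, float]]

# Made-up sample data for illustration only.
RECORDS: List[SparseRecord] = [
    [("title=engineer", 1.0), ("skill=spark", 0.5)],
    [("skill=tensorflow", 1.0), ("skill=spark", 2.0)],
]


def build_vocabulary(records: Sequence[SparseRecord]) -> Dict[str, int]:
    """Assign each distinct feature name an integer index, in first-seen order."""
    vocab: Dict[str, int] = {}
    for record in records:
        for name, _ in record:
            if name not in vocab:
                vocab[name] = len(vocab)
    return vocab


def to_sparse_tensor(
    record: SparseRecord, vocab: Dict[str, int]
) -> Tuple[List[int], List[float]]:
    """Return parallel (indices, values) lists, sorted by index.

    Feature names missing from the vocabulary are skipped.
    """
    pairs = sorted((vocab[name], value) for name, value in record if name in vocab)
    return [index for index, _ in pairs], [value for _, value in pairs]


def to_dense_tensor(record: SparseRecord, vocab: Dict[str, int]) -> List[float]:
    """Return a fixed-length vector with zeros for absent features."""
    dense = [0.0] * len(vocab)
    for name, value in record:
        if name in vocab:
            dense[vocab[name]] = value
    return dense


VOCAB = build_vocabulary(RECORDS)
```

With the sample data above, `to_sparse_tensor(RECORDS[1], VOCAB)` yields the indices `[1, 2]` with values `[2.0, 1.0]`, and `to_dense_tensor(RECORDS[1], VOCAB)` yields `[0.0, 2.0, 1.0]`. A production pipeline would run this kind of mapping at scale, which is the role the engineers attribute to Spark.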

Analyst Holger Mueller of Constellation Research Inc. told SiliconANGLE there should be many organizations that are eager to use Avro2TF, since it provides a vital link between two popular open-source technologies.

“These ‘bridge’ open-source projects are vital for enterprises to build next-generation apps because they don’t have the resources that LinkedIn has to build them,” Mueller said.

LinkedIn said Avro2TF is available to download on GitHub along with a tutorial on how to get it up and running.

Photo: Howard Narvaez/Flickr
