Visual Layer raises $7M to clean up image datasets for AI training
Data quality startup Visual Layer Inc. announced today that it has closed a $7 million seed funding round led by Madrona Venture Group and Insight Partners. It will use the money to build out its managed service for curating the large-scale image sets used to train computer vision models.
Researchers are all too familiar with the challenge of gathering images to be used in training datasets. The quality of an artificial intelligence model depends directly on the quality of the data it is trained on. Leading computer vision models are trained on datasets containing billions of images, but more data doesn’t necessarily make for a better model.
According to Visual Layer, up to 30% of the images and videos used in training datasets can be described as “messy.” The result is skewed AI models being deployed in real products and services, leading to problems around AI bias and missed business opportunities.
Visual Layer uses the term “messy” to describe images and videos that are incorrectly labeled, broken, missing or duplicates, and it says they all contribute to reducing the quality of the AI models trained upon them. With some AI builders now using datasets that contain in excess of 10 billion visual assets, it has become impossible for humans to perform quality control manually.
That’s where Visual Layer comes in. It has created a service, based on an open-source project called Fastdup, that helps data scientists clean their datasets prior to model training. It applies quality automation to correct image labels, remove duplicates, identify anomalies and more.
Whenever it finds an image label that’s wrong or confusing, it will either correct it or drop the image altogether. By doing this, Visual Layer effectively cleans the dataset, helping to improve the overall accuracy of the model that will be trained on it.
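To make the “messy” categories concrete, here is a minimal Python sketch of the simplest checks: flagging missing files, empty files and byte-for-byte duplicates. It is an illustration only and not Fastdup's API or method. The function name `audit_dataset` and the in-memory `records` list are hypothetical, and exact hashing is a stand-in technique that would not catch near-duplicates or wrong labels.

```python
import hashlib
from collections import defaultdict


def audit_dataset(records):
    """Flag missing, empty, and byte-identical entries.

    `records` is a hypothetical list of (name, data) pairs, where `data`
    holds the raw file bytes, or None if the file could not be read.
    """
    missing, empty = [], []
    by_digest = defaultdict(list)
    for name, data in records:
        if data is None:
            missing.append(name)
        elif len(data) == 0:
            empty.append(name)
        else:
            # Identical bytes produce identical SHA-256 digests.
            by_digest[hashlib.sha256(data).hexdigest()].append(name)
    # Keep the first file in each group; report the rest as duplicates.
    duplicates = [n for names in by_digest.values() for n in names[1:]]
    return {"missing": missing, "empty": empty, "duplicates": duplicates}


# Made-up sample data, for illustration only.
report = audit_dataset([
    ("cat_01.jpg", b"\xff\xd8cat-bytes"),
    ("cat_01_copy.jpg", b"\xff\xd8cat-bytes"),
    ("dog_01.jpg", b"\xff\xd8dog-bytes"),
    ("lost.jpg", None),
    ("truncated.jpg", b""),
])
print(report)
```

At the scale the company describes, billions of assets, a real pipeline would need far more than this, including similarity search for near-duplicates and model-assisted label checks.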
Visual Layer co-founder and Chief Executive Danny Bickson said visual data can be one of the most complex and challenging types of data to manage. But understanding, curating and managing this content is crucial to building meaningful services around AI. “Companies are struggling with those huge amounts of data; they often have no clue where their data is and what is inside it,” Bickson said. “They develop homegrown tools since there is no infrastructure or common standards.”
Visual Layer is emerging from stealth mode now, but the Fastdup open-source package has already amassed a community of more than 200,000 early adopters, including the Indian social commerce platform Meesho Inc., which hosts more than 13 million resellers. “Meesho is using Fastdup to improve the quality of our image gallery of 200 million products and automatically detect and fix data quality issues,” said Srinvassa Rao Jami, lead computer vision manager at Meesho.
In a blog post, Jon Turow, a partner at Madrona Venture Group, said he sees Visual Layer as part of a broader trend among AI builders to demand higher-quality data, not just more of it.
Turow spoke with John Furrier, host of SiliconANGLE Media’s video studio theCUBE, earlier this year about the abundant opportunities for AI-driven services.
Image: Visual Layer