UPDATED 09:00 EST / JULY 12 2023

Machine learning startup Deci open-sources tool to analyze AI training dataset health

Deep learning automation startup Deci AI Ltd. today announced the launch of a free and open-source artificial intelligence tool that can profile datasets for model training purposes.

The company said in its announcement that DataGradients enables data scientists to quickly generate insights on the datasets they’re planning to use to train new AI models, in order to understand how capable those models will likely be.

Deci is the creator of a machine learning development platform that’s used to build, optimize and deploy AI models in the cloud, at the edge or on mobile devices. One of the challenges Deci aims to solve is the “AI efficiency gap.” That’s a common problem for AI developers where the hardware they’re using is unable to meet the demands of their models.

The company aims to solve this problem with its Automated Neural Architecture Construction tool, which helps optimize machine learning models for the target hardware. Developers simply define the task they wish their AI model to solve, provide the training dataset and then specify the hardware, and Deci will optimize the model for the specific task and hardware.

With DataGradients, users will now be able to better understand how well their models will perform even before they create them. The startup explains that it’s especially useful in computer vision, where the capabilities of models are directly related to the quality of the data used to train them.

For AI developers, it’s paramount that they can identify issues and weaknesses in their datasets in order to avoid training roadblocks and ensure their models can carry out their intended tasks. By having a good understanding of the underlying dataset, developers can make smarter decisions on the appropriate model choice, best loss function and optimization methods, Deci said.

More specifically, DataGradients enables data scientists to analyze and establish the health of datasets, identifying problems such as corrupted data, distributional shifts between training and test datasets, duplicate annotations and others. Users also get insights that can help them mitigate these issues and improve the quality of the dataset to ensure their models will perform better.
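To give a rough sense of what such checks involve, here is a minimal Python sketch of two of them: spotting duplicate annotations and measuring how far the class mix of a training set drifts from that of a test set. It is an illustration only and does not use DataGradients’ actual interface. The function names, the tuple layout for annotations (image ID, label, bounding box) and the choice of total variation distance as the drift measure are all assumptions made for this example.

```python
from collections import Counter


def find_duplicate_annotations(annotations):
    """Return annotations that occur more than once.

    Each annotation is assumed (for this sketch) to be a hashable tuple:
    (image_id, label, (x, y, width, height)).
    """
    counts = Counter(annotations)
    return sorted(a for a, n in counts.items() if n > 1)


def class_distribution(labels):
    """Map each class label to its share of the dataset."""
    total = len(labels)
    return {label: n / total for label, n in Counter(labels).items()}


def total_variation_distance(train_labels, test_labels):
    """Half the L1 distance between two class distributions.

    0.0 means the class mix is identical; 1.0 means no classes overlap.
    """
    p = class_distribution(train_labels)
    q = class_distribution(test_labels)
    classes = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in classes)


# Hypothetical toy data, not drawn from any real dataset.
annotations = [
    ("img1", "cat", (0, 0, 10, 10)),
    ("img1", "cat", (0, 0, 10, 10)),  # exact repeat of the line above
    ("img2", "dog", (5, 5, 20, 20)),
]
train_labels = ["cat", "cat", "dog", "dog"]
test_labels = ["cat", "cat", "cat", "dog"]

print(find_duplicate_annotations(annotations))
print(total_variation_distance(train_labels, test_labels))
```

On this toy data, the first call flags the repeated “img1” annotation, and the second reports a distance of 0.25, reflecting a test set that skews toward cats compared with the evenly split training set. A real profiling tool would go well beyond this, but the underlying idea is the same: compute simple statistics over the dataset before training and flag anything that looks off.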

Constellation Research Inc.’s vice president and principal analyst Andy Thurai told SiliconANGLE that deep learning models, including computer vision models, can be very hard to train, because achieving the desired accuracy is highly dependent on having high-quality training datasets. “When you train computer vision models on datasets of subpar quality, the results are often very unpredictable,” he explained.

Luckily, data scientists have a number of data quality tools at their disposal to help them determine if their datasets are fit for purpose, Thurai said. “There are many tools available commercially, but the open-source nature of DataGradients may help it to gain more traction among the developer community,” he added.

Deci co-founder and Chief Executive Yonatan Geifman said DataGradients is all about streamlining the model development and training process with “crystal-clear visibility” into the underlying datasets used. He noted that it’s the third open-source tool the company has released, following the launch of its PyTorch training library SuperGradients and its object detection foundation model YOLO-NAS.

Image: svstudioart/Freepik
