SodaGPT uses generative AI to ‘shift left’ on data quality testing
Data quality testing platform Soda Data NV today announced the launch of SodaGPT, a data management platform that uses generative artificial intelligence to help users define data quality expectations.
SodaGPT uses generative AI’s natural language processing capabilities and SodaCL’s domain-specific language capabilities to translate natural language queries into data quality tests.
“SodaGPT is the first generative AI-powered tool for data quality, enabling users of all backgrounds, technical or not, to take a no-code approach to naturally express and define data quality expectations,” Soda Chief Executive Maarten Masschelein (pictured, left, with co-founder Tom Baeyens) said in an interview with SiliconANGLE.
The announcement highlights how generative AI, the technology behind the enormously popular ChatGPT from OpenAI LP and Google LLC’s Bard, can be used within data management to empower data consumers and nontechnical users to improve data quality and reliability while reducing the administrative burden placed on data engineers.
Fixing the data quality crisis with generative AI
The announcement comes as data quality appears to be declining. One survey found that the average number of monthly data incidents increased from 59 per organization in 2022 to 67 in 2023.
Soda, which has raised more than $14 million in funding to date, plans to mitigate data incidents by using generative AI to shift data quality management left to nontechnical staff and insight consumers so they can help identify reliability issues earlier in the development lifecycle.
“The idea is based around a self-service contribution model, which enables data consumers to enter natural language code contributions via SodaGPT, which will then be translated into SodaCL, so that these users can define expectations for data quality independently, while engineers can provide oversight to ensure checks are correctly defined before they’re shipped into the data pipeline,” Masschelein said.
This approach not only makes data management more accessible to users without coding knowledge but also reduces the workload of data engineers who have traditionally had to manage the data themselves.
The data quality market
SodaGPT is entering the data quality tools market, which Research and Markets estimates will reach $5.4 billion by 2030, growing at a compound annual rate of almost 19% from 2022.
The company faces a number of competitors in the market, including Validity Inc.’s DemandTools, a tool used by organizations including Salesforce Inc. and Argyle Systems Inc. for preparing and cleaning CRM data.
Another key competitor in the space is Monte Carlo, which has raised $101 million in total funding so far and provides a platform for bulk deduplication of data, duplication prevention, and record ownership management.
For Soda, the use of generative AI within SodaGPT will act as a key differentiator to democratize data quality management so that even nontechnical users can improve the reliability of their data.
“With this new tool, Soda is ripping up the antiquated approach to data quality checks built exclusively for a technology audience that can read and write in SQL, instead simplifying the process for data consumers and creating a truly no-code experience to data quality,” Masschelein said.
Photo: Soda Data