UPDATED 12:00 EST / JULY 12 2022

BIG DATA

Soda Data debuts open-source framework for embedding data reliability checks-as-code

Belgium-based data observability firm Soda Data NV today announced the launch of Soda Core, an open-source framework that it says can be used to embed reliability checks and quality management into data pipelines.

The Soda Core framework is powered by Soda Checks Language, or SodaCL, a domain-specific programming language for data reliability that was also launched today. The company said the new offerings combine to enable “data engineering-as-code,” a simpler way to ensure the reliability of critical data streams that power enterprise applications and services.

Soda said the systems and pipelines that deliver data to businesses require constant attention to address changes to data schemas and structures, broken transformation logic and concept drift. These can all affect the reliability and quality of data, and therefore how much it is trusted. However, Soda said fixing these problems at large scale is a real challenge because of a lack of suitable tools, processes and expertise.

Soda Core is a free-to-use, open-source framework for data engineers that the company says makes it easy to build and maintain data checks-as-code that operate at scale across data ingestion, transformation and consumption workloads. The framework gives data engineers a library of tools to help ensure data reliability. These include tools that analyze dataset metadata to understand the shape and health of the data, plus metrics and check types that help validate data quality parameters.

Data engineers can use Soda Core to create fixed and dynamic thresholds for testing and validating data as part of a workflow that’s designed to detect and resolve issues. When a check fails, it can also send alerts to notify the right people at the right time, the company said.
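The difference between a fixed and a dynamic threshold can be illustrated with a minimal Python sketch. It does not use Soda Core or SodaCL; the function names, the row-count history and the three-standard-deviation rule are all assumptions chosen for illustration, and the company did not describe how its dynamic thresholds are computed.

```python
from statistics import mean, stdev


def fixed_threshold_check(value, minimum):
    """Pass when the observed metric meets a hard-coded lower bound."""
    return value >= minimum


def dynamic_threshold_check(value, history, k=3.0):
    """Pass when the value lies within k standard deviations of past values.

    The k-sigma rule is an illustrative assumption, not Soda's method.
    """
    center = mean(history)
    spread = stdev(history)
    return abs(value - center) <= k * spread


# Hypothetical daily row counts for one table, plus today's observation.
history = [1000, 1020, 980, 1010, 990]
today = 400

fixed_ok = fixed_threshold_check(today, minimum=1)   # table is not empty
dynamic_ok = dynamic_threshold_check(today, history)  # far below the usual range

print(f"fixed threshold passed: {fixed_ok}")
print(f"dynamic threshold passed: {dynamic_ok}")
```

In this made-up case the fixed check passes because the table still has rows, while the dynamic check fails because the row count has dropped well outside its recent range, which is the kind of issue a static bound would miss.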

SodaCL plays an important role because it’s designed as a simplified language that can be written and read by almost anyone. It removes the need to write checks in Structured Query Language, or SQL, meaning that nontechnical users can define the thresholds of what “good data” needs to look like, the company said.
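To show why a declarative check can be easier to read than hand-written SQL, here is a minimal Python sketch of a checks-as-code evaluator. It is not SodaCL and does not call Soda Core: the "metric, comparison, threshold" string format, the metric names and the helper functions are hypothetical and exist only to illustrate the idea.

```python
import operator
import re

# Comparison symbols supported by this toy check format (an assumption).
OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
}

# Matches strings such as "row_count > 0": metric name, operator, number.
CHECK_PATTERN = re.compile(r"^\s*(\w+)\s*(>=|<=|>|<|=)\s*(-?\d+(?:\.\d+)?)\s*$")


def compute_metrics(rows):
    """Compute a few hypothetical metrics over a list of dict records."""
    return {
        "row_count": len(rows),
        "missing_email_count": sum(1 for r in rows if not r.get("email")),
    }


def run_checks(checks, metrics):
    """Evaluate each readable check string against the computed metrics."""
    results = {}
    for check in checks:
        match = CHECK_PATTERN.match(check)
        if match is None:
            raise ValueError(f"cannot parse check: {check!r}")
        name, op, threshold = match.groups()
        results[check] = OPS[op](metrics[name], float(threshold))
    return results


# Made-up sample data: one record is missing its email address.
rows = [
    {"email": "a@example.com"},
    {"email": None},
    {"email": "c@example.com"},
]
checks = ["row_count > 0", "missing_email_count = 0"]
results = run_checks(checks, compute_metrics(rows))

for check, passed in results.items():
    print(f"{'PASS' if passed else 'FAIL'}: {check}")
```

The point of the design is that the person who knows what good data looks like only edits the `checks` list, while the engineer maintains the code that computes the metrics.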

Soda co-founder and Chief Technology Officer Tom Baeyens said the needs of data engineers are quite different from the needs of the rest of the data team.

“A lot of people in a data team know what good data looks like but only a few can code the checks,” Baeyens explained. “With our releases today, we are providing the tools to remove the bottlenecks that exist around coding data reliability, enabling data engineers to build data quality checks-as-code directly into their pipelines and fundamentally change how teams set up and maintain reliable, high-quality data products.”

“The rapidly maturing Observability market is now branching out into adjacent tools markets, as we can see with Soda,” said Holger Mueller of Constellation Research Inc. “With Soda Core and SodaCL, it is helping to manage the critical data needed by enterprises. It’s interesting to see Soda’s effort around another domain-specific language with SodaCL. Only the future will tell if this will lead to the uptake by the community of yet another DSL.”

The company said SodaCL is a work in progress, serving as a language foundation that will evolve over time to address specific issues across domains such as asset management, supply chain and customer data. The first iteration of SodaCL provides data test and monitoring checks-as-code from ingestion through transformation, with more than 30 built-in metrics and check types available at launch.

Images: Soda Data
