UPDATED 12:00 EDT / JULY 12 2022

BIG DATA

Soda Data debuts open-source framework for embedding data reliability checks-as-code

Belgium-based data observability firm Soda Data NV today announced the launch of Soda Core, an open-source framework that it says can be used to embed reliability checks and quality management into data pipelines.

The Soda Core framework is powered by Soda Checks Language, or SodaCL, a domain-specific programming language for data reliability that was also launched today. The company said the new offerings combine to enable “data engineering-as-code,” a simpler way to ensure the reliability of critical data streams that power enterprise applications and services.

Soda said the systems and pipelines that deliver data to businesses require constant attention to address changes to data schemas and structures, broken transformation logic and concept drift. These can all undermine the reliability and quality of data, and therefore trust in it. However, the company said fixing these problems at large scale is a real challenge because of a lack of suitable tools, processes and expertise.

Soda Core is a free, open-source framework for data engineers that the company says makes it easy to build and maintain data checks-as-code that can operate at scale for every kind of data workload, be it data ingestion, transformation or consumption. With the framework, data engineers can access a library of tools to help ensure data reliability. These include tools that analyze dataset metadata to understand the shape and health of the data, plus metrics and a broad range of checks that help validate data quality parameters.

Data engineers can use Soda Core to create fixed and dynamic thresholds for testing and validating data as part of a workflow that’s designed to detect and resolve any issues. It can also alert the right people at the right time when problems arise, the company said.
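
To make the distinction between fixed and dynamic thresholds concrete, here is a minimal Python sketch. It does not use Soda Core or SodaCL; the function names, the sample row counts and the three-standard-deviation tolerance are all assumptions chosen for illustration. A fixed threshold compares a metric with a hard-coded limit, while a dynamic threshold derives the acceptable range from the metric's own recent history.

```python
from statistics import mean, stdev


def check_fixed(value, minimum):
    """Fixed threshold: pass when the metric meets a hard-coded floor."""
    return value >= minimum


def check_dynamic(value, history, tolerance=3.0):
    """Dynamic threshold: pass when the metric stays within `tolerance`
    standard deviations of the mean of its recent history."""
    if len(history) < 2:
        return True  # too little history to judge, so do not fail the check
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return value == mu
    return abs(value - mu) <= tolerance * sigma


# Hypothetical daily row counts for one table (illustrative data only).
history = [1000, 1020, 980, 1010, 990]
today = 400

results = {
    "fixed": check_fixed(today, minimum=1),
    "dynamic": check_dynamic(today, history),
}
print(results)
```

In this example the fixed check passes because the table is not empty, while the dynamic check fails because today's row count is far below the recent norm. That is the kind of silent drop a hard-coded floor tends to miss.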

SodaCL plays an important role because it’s designed as a simplified language that can be written and read by almost anyone. It removes the need to write checks in Structured Query Language, or SQL, meaning that nontechnical users can define the thresholds of what “good data” needs to look like, the company said.
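
The idea of writing checks as short, readable statements rather than queries can be sketched in Python. This is not SodaCL and does not reproduce its grammar; the check format (`metric operator number`), the metric names and the `run_checks` helper are hypothetical, invented only to show how a declarative check can be separated from the code that evaluates it.

```python
import operator
import re

# Comparison operators allowed in a check statement.
OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
}

# Matches statements such as "row_count > 0".
CHECK_RE = re.compile(r"^\s*(\w+)\s*(>=|<=|>|<|=)\s*(-?\d+(?:\.\d+)?)\s*$")


def run_checks(checks, metrics):
    """Evaluate each readable check against precomputed metrics."""
    results = {}
    for text in checks:
        match = CHECK_RE.match(text)
        if match is None:
            raise ValueError(f"cannot parse check: {text!r}")
        name, op, threshold = match.groups()
        results[text] = OPS[op](metrics[name], float(threshold))
    return results


# Illustrative rows; in practice the metrics would come from the data store.
rows = [
    {"email": "a@example.com"},
    {"email": None},
    {"email": "c@example.com"},
]
metrics = {
    "row_count": len(rows),
    "missing_email": sum(1 for r in rows if r["email"] is None),
}

# Someone who knows what good data looks like writes only these lines.
checks = ["row_count > 0", "missing_email = 0"]
outcome = run_checks(checks, metrics)
print(outcome)
```

Here the first check passes and the second fails because one row has no email address. The person defining the checks never touches the evaluation code, which is the division of labor the company describes.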

Soda co-founder and Chief Technology Officer Tom Baeyens said the needs of data engineers are quite different from the needs of the rest of the data team.

“A lot of people in a data team know what good data looks like but only a few can code the checks,” Baeyens explained. “With our releases today, we are providing the tools to remove the bottlenecks that exist around coding data reliability, enabling data engineers to build data quality checks-as-code directly into their pipelines and fundamentally change how teams set up and maintain reliable, high-quality data products.”

“The rapidly maturing Observability market is now branching out into adjacent tools markets, as we can see with Soda,” said Holger Mueller of Constellation Research Inc. “With Soda Core and SodaCL, it is helping to manage the critical data needed by enterprises. It’s interesting to see Soda’s effort around another domain-specific language with SodaCL. Only the future will tell if this will lead to the uptake by the community of yet another DSL.”

The company said SodaCL is a work in progress, serving as a language foundation that will evolve over time to address specific issues across domains such as asset management, supply chain and customer data. The first iteration of SodaCL provides data test and monitoring checks-as-code from ingestion through transformation, with more than 30 built-in metrics and check types available at launch.

Images: Soda Data
