UPDATED 15:00 EDT / APRIL 05 2022

BIG DATA

Dremio reveals how metadata operations are helping orgs reach a true data-as-code paradigm

Developers are a crucial element in the production and maintenance of digital products. When they can easily access, manipulate and get creative with the resources they need, innovation happens.

Data as code is a paradigm that seeks to expand the frontier for developers, as well as the technology industry at large. The enterprise has reached a stage in DevOps where developers are lifting data sets out of production, working them into code, and subsequently testing them for effectiveness and compatibility. Basically, data itself is being programmed, and this represents the ideal concept of data as code, according to Mark Lyons (pictured), vice president of product management at Dremio Corp.

“You have to do this all through metadata operations so you can control what version of the data the individual’s working with and which version of the data the production systems are seeing, because these data sets are too big,” he stated.
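To make that concrete, here is a minimal, hypothetical sketch of metadata-level version pinning using the Apache Iceberg table format Dremio works with (discussed further below). The catalog, table name and snapshot ID are illustrative only, and a Spark session already configured with an Iceberg catalog is assumed:

```python
# Hypothetical sketch: pinning readers to table versions through
# Iceberg metadata, without copying any data files.
# Assumes a Spark session configured with an Iceberg catalog "demo".
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("version-pinning").getOrCreate()

# Production systems read the table's current snapshot.
prod_df = spark.read.format("iceberg").load("demo.db.events")

# A developer pins an older snapshot for experimentation; the
# snapshot ID is a placeholder, normally taken from table metadata.
dev_df = (
    spark.read.format("iceberg")
    .option("snapshot-id", 5937117119577207000)  # hypothetical ID
    .load("demo.db.events")
)

# Both reads resolve through metadata pointers, so neither one
# duplicates the underlying Parquet files.
print(prod_df.count(), dev_df.count())
```

Because both reads resolve against metadata rather than physical copies, the developer’s experiments never touch what production systems see.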

Lyons spoke with theCUBE industry analyst John Furrier during the AWS Startup Showcase: “Data as Code — The Future of Enterprise Data and Analytics” event, an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed how metadata operations and platforms like Dremio are helping companies manage huge datasets. (* Disclosure below.)

Expanding and expediting the use of data

With the data playing field now more expansive and easier to access, data engineers, scientists, analysts and even end consumers have a lot to gain. In addition to increased productivity, there’s now room for more groundbreaking experimentation and a proliferation of use cases.

By eliminating the need to build an entirely new data pipeline or define a new schema just to add a column or change a data type, engineers, developers and organizations looking to be data-driven can do a lot more with data at a much faster pace, according to Lyons. De-risking operations is another benefit of treating data as code, he added.

“You’re not worried about messing up the production system, messing up that data, having it seen by the end user. With some businesses, data is their business … going all the way to a consumer, a third party,” Lyons stated.
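As a rough illustration of the kind of metadata-only change that spares teams a new pipeline, the hedged sketch below adds a column to an Apache Iceberg table without rewriting any data files. The table and column names are placeholders, and Iceberg’s Spark SQL extensions are assumed to be configured:

```python
# Hypothetical sketch of metadata-only schema evolution in an
# Iceberg table: adding a column rewrites no data files, only
# table metadata. Names are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("schema-evolution").getOrCreate()

# Add a column in place; existing Parquet files are untouched, and
# the new column simply reads as NULL for rows written before it.
spark.sql("ALTER TABLE demo.db.events ADD COLUMNS (referrer STRING)")

# Existing queries keep working; new writes can populate the column.
spark.sql("SELECT referrer FROM demo.db.events LIMIT 5").show()
```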

As time has moved on and computing has grown more intricate, many once-mundane tasks, such as iterating on artificial intelligence and machine learning algorithms, have become downright painstaking.

“I think it’s going to change the world, because this stuff was so painful to do. The data sets had gotten so much bigger as you know, but we were still doing it in the old way, which was typically moving data around for everyone,” Lyons explained. “It was copying data down, sampling data and moving data.”

The old paradigm is proving less effective by the day, and data as code is set to remedy that.

Data lakes are complementing, not opposing, the shift

When cloud data lakes first came around, Hadoop-based distributed file systems and cloud warehouses such as Snowflake were the mainstays. Some industry analysts even doubted that data lakes would reach their current level of popularity. But the technology proved its value, and now data as code is set to add even more value to cloud data lakes by overcoming some of their pressing shortcomings, according to Lyons.

“The data lakes this time around, with the Apache Iceberg table format and what Dremio’s working on around metadata, these things aren’t going to become data swamps anymore,” he explained. “They’re actually going to be functional systems that do inserts and updates and deletes. You can see all the commits. You can time travel them. And all the files are actually managed and optimized, so you don’t have to partition the data.”

With a strong handle on manifest files, the changes happening within them, and the commits each query engine makes, developers are better positioned to build functional systems rather than “data swamps,” according to Lyons.
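The commit history and time travel Lyons refers to can be sketched, under the same assumptions as the earlier snippets, by querying an Iceberg table’s snapshots metadata table and then reading the table as of an earlier moment. The timestamp value is a placeholder:

```python
# Hypothetical sketch of inspecting commits and time traveling an
# Iceberg table through Spark. Assumes the same "demo" catalog and
# "demo.db.events" table as the earlier snippets.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("time-travel").getOrCreate()

# Every insert, update or delete is a commit; the snapshots
# metadata table lists them all.
spark.sql(
    "SELECT snapshot_id, committed_at, operation "
    "FROM demo.db.events.snapshots"
).show()

# Time travel: read the table as of an earlier timestamp
# (milliseconds since epoch; the value here is a placeholder).
old_df = (
    spark.read.format("iceberg")
    .option("as-of-timestamp", "1649116800000")
    .load("demo.db.events")
)
old_df.show(5)
```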

There’s a general clamor in the industry for business intelligence tools that are capable enough to handle the tsunami of data sources businesses have to deal with, Lyons added.

“From a data sources side, Dremio is very proficient with our Parquet files in an object store, like we just talked about, but it also can access data in other relational systems,” he said.
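For a sense of what that looks like from a developer’s seat, here is a hypothetical sketch of sending such a query to Dremio from Python over Arrow Flight, one of the interfaces Dremio exposes. The host, credentials and source names are placeholders:

```python
# Hypothetical sketch: querying Dremio over Arrow Flight.
# Endpoint, credentials and source names are placeholders.
from pyarrow import flight

client = flight.FlightClient("grpc+tcp://dremio.example.com:32010")

# Basic auth returns a bearer-token header to attach to later calls.
token = client.authenticate_basic_token("user", "password")
options = flight.FlightCallOptions(headers=[token])

# The same SQL could target Parquet files in an object store or a
# relational source that Dremio federates.
query = "SELECT * FROM s3_source.logs.events LIMIT 10"
info = client.get_flight_info(
    flight.FlightDescriptor.for_command(query), options
)
reader = client.do_get(info.endpoints[0].ticket, options)
table = reader.read_all()  # an Arrow table of results
print(table.num_rows)
```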

Watch the complete video interview below, part of SiliconANGLE’s and theCUBE’s coverage of the AWS Startup Showcase: “Data as Code — The Future of Enterprise Data and Analytics” event:

(* Disclosure: TheCUBE is a paid media partner for the AWS Startup Showcase: “Data as Code — The Future of Enterprise Data and Analytics” event. Neither Dremio Corp., a sponsor for theCUBE’s event coverage, nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)

Photo: SiliconANGLE
