Unified AI data pipelines: Vast Data tackles scale and traffic challenges
The age of artificial intelligence is spurring a revolution in data management, and Vast Data Inc. has positioned itself at the center with its Vast Data Platform, which stores unstructured data for use in training AI models.
The company recently partnered with Nvidia Corp. to create huge cloud architectures that support AI data engines. Supporting large amounts of data is more important than ever before, according to Jeff Denworth (pictured), co-founder of Vast Data.
“Vast is kind of the embodiment of a transformation in the market where people are now working on very intense AI workloads,” he said. “There are two new waves that are emergent. One is the shift from text-based to multimodal models has created … this kind of boost in the amount of data that customers have to deal with. And then the second is we’re working with some of the organizations that are building super intelligent or artificial general intelligence systems. And these are organizations that are now processing exabytes of infrastructure.”
Denworth spoke with theCUBE Research’s Dave Vellante, John Furrier and Rob Strechay at the Supercloud 7: Get Ready for the Next Data Platform event, during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed modern data management for AI and how Vast Data is changing data architecture.
Creating a unified data pipeline for AI
Vast Data’s goal has been to simplify data storage and computational infrastructure, giving users access to a variety of unstructured data types for their AI models, according to Denworth.
“We stitched together the world’s most scalable file system with the world’s first exabyte scale transactional data warehouse together in something that we call the Vast Data Platform as a solution to the totality of AI pipelines. That includes data preparation, data training and inference, data logging and data collection,” he said.
The past few years have given rise to AI clouds such as CoreWeave and Lambda, but that is not where data originates. Vast Data is focused on creating a unified data pipeline that processes data from multiple clouds and data centers.
“We’ve built this ability to federate multiple clouds with something that we call the Vast Data space,” Denworth said. “And essentially it allows you to kind of flow all your files and all of your records that sit in tables across a series of independent cloud platforms so that you can have one unified pipeline.”
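The federation idea can be pictured with a small sketch. The snippet below is illustrative only: it uses local directories as stand-ins for independent cloud locations, and the class and function names are invented for the example rather than taken from Vast’s DataSpace API. It simply presents files held in several places as one logical namespace that a pipeline can list and read.

```python
# Illustrative sketch only: a toy "federated namespace" that presents files
# spread across several independent storage locations as one logical tree.
# Local directories stand in for cloud regions; names are assumptions,
# not Vast's actual API.
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator


@dataclass
class Location:
    """One participating cloud or data center, modeled here as a local directory."""
    name: str
    root: Path


class FederatedNamespace:
    """Resolves a single logical path against every registered location."""

    def __init__(self, locations: list[Location]) -> None:
        self.locations = locations

    def open(self, logical_path: str) -> bytes:
        # Try each location in order; the first one holding the object wins.
        for loc in self.locations:
            candidate = loc.root / logical_path
            if candidate.exists():
                return candidate.read_bytes()
        raise FileNotFoundError(logical_path)

    def list(self, prefix: str = "") -> Iterator[str]:
        # Union the listings so the pipeline sees a single namespace.
        for loc in self.locations:
            base = loc.root / prefix
            if base.exists():
                for p in base.rglob("*"):
                    if p.is_file():
                        yield str(p.relative_to(loc.root))


if __name__ == "__main__":
    ns = FederatedNamespace([
        Location("us-east", Path("/data/us-east")),
        Location("eu-west", Path("/data/eu-west")),
    ])
    for path in ns.list("training/"):
        print(path)
```

The point of the sketch is the interface, not the mechanics: a pipeline written against one namespace does not need to know which cloud holds which files or table records.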
Solving problems of scale with the Vast Data Platform
Data architecture has always presented unique challenges. The Google File System addressed scale in 2003 by building very large clusters out of commodity nodes, but scaling those clusters up has proved difficult. In recent years, Vast Data has worked to solve the problems of scale and east-west traffic, according to Denworth.
“We basically split the cluster apart and now you have a federation or a sea of stateless cores that run the software of the system in just containers,” he said. “Then all of the cores over this network can see one single volume of SSDs because we’ve built this essentially transactional data structure that allows for all of the cores to write and read into this volume without having to coordinate with each other. What comes of this is an ability to basically eliminate east-west traffic in hyperscale clusters.”
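A toy example helps make that claim concrete. The sketch below, with assumed paths and helper names rather than anything from Vast’s actual implementation, shows why stateless workers writing through shared storage don’t need to message one another: each worker writes to a uniquely named extent and commits it with an atomic rename, so the shared volume, not peer-to-peer traffic, serializes the result.

```python
# Illustrative sketch only: why stateless workers over a shared volume can skip
# peer-to-peer (east-west) coordination. Each worker writes a uniquely named
# extent and "commits" it with an atomic rename that the shared storage
# arbitrates. Paths and names are assumptions, not Vast's on-disk format.
import os
import uuid
from pathlib import Path

SHARED_VOLUME = Path("/mnt/shared")       # stand-in for the shared SSD enclosure
EXTENTS = SHARED_VOLUME / "extents"
COMMITTED = SHARED_VOLUME / "committed"


def write_record(worker_id: str, payload: bytes) -> Path:
    """Write from any stateless worker without contacting other workers."""
    EXTENTS.mkdir(parents=True, exist_ok=True)
    COMMITTED.mkdir(parents=True, exist_ok=True)

    # 1. Write the data under a globally unique name -- no collisions, no locks.
    extent = EXTENTS / f"{worker_id}-{uuid.uuid4().hex}.bin"
    extent.write_bytes(payload)

    # 2. Make it visible with a single atomic rename. The storage layer, not
    #    the workers, serializes these commits, which is what removes the need
    #    for east-west coordination between workers.
    visible = COMMITTED / extent.name
    os.replace(extent, visible)
    return visible


if __name__ == "__main__":
    print(write_record("core-7", b"example record"))
```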
The Vast Data Platform allows users to stream in data natively at any level of scale. The platform has closed the gap between data availability and observability in data lake and lakehouse infrastructure, Denworth believes.
“By solving for the IO bottlenecks of conventional data science pipelines, what we’re showing the world is you can get anywhere between a two to a 20x improvement in pipeline performance,” he said. “My general perspective is that most of the data science industry is not interested in solving this problem. ’cause everybody sells by the core. There’s a conflict of interest with respect to optimization that we just threw right out the window.”
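To picture the kind of I/O-side optimization being described, the hedged sketch below uses pyarrow to stream only the columns and rows a training step needs, rather than staging full copies of the dataset first; the path and column names are assumptions made up for the example.

```python
# Illustrative sketch only: stream just the needed columns and rows from
# shared storage instead of copying whole files before training. The dataset
# path and column names are assumed for the example.
import pyarrow.dataset as ds

dataset = ds.dataset("/mnt/shared/events", format="parquet")

# Push projection and filtering down to the scan, so only the bytes the model
# actually consumes cross the wire.
batches = dataset.to_batches(
    columns=["user_id", "embedding"],
    filter=ds.field("split") == "train",
)

for batch in batches:
    pass  # feed each Arrow record batch to the training loop
```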
Transitioning to Vast Data is usually nondisruptive, according to Denworth, with customers rarely having to rethink their pipelines. Instead, the company is making users’ infrastructures simpler and more cost-effective. As Vast looks toward the future, there is also the potential to involve agentic AI.
“What we see is just a massive opportunity to go and index the world’s data,” he said. “If you think about a system with deep roots in unstructured data as Vast has, taking that data and building it into a corpus that can be basically interfaced with by large language models is absolutely something that we’re super excited about.”
Here’s the complete video interview, part of SiliconANGLE’s and theCUBE Research’s coverage of the Supercloud 7: Get Ready for the Next Data Platform event:
Photo: SiliconANGLE