Unified AI data pipelines: Vast Data tackles scale and traffic challenges
The age of artificial intelligence is spurring a revolution in data management, and Vast Data Inc. has positioned itself at the center with its Vast Data Platform, which stores unstructured data for use in training AI models.
The company recently partnered with Nvidia Corp. to create huge cloud architectures that support AI data engines. Supporting large amounts of data is more important than ever before, according to Jeff Denworth (pictured), co-founder of Vast Data.
“Vast is kind of the embodiment of a transformation in the market where people are now working on very intense AI workloads,” he said. “There are two new waves that are emergent. One is the shift from text-based to multimodal models has created … this kind of boost in the amount of data that customers have to deal with. And then the second is we’re working with some of the organizations that are building super intelligent or artificial general intelligence systems. And these are organizations that are now processing exabytes of infrastructure.”
Denworth spoke with theCUBE Research’s Dave Vellante, John Furrier and Rob Strechay at the Supercloud 7: Get Ready for the Next Data Platform event, during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed modern data management for AI and how Vast Data is changing data architecture.
Creating a unified data pipeline for AI
Vast Data’s goal has been to simplify data storage and computational infrastructure, giving users access to a variety of unstructured data types for their AI models, according to Denworth.
“We stitched together the world’s most scalable file system with the world’s first exabyte scale transactional data warehouse together in something that we call the Vast Data Platform as a solution to the totality of AI pipelines. That includes data preparation, data training and inference, data logging and data collection,” he said.
The past few years have given rise to AI clouds such as CoreWeave and Lambda, but that is not where data originates. Vast Data is focused on creating a unified data pipeline that processes data from multiple clouds and data centers.
“We’ve built this ability to federate multiple clouds with something that we call the Vast Data space,” Denworth said. “And essentially it allows you to kind of flow all your files and all of your records that sit in tables across a series of independent cloud platforms so that you can have one unified pipeline.”
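The federation idea can be pictured with a small sketch. The snippet below is illustrative only: it uses local directories as stand-ins for independent cloud locations, and the class and function names are invented for the example rather than taken from Vast’s DataSpace API. It simply presents files held in several places as one logical namespace that a pipeline can list and read.

```python
# Illustrative sketch only: a toy "federated namespace" that presents files
# spread across several independent storage locations as one logical tree.
# Local directories stand in for cloud regions; names are assumptions,
# not Vast's actual API.
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator


@dataclass
class Location:
    """One participating cloud or data center, modeled here as a local directory."""
    name: str
    root: Path


class FederatedNamespace:
    """Resolves a single logical path against every registered location."""

    def __init__(self, locations: list[Location]) -> None:
        self.locations = locations

    def open(self, logical_path: str) -> bytes:
        # Try each location in order; the first one holding the object wins.
        for loc in self.locations:
            candidate = loc.root / logical_path
            if candidate.exists():
                return candidate.read_bytes()
        raise FileNotFoundError(logical_path)

    def list(self, prefix: str = "") -> Iterator[str]:
        # Union the listings so the pipeline sees a single namespace.
        for loc in self.locations:
            base = loc.root / prefix
            if base.exists():
                for p in base.rglob("*"):
                    if p.is_file():
                        yield str(p.relative_to(loc.root))


if __name__ == "__main__":
    ns = FederatedNamespace([
        Location("us-east", Path("/data/us-east")),
        Location("eu-west", Path("/data/eu-west")),
    ])
    for path in ns.list("training/"):
        print(path)
```

The point of the sketch is the interface, not the mechanics: a pipeline written against one namespace does not need to know which cloud holds which files or table records.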
Solving problems of scale with the Vast Data Platform
Data architecture has always presented unique challenges. The Google File System addressed scale in 2003 by building very large clusters out of commodity nodes, but scaling those clusters up has proved difficult. In recent years, Vast Data has worked to solve the problems of scale and east-west traffic, according to Denworth.
“We basically split the cluster apart and now you have a federation or a sea of stateless cores that run the software of the system in just containers,” he said. “Then all of the cores over this network can see one single volume of SSDs because we’ve built this essentially transactional data structure that allows for all of the cores to write and read into this volume without having to coordinate with each other. What comes of this is an ability to basically eliminate east-west traffic in hyperscale clusters.”
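A toy example helps make that claim concrete. The sketch below, with assumed paths and helper names rather than anything from Vast’s actual implementation, shows why stateless workers writing through shared storage don’t need to message one another: each worker writes to a uniquely named extent and commits it with an atomic rename, so the shared volume, not peer-to-peer traffic, serializes the result.

```python
# Illustrative sketch only: why stateless workers over a shared volume can skip
# peer-to-peer (east-west) coordination. Each worker writes a uniquely named
# extent and "commits" it with an atomic rename that the shared storage
# arbitrates. Paths and names are assumptions, not Vast's on-disk format.
import os
import uuid
from pathlib import Path

SHARED_VOLUME = Path("/mnt/shared")       # stand-in for the shared SSD enclosure
EXTENTS = SHARED_VOLUME / "extents"
COMMITTED = SHARED_VOLUME / "committed"


def write_record(worker_id: str, payload: bytes) -> Path:
    """Write from any stateless worker without contacting other workers."""
    EXTENTS.mkdir(parents=True, exist_ok=True)
    COMMITTED.mkdir(parents=True, exist_ok=True)

    # 1. Write the data under a globally unique name -- no collisions, no locks.
    extent = EXTENTS / f"{worker_id}-{uuid.uuid4().hex}.bin"
    extent.write_bytes(payload)

    # 2. Make it visible with a single atomic rename. The storage layer, not
    #    the workers, serializes these commits, which is what removes the need
    #    for east-west coordination between workers.
    visible = COMMITTED / extent.name
    os.replace(extent, visible)
    return visible


if __name__ == "__main__":
    print(write_record("core-7", b"example record"))
```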
The Vast Data Platform allows users to stream in data natively at any level of scale. The platform has closed the gap between data availability and observability in data lake and lakehouse infrastructure, Denworth believes.
“By solving for the IO bottlenecks of conventional data science pipelines, what we’re showing the world is you can get anywhere between a two to a 20x improvement in pipeline performance,” he said. “My general perspective is that most of the data science industry is not interested in solving this problem. ’cause everybody sells by the core. There’s a conflict of interest with respect to optimization that we just threw right out the window.”
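To picture the kind of I/O-side optimization being described, the hedged sketch below uses pyarrow to stream only the columns and rows a training step needs, rather than staging full copies of the dataset first; the path and column names are assumptions made up for the example.

```python
# Illustrative sketch only: stream just the needed columns and rows from
# shared storage instead of copying whole files before training. The dataset
# path and column names are assumed for the example.
import pyarrow.dataset as ds

dataset = ds.dataset("/mnt/shared/events", format="parquet")

# Push projection and filtering down to the scan, so only the bytes the model
# actually consumes cross the wire.
batches = dataset.to_batches(
    columns=["user_id", "embedding"],
    filter=ds.field("split") == "train",
)

for batch in batches:
    pass  # feed each Arrow record batch to the training loop
```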
Transitioning to Vast Data is usually nondisruptive, according to Denworth, with customers rarely having to rethink their pipelines. Instead, the company is making users’ infrastructures simpler and more cost-effective. As Vast looks toward the future, there is also the potential to involve agentic AI.
“What we see is just a massive opportunity to go and index the world’s data,” he said. “If you think about a system with deep roots in unstructured data as Vast has, taking that data and building it into a corpus that can be basically interfaced with by large language models is absolutely something that we’re super excited about.”
Here’s the complete video interview, part of SiliconANGLE’s and theCUBE Research’s coverage of the Supercloud 7: Get Ready for the Next Data Platform event:
Photo: SiliconANGLE