How Amazon is reinventing storage and data access for generative AI
The transition from data lakes to generative artificial intelligence tools is pushing organizations to explore new opportunities for their data.
Organizations want rapid interactions with data, so Amazon S3 is being reinvented, according to Andy Warfield (pictured), vice president and distinguished engineer at Amazon.com Inc. The goal is to provide speed, cost-effectiveness and simplicity, while also thinking about the decoupling of storage from compute and the evolving storage equation, he explained.
“The thing that I think is really interesting in here is the customer experience of curating that data,” Warfield said. “They don’t want to think about storage. They absolutely want to have good, sound practices around the structure of their data and the governance. So, as customers are looking at generative AI, they don’t want to be taking their data out of their data lake and shipping it to some external model. They really want to be bringing the model to the data.”
Warfield spoke with theCUBE industry analyst John Furrier at the “Supercloud 5: The Battle for AI Supremacy” event, during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed Amazon’s focus on reinventing storage and data access, integrating with open-source tools and new technologies to provide seamless data curation.
Vector databases
Vector databases are a hot trend, but the challenge lies in interoperability and architectural lock-in, so the approach focuses on choice and the ability to put vectors where customers want them, according to Warfield. Some organizations have heavily embraced the transformation of their data practices. They made a significant investment and are now experiencing the benefits.
“The story that I loved the most was Pfizer. These examples of customers that leaned in heavily on changing their data practice,” Warfield said. “To invest in a data lake and then grow it … [they] are now realizing this agility to go and experiment with stuff like generative AI, to experiment with new things. I think that really speaks for itself in terms of what people are doing.”
Meanwhile, engineers are increasingly reengineering their environments using open-source tools, such as Apache Airflow, focusing on data engineering rather than data science or database administration, according to Warfield. With this in mind, Amazon S3 is being set up to work in conjunction with these tools, with a focus on understanding how data structure affects workloads and on investing in open-source and client-side tooling for data engineering and applications.
Organizations are also struggling with GPU scarcity and are focused on cost and performance, while chip and model developers are working together to learn and innovate, according to Warfield.
“I think one thing that we are certainly seeing is … customers really want to keep those GPUs busy all the time,” he said. “That’s an example of the full systems view, whether it’s getting data onto the box, or getting data into GPU memory. There are innovation opportunities across that whole space.”
Here’s the complete video interview, part of SiliconANGLE’s and theCUBE’s coverage of the “Supercloud 5: The Battle for AI Supremacy” event:
Photo: SiliconANGLE