The shift toward real-time artificial intelligence is accelerating as streaming AI capabilities reshape how interactions, decisions and workflows function in everyday systems.
Enterprises now expect systems that respond quickly, understand context and feel natural to use, raising the bar for accuracy and speed. That push fuels deeper conversations about latency, multimodality and the infrastructure needed to keep AI moving at a truly human pace, according to Scott Stephenson (pictured), chief executive officer of Deepgram Inc.
“Context is key,” Stephenson said. “There’s a lot of basic functionality and models now, like speech to text, text to speech, text to text, like a language model. In order to get those models to work at the highest level possible, you want to include context.”
Stephenson spoke with John Furrier at AWS re:Invent, during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed emerging real-time AI capabilities, multimodal context and how low-latency intelligence is reshaping modern applications.
Real-time AI has emerged as one of the most demanding frontiers for model performance, and one of the first places this pressure becomes unavoidable is voice. Fast conversational exchanges require systems that can handle streaming input and output simultaneously, anchoring new expectations for responsiveness and quality. This shift reinforces the broader industry trend toward multimodal, context-rich architectures, Stephenson explained.
“One of the big announcements for us is bidirectional streaming in SageMaker,” he said. “This is big because right now most AI … are actually more like an interactive or batch mode workload. And there’s more for LLMs where you load all of the context in it once and then the output streams. For voice, though, you can’t wait to load all the context in.”
The momentum behind low-latency interaction is pushing developers to rethink how they architect applications. Instead of waiting for full context to load, models must adapt to streaming environments that mimic natural conversation. This forces innovation not only in model behavior, but also in the underlying infrastructure supporting it, especially in environments where milliseconds define user experience.
“That means that you need to stream in and you need to stream out,” Stephenson added. “This is a big announcement for us that I think will be a primitive that’s used for a decade plus.”
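Stephenson's "stream in and stream out" primitive can be sketched as two concurrent tasks sharing a queue: audio chunks arrive while partial transcripts leave, instead of loading all context and then emitting one response. The sketch below is illustrative only; it is not the SageMaker or Deepgram API, and `bidirectional_session` and `transcribe` are hypothetical names.

```python
import asyncio


async def bidirectional_session(audio_chunks, transcribe):
    """Toy bidirectional streaming loop: audio flows in while partial
    transcripts flow out, rather than batch load-then-respond."""
    inbound: asyncio.Queue = asyncio.Queue()
    outbound: list = []

    async def sender():
        for chunk in audio_chunks:
            await inbound.put(chunk)  # audio streams in...
            await asyncio.sleep(0)    # yield so the receiver runs concurrently
        await inbound.put(None)       # end-of-stream marker

    async def receiver():
        while (chunk := await inbound.get()) is not None:
            outbound.append(transcribe(chunk))  # ...partials stream out

    await asyncio.gather(sender(), receiver())
    return outbound
```

In a real deployment `transcribe` would be a streaming model call; here any chunk-level function (e.g., `str.upper`) stands in, and the interleaving of sender and receiver is the point: output begins before input finishes.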
As more AI workloads shift toward real-time expectations, industries such as healthcare, customer service and enterprise collaboration are reworking their assumptions about speed, quality and reliability. Developers face rising pressure to deliver conversational fluidity that mirrors human responsiveness, making streaming intelligence a new baseline for user trust, Stephenson noted.
“I think real-time AI is just an overall much larger category and most AI now is not real time,” he said. “We’ll probably move to … [it] five years, 10 years from now.”
The growing role of real-time interaction also highlights why latency has become a defining factor for scaling AI. Task-based systems can afford a pause, but conversational workloads cannot — and the gap between those modes is reshaping expectations for model performance and architecture. That shift is driving deeper collaboration across AI companies and cloud providers to meet rising enterprise demand, Stephenson pointed out.
“The importance of that is that you reduce the latency,” he said. “Like in how we’re speaking right now, I can respond to you, you can respond to me. Everything is happening in real time. It’s streaming into my brain and it’s streaming out of my mouth. If I had to wait … it would be awkward and it’s slow.”
Here’s the complete video interview, part of SiliconANGLE’s and theCUBE’s coverage of AWS re:Invent: