Google LLC and Cohere Inc. today released new artificial intelligence models optimized for audio processing tasks.
The search giant’s model, Gemini 3.1 Flash Live, can automate customer service interactions. Cohere’s entry, Cohere Transcribe, is designed to transcribe speech. Both models provide significantly higher output quality than their predecessors.
Companies can use Gemini 3.1 Flash Live to build voice agents that field customer service calls. For example, a retailer could create an agent that automatically processes product return requests. Google says Gemini 3.1 Flash Live can detect when a user is frustrated or confused and adjust its responses accordingly.
The model understands not only speech but also other input such as images. That means a user with a malfunctioning smart home appliance could upload a photo of the device to help Gemini 3.1 Flash Live troubleshoot it. Furthermore, a tool use feature enables the model to retrieve data from external sources such as product documentation repositories.
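The tool-use pattern described above generally works as a loop: the model either answers directly or requests a tool call, the application executes the call, and the result is fed back into the conversation. The sketch below is illustrative only; the function names, the `call_model` stub, and the message format are assumptions for the example, not the Gemini API.

```python
import json

def lookup_docs(query: str) -> str:
    """Toy tool: stands in for a product-documentation lookup."""
    return f"Docs snippet about: {query}"

TOOLS = {"lookup_docs": lookup_docs}

def call_model(messages):
    """Stand-in for a model call. Here it always requests one tool call,
    then answers once a tool result appears in the conversation."""
    if any(m["role"] == "tool" for m in messages):
        return {"type": "answer", "text": messages[-1]["content"]}
    return {"type": "tool_call", "name": "lookup_docs",
            "arguments": json.dumps({"query": "reset procedure"})}

def run(user_text: str) -> str:
    messages = [{"role": "user", "content": user_text}]
    while True:
        reply = call_model(messages)
        if reply["type"] == "answer":
            return reply["text"]
        # Execute the requested tool and append its result for the model.
        args = json.loads(reply["arguments"])
        result = TOOLS[reply["name"]](**args)
        messages.append({"role": "tool", "content": result})

print(run("My appliance won't reset"))  # Docs snippet about: reset procedure
```

Real voice agents layer speech input and output on top of this loop, but the request-execute-feed-back cycle is the core of tool use.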
Google evaluated the AI’s tool use capabilities with a benchmark called ComplexFuncBench Audio. Gemini 3.1 Flash Live scored 90.8%, a nearly 20% improvement over the company’s previous-generation model. It set a record on a second audio benchmark called Audio MultiChallenge.
Automating customer support interactions isn’t the only use case that Gemini 3.1 Flash Live supports. Developers can use it to build a voice interface for their applications. Additionally, the model underpins the voice features of Google’s Gemini chatbot and Search Live multimodal search tool.
“With the 3.1 Flash Live model under the hood, Gemini Live delivers faster responses compared to the previous model and it can follow the thread of your conversation for twice as long, keeping your train of thought intact during longer brainstorms,” Google product manager Valeria Wu and software engineer Yifan Ding wrote in a blog post.
Cohere Transcribe has a narrower focus: It’s built solely for transcription tasks. The company says the algorithm is the most accurate in its category with an average word error rate of 5.42%. That earned it the top spot on an audio model ranking called the Hugging Face Open ASR Leaderboard.
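Word error rate, the metric behind that leaderboard ranking, is the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. A minimal sketch of the standard calculation (real evaluations also normalize punctuation and casing first):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word in a four-word reference -> 25% WER.
print(word_error_rate("the cat sat down", "the cat sat up"))  # 0.25
```

On this scale, Cohere's reported 5.42% means roughly one word in every 18 or 19 is inserted, deleted, or substituted relative to the reference.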
The new model begins the transcript generation process by translating raw audio into mathematical representations that are easier to process. That task is performed by a so-called Conformer algorithm. A Conformer combines a convolutional neural network, a type of AI that is often used for audio processing tasks, with a transformer model.
After turning audio into mathematical representations, Cohere Transcribe uses a standalone transformer to generate the transcript. Cohere says it can output text in more than a dozen languages. The model has a total of 2 billion parameters across its Conformer and transformer components, which means that it requires relatively little computing power to run.
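The two-stage flow can be sketched as an encoder-decoder pipeline: an encoder maps raw audio frames to embedding vectors, and a decoder maps those embeddings to text tokens. The toy functions below only illustrate the data flow; they are not Cohere's architecture, and the windowed-average "embedding" and placeholder tokens are invented for the example.

```python
from typing import List

def conformer_encode(audio_frames: List[float]) -> List[List[float]]:
    """Stand-in for the Conformer encoder: maps raw audio frames to
    embedding vectors (here just a toy windowed average)."""
    window = 4
    embeddings = []
    for i in range(0, len(audio_frames) - window + 1, window):
        chunk = audio_frames[i:i + window]
        embeddings.append([sum(chunk) / window])  # 1-dim "embedding"
    return embeddings

def transformer_decode(embeddings: List[List[float]]) -> str:
    """Stand-in for the transformer decoder: emits one placeholder
    token per embedding vector."""
    return " ".join(f"tok{i}" for i, _ in enumerate(embeddings))

# Two-stage pipeline: the encoder's output feeds the decoder.
audio = [0.1] * 16  # 16 raw audio samples
transcript = transformer_decode(conformer_encode(audio))
print(transcript)  # tok0 tok1 tok2 tok3
```

The separation matters for efficiency: the encoder compresses long audio sequences into far fewer vectors before the decoder, which is typically the more expensive component, ever runs.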
Cohere Transcribe is available under an open-source Apache 2.0 license. Companies can run it on their own infrastructure or Cohere’s Model Vault managed inference service. The company also plans to integrate the algorithm with its North productivity platform, which enables workers to search business documents and automate repetitive tasks.