UPDATED 17:01 EST / MARCH 12 2019

Google debuts miniaturized, real-time speech recognition AI on Pixel phones

Google LLC has developed a neural network compact and efficient enough to perform speech recognition, normally a hardware-intensive task, directly on mobile devices.

The technology debuted today on the company’s Pixel smartphones. Google has rolled it out to its Gboard virtual keyboard app as part of an update that will make the built-in voice dictation feature usable when a device doesn’t have internet access.

Previously, the feature required a steady connection because the app offloaded much of the computational heavy lifting to the cloud. That is still a requirement for other services that use artificial intelligence to process speech, because turning speech into text normally requires several separate software components that together are too demanding to run on a handset.

In a blog post, Google researcher Johan Schalkwyk said previous iterations of Gboard used no fewer than three separate AI models. The first was responsible for organizing raw audio into phonemes, the smallest units of spoken language, while the second stitched those phonemes together into words. The words were then fed to a third model that produced complete phrases.
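To make the three-stage hand-off concrete, here is a toy sketch in Python. It is not Google's code: the lookup tables, frame labels and function names are all hypothetical stand-ins for what would in reality be trained models, and the only thing it aims to show is how each stage consumes the previous stage's output.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for stage 1: audio frame labels mapped to phonemes.
FRAME_TO_PHONEME: Dict[str, str] = {
    "f1": "HH", "f2": "AY", "f3": "DH", "f4": "EH", "f5": "R",
}

# Hypothetical stand-in for stage 2: phoneme sequences mapped to words.
PHONEMES_TO_WORD: Dict[Tuple[str, ...], str] = {
    ("HH", "AY"): "hi",
    ("DH", "EH", "R"): "there",
}


def acoustic_stage(frames: List[str]) -> List[str]:
    """Stage 1: organize raw audio (here, frame labels) into phonemes."""
    return [FRAME_TO_PHONEME[frame] for frame in frames]


def pronunciation_stage(phonemes: List[str]) -> List[str]:
    """Stage 2: stitch phonemes into words using a simple greedy match."""
    words: List[str] = []
    buffer: List[str] = []
    for phoneme in phonemes:
        buffer.append(phoneme)
        word = PHONEMES_TO_WORD.get(tuple(buffer))
        if word is not None:
            words.append(word)
            buffer = []
    return words


def language_stage(words: List[str]) -> str:
    """Stage 3: turn a word list into a finished phrase."""
    return " ".join(words).capitalize()


def recognize(frames: List[str]) -> str:
    """Run the three stages back to back."""
    return language_stage(pronunciation_stage(acoustic_stage(frames)))


result = recognize(["f1", "f2", "f3", "f4", "f5"])
```

In this toy version each stage is a separate component with its own data, which hints at why running all three on a phone was impractical.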

Google has managed to consolidate these three models into a single neural network that handles the entire process from start to finish. Moreover, the AI processes voice in real time as the user speaks.

“The model works at the character level, so that as you speak, it outputs words character-by-character, just as if someone was typing out what you say in real-time, and exactly as you’d expect from a keyboard dictation system,” Google’s Schalkwyk wrote.
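The character-by-character behavior Schalkwyk describes can be illustrated with a short Python sketch. This is only a simulation under stated assumptions: the characters come from a hard-coded string rather than a neural network, and `stream_transcript` is a hypothetical helper name, not part of Gboard or any Google API.

```python
from typing import Iterable, Iterator


def stream_transcript(chars: Iterable[str]) -> Iterator[str]:
    """Yield the growing partial transcript after each emitted character."""
    text = ""
    for ch in chars:
        text += ch
        yield text


# Simulated model output: in a real recognizer these characters would
# arrive one at a time while the user is still speaking.
partials = list(stream_transcript("hi there"))
```

Each entry in `partials` is what a keyboard could display at that moment, so the text appears to be typed out as the speaker talks instead of arriving all at once when they stop.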

In addition to streamlining the speech recognition workflow, the search giant has also shrunk Gboard’s decoder graph, a key component responsible for coordinating the entire process. Google reduced its size by a factor of 25, from 2 gigabytes in previous iterations of the app to just 80 megabytes.
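The factor of 25 follows directly from the two sizes quoted above, as this quick Python check shows. It assumes decimal units (1 gigabyte = 1,000 megabytes); the article does not say which convention Google used.

```python
# Sizes quoted in the article, in megabytes.
# Assumption: decimal units, 1 GB = 1,000 MB.
before_mb = 2 * 1000
after_mb = 80

reduction_factor = before_mb / after_mb

# Same calculation with binary units (1 GB = 1,024 MB), for comparison.
reduction_factor_binary = (2 * 1024) / after_mb
```

With decimal units the ratio is exactly 25; with binary units it is 25.6, which still rounds to the figure given.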

The company believes the technology could eventually be extended beyond Gboard to other applications and use cases. Schalkwyk wrote that “given the trends in the industry, with the convergence of specialized hardware and algorithmic improvements, we are hopeful that the techniques presented here can soon be adopted in more languages and across broader domains of application.”

Photo: Tinh tế Photo/Flickr
