UPDATED 17:01 EDT / MARCH 12 2019

AI

Google debuts miniaturized, real-time speech recognition AI on Pixel phones

Google LLC has developed a miniaturized neural network that is small and efficient enough to perform speech recognition, a normally hardware-intensive task, directly on mobile devices.

The technology debuted today on the company’s Pixel smartphones. Google has rolled it out to its Gboard virtual keyboard app as part of an update that will make the built-in voice dictation feature usable when a device doesn’t have internet access.

Previously, the feature required a steady connection because the app offloaded much of the computational heavy lifting to the cloud. That is still a requirement for other services that use artificial intelligence to process speech, since turning spoken words into text normally requires several software components that are too complex to run on a handset.

In a blog post, Google researcher Johan Schalkwyk said previous iterations of Gboard used no fewer than three separate AI models. The first was responsible for organizing raw audio into phonemes, the smallest units of spoken language, while the second stitched those phonemes together into words. The words were then fed to a third model that output complete phrases.

Google has managed to consolidate these three models into a single neural network that handles the entire process from start to finish. Moreover, the AI processes voice in real time as the user speaks.

“The model works at the character level, so that as you speak, it outputs words character-by-character, just as if someone was typing out what you say in real-time, and exactly as you’d expect from a keyboard dictation system,” Google’s Schalkwyk wrote.
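To make the character-by-character behavior concrete, here is a minimal toy sketch in Python. It is not Google's model or API: the function name `toy_streaming_decoder`, the use of "_" to mean "no character for this frame," and the idea that each incoming frame is already a predicted character are all assumptions made purely for illustration. A real system would run a neural network over audio features at that step.

```python
from typing import Iterable, Iterator

BLANK = "_"  # assumed marker meaning "emit nothing for this frame"


def toy_streaming_decoder(frames: Iterable[str]) -> Iterator[str]:
    """Yield the partial transcript each time a new character is emitted.

    Each frame is a stand-in for one step of model output: either a
    single character or BLANK. The transcript grows as frames arrive,
    so a caller can display text while the user is still speaking.
    """
    transcript = ""
    for frame in frames:
        if frame == BLANK:
            continue  # nothing recognized in this frame
        transcript += frame
        yield transcript


# Hypothetical stream of per-frame predictions for the word "hi".
partials = list(toy_streaming_decoder(["h", BLANK, BLANK, "i"]))
for text in partials:
    print(text)
```

The point of the generator is that text becomes available after every frame rather than only once the whole utterance has been processed, which is the property Schalkwyk describes.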

In addition to streamlining the speech recognition workflow, the search giant has shrunk Gboard’s decoder graph, a key component responsible for coordinating the entire process. Google reduced its size by a factor of 25, from 2 gigabytes in previous iterations of the app to just 80 megabytes.
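The factor of 25 can be checked with quick arithmetic. The short Python sketch below assumes decimal units (1 gigabyte = 1,000 megabytes), which is the convention under which the reported figures divide evenly. The variable names are illustrative only.

```python
MB_PER_GB = 1000  # assumption: decimal units, 1 GB = 1,000 MB

before_mb = 2 * MB_PER_GB  # 2 gigabytes, as reported for earlier versions
after_mb = 80              # 80 megabytes, as reported for the new version

reduction_factor = before_mb / after_mb
print(f"Reduction factor: {reduction_factor:g}x")
```

With binary units (1 GB = 1,024 MB) the same division gives 25.6, so the reported figure appears to be rounded or based on decimal units.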

The company believes the technology could eventually be extended beyond Gboard to other applications and use cases. Schalkwyk wrote that “given the trends in the industry, with the convergence of specialized hardware and algorithmic improvements, we are hopeful that the techniques presented here can soon be adopted in more languages and across broader domains of application.”

Photo: Tinh tế Photo/Flickr
