UPDATED 11:55 EDT / SEPTEMBER 25 2023

AI

OpenAI’s ChatGPT chatbot now lets users get answers with voice and pictures

People have long been able to hold text conversations with OpenAI LP’s artificial intelligence-powered chatbot, but the company said today it’s upgrading ChatGPT so that users can chat with it out loud using their own voices.

Users will also be able to snap photographs and hold back-and-forth conversations with the chatbot about what’s in the image in order to learn more about what they’re looking at.

The voice chat feature is designed to pick up what a person says to the AI chatbot and transcribe it into text the system can understand, using a model similar to OpenAI’s open-source Whisper speech recognition model. Replies are voiced by a new text-to-speech AI model that can generate humanlike audio from text and just a few seconds of sample speech.
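In rough API terms, the flow amounts to a transcribe-then-chat pipeline. The sketch below illustrates that loop using the openai Python library’s Whisper transcription and chat completion endpoints; the final text-to-speech step is left as a hypothetical placeholder, since the new voice model described here is available only inside the ChatGPT apps, and the placeholder function and voice name are assumptions for illustration.

# A minimal sketch of the voice pipeline, assuming the openai Python
# library (pre-1.0 interface) and an API key supplied by the reader.
import openai

openai.api_key = "YOUR_API_KEY"  # assumption: reader's own key

def voice_turn(audio_path: str) -> str:
    # 1. Transcribe the spoken question with the Whisper endpoint.
    with open(audio_path, "rb") as audio_file:
        transcript = openai.Audio.transcribe("whisper-1", audio_file)
    user_text = transcript["text"]

    # 2. Send the transcript to the chat model for a reply.
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": user_text}],
    )
    reply = response["choices"][0]["message"]["content"]

    # 3. Hypothetical: voice the reply with a text-to-speech system.
    #    The new voice model has no public endpoint, so this is a stand-in.
    # synthesize_speech(reply, voice="Juniper")
    return reply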

OpenAI said its developers collaborated with professional voice actors to create a number of different voices for users to choose from for the new experience. The company offers five voices with natural-sounding names: “Juniper,” “Ember,” “Sky,” “Cove” and “Breeze.” The voices span both genders and exhibit high clarity and natural intonation, so they work well for storytelling, reading the news or just general chatting.

OpenAI added that it’s also working with Spotify on the pilot of its new Voice Translation feature, which will use the new voice model to allow podcasters to translate their podcasts into other languages using their own voices.

The new voice feature will roll out to ChatGPT Plus and Enterprise users over the next two weeks on iOS and Android as an opt-in. To get started, users can enable it under New Features in the mobile app’s settings, then tap the headphone button to begin a voice conversation.

Chatting about images

With images, users will be able to get even more out of ChatGPT by being able to photograph a scene, an item or anything else and then ask the AI about what they’re looking at. They will then be able to converse with the chatbot about what it can see in order to solve a complex math problem, assemble a crib, learn about a landmark or get directions to a faraway place.

For example, a user could take a picture of what’s in the fridge and ask what they could make for dinner from the ingredients that are visible. They could walk down a store aisle, take pictures of items to get product information from ChatGPT and comparison shop. A user who can’t get a grill lit after it sat in the garage all winter could snap a photo of it, and ChatGPT could look up the manual and help get it working again.
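For developers, the same idea can be approximated by sending a photo alongside a question to a vision-capable chat model. The sketch below is an assumption-laden illustration rather than part of today’s announcement: the model name and the image message format are assumptions, since the feature described here lives inside the ChatGPT apps rather than the public API.

# A minimal sketch of asking a question about a photo, assuming a
# vision-capable chat model reachable through the chat completions
# interface. Model name and image message format are assumptions.
import base64
import openai

openai.api_key = "YOUR_API_KEY"  # assumption: reader's own key

def ask_about_image(image_path: str, question: str) -> str:
    # Encode the photo as a base64 data URL so it can ride in the request.
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    response = openai.ChatCompletion.create(
        model="gpt-4-vision-preview",  # assumption: a vision-capable model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return response["choices"][0]["message"]["content"]

# Example, mirroring the fridge scenario above:
# print(ask_about_image("fridge.jpg", "What could I make for dinner?"))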

This new capability is an upgrade over offerings already on the market, such as Google Lens, which provides a powerful image search that can identify what’s in a photograph. Google LLC’s AI lab Google DeepMind has also developed an AI model for the vision-impaired that powers Lookout on Android, describing photographs and allowing users to ask follow-up questions about the image.

OpenAI mentioned that its work with Be My Eyes, a free mobile app for the vision-impaired powered by GPT-4, informed the company’s approach to building the new image capabilities in ChatGPT.

With the ability to tie real-life images to internet searches and to talk to the chatbot out loud, users gain all-new capabilities, and it’s clear OpenAI is pushing the envelope of what ChatGPT can do.

The company also stressed that there are privacy implications when other people appear in the frame. For example, what happens if someone snaps a photograph of a person the AI has public information about but probably shouldn’t be revealing? OpenAI said it took steps to significantly limit the model’s ability to analyze people and that it won’t make direct statements about them, in order to respect their privacy, especially because ChatGPT is not always accurate.

Image: OpenAI
