UPDATED 14:09 EDT / DECEMBER 08 2023


Google draws criticism for demo video of its new Gemini large language model

by Maria Deutscher

Google LLC has drawn criticism for a demo video of Gemini, the advanced large language model it debuted on Wednesday.

The search giant is positioning Gemini as an alternative to OpenAI’s ChatGPT. The new model has three editions, called Nano, Pro and Ultra, that vary in sophistication. Google began rolling out the Pro version of Gemini to its Bard chatbot this week and plans to make it available to developers through an application programming interface in the coming days.

On Wednesday, the day Gemini was announced, the company released a video demonstrating its capabilities. The video appears to show Gemini watching footage of a Google staffer performing various simple activities and generating natural-language descriptions of those activities. The staffer also asks Gemini to perform a series of tasks, such as inventing games, that it appears to complete with a high degree of creativity.

The video’s description links to a blog post explaining that Gemini did perform the depicted tasks, but with significant human assistance. The description also contains a brief disclaimer stating that “latency has been reduced and Gemini outputs have been shortened for brevity.” However, a viewer who watches the video without reading its description can’t necessarily glean that information from the demo footage alone.

One part of the video appears to show Gemini correctly determining that a Google staffer is playing “Rock paper scissors.” In the accompanying blog post, the search giant explains that the AI made the deduction after receiving a prompt and a series of images as input. The prompt contained the clue “Hint: it’s a game.”

In another demonstration, the Google staffer asked Gemini to come up with a game idea based on footage of a rubber duck and a map. Gemini did invent a game, the company’s blog post revealed, but only after it received detailed instructions on how to do so. The AI model was also given a gameplay example.

“All the user prompts and outputs in the video are real, shortened for brevity,” Oriol Vinyals, the principal research scientist at Google DeepMind, wrote in a post on X. “The video illustrates what the multimodal user experiences built with Gemini could look like. We made it to inspire developers.”

The criticism of the demo video follows mixed user reactions to Gemini Pro, the second-most-advanced version of the model. Gemini Pro started rolling out to Google’s Bard chatbot earlier this week.

One social media user asked the new version of Bard for a six-letter French word and received a five-letter word in response. Another user reported that Gemini Pro failed to correctly answer a set of trivia questions about this year’s Oscars. Additionally, the model’s code generation feature appears to struggle with some simple programming tasks.

Google plans to release a more advanced version of the model, Gemini Ultra, early next year. It will be capable of processing not only text but also images, videos and audio. Google expects that Gemini Ultra will be better than the model’s other versions at complex tasks such as solving mathematical problems.

Gemini Ultra also outperforms GPT-4 across several benchmark tests used by researchers to evaluate language models’ capabilities. However, it bests OpenAI’s flagship model by only a few percentage points. That means OpenAI’s next flagship model, which is already under development, would have to feature only slight improvements over GPT-4 to outperform Gemini Ultra.

Image: Google
