Google draws criticism for demo video of its new Gemini large language model
The search giant is positioning Gemini as an alternative to OpenAI’s ChatGPT. The new model has three editions, called Nano, Pro and Ultra, that vary in sophistication. Google began rolling out the Pro version of Gemini to its Bard chatbot this week and plans to make it available to developers through an application programming interface in the coming days.
On Wednesday, the day Gemini was announced, the company released a video demonstrating its capabilities. The video appears to show Gemini watching footage of a Google staffer performing various simple activities and generating natural-language descriptions of those activities. The staffer also asks Gemini to perform a series of tasks, such as inventing games, that it appears to complete with a high degree of creativity.
The video’s description links to a blog post explaining that Gemini did perform the depicted tasks, but with a significant amount of human assistance. The description also contains a brief disclaimer stating that “latency has been reduced and Gemini outputs have been shortened for brevity.” However, a viewer who watches the video without reading its description has no way to glean that information from the demo footage alone.
One part of the video appears to show Gemini correctly determining that a Google staffer is playing “Rock paper scissors.” In the accompanying blog post, the search giant details that the AI made the deduction after receiving a prompt and a series of images as input. The prompt contained the clue “Hint: it’s a game.”
In another demonstration, the Google staffer asked Gemini to come up with a game idea based on footage of a rubber duck and a map. Gemini did invent a game, the company’s blog post revealed, but only after it received detailed instructions on how to do so. The AI model was also given a gameplay example.
“All the user prompts and outputs in the video are real, shortened for brevity,” Oriol Vinyals, the principal research scientist at Google DeepMind, wrote in a post on X. “The video illustrates what the multimodal user experiences built with Gemini could look like. We made it to inspire developers.”
The criticism of the demo video follows mixed user reactions to Gemini Pro, the second most advanced version of the model. Gemini Pro started rolling out to Google’s Bard chatbot earlier this week.
One social media user asked the new version of Bard for a six-letter French word and received a five-letter word in response. Another user reported that Gemini Pro failed to correctly answer a set of trivia questions about this year’s Oscars. Additionally, the model’s code generation feature appears to struggle with some simple programming tasks.
Google plans to release a more advanced version of the model, Gemini Ultra, early next year. It will be capable of processing not only text but also images, videos and audio. Google expects that Gemini Ultra will be better than the model’s other versions at complex tasks such as solving mathematical problems.
Gemini Ultra also outperforms GPT-4 across several benchmark tests used by researchers to evaluate language models’ capabilities. However, it bests OpenAI’s flagship model by only a few percentage points. That means OpenAI’s next flagship model, which is already under development, would have to feature only slight improvements over GPT-4 to outperform Gemini Ultra.