Google debuts first Gemini generative AI model, with advanced reasoning and native multiple modes
Google LLC today is taking the fight to OpenAI and ChatGPT with the long-delayed launch of what it says is its most capable, powerful and sophisticated generative artificial intelligence model yet.
Called Gemini, it will sit at the heart of a number of Google’s services and products, including its generative AI chatbot Bard and the new Pixel 8 Pro smartphone. In a media briefing yesterday, Google explained that Gemini is a natively multimodal large language model, meaning it can understand and generate various kinds of information, including text, code, audio, images and videos. Google says it’s the most flexible model the company has ever made, capable of running everywhere from cloud and on-premises data centers to low-powered mobile devices.
The first iteration of Gemini, version 1.0, will be made available in phases — which might explain recent stories saying the unveiling was being postponed until next month. Google has been scrambling to catch up with the momentum of ChatGPT and associated OpenAI models since the chatbot was introduced one year ago.
Indeed, though Gemini Ultra, the largest and most capable version designed for highly complex tasks, will be available for developers and enterprises in an early access program, Google said it won’t roll out more broadly until “early 2024.” Gemini Pro, a lighter version designed to scale across a broader range of tasks, will be available to developers and enterprise customers in Google’s Vertex AI and AI Studio Dec. 13. And Android developers can sign up today for access to Gemini Nano, a more lightweight and efficient version for on-device tasks.
The first place most people can use Gemini is Google’s Bard chatbot, which is powered by a version of Gemini Pro. Sissie Hsiao, general manager of Google Assistant and Bard, said during a press briefing that this fine-tuned version of Gemini Pro enables Bard to provide better reasoning, planning and understanding than before.
“With this, Google is trying to become a one stop shop for large enterprises to train their massive LLMs and run on Google Cloud,” Constellation Research analyst Andy Thurai told SiliconANGLE. “There is a lot of money involved in that.”
Advanced reasoning and native multimodality
Some of the biggest claims from Google center on Gemini’s performance. Executives said that tests show it exceeds current state-of-the-art models on 30 of the 32 most widely-used academic benchmarks for LLMs.
For instance, Gemini Ultra, the most powerful version, is the first LLM in the world to outperform humans in massive multitask language understanding tests based on a combination of 57 subjects, such as math, physics, history, law, ethics and medicine, designed to test general knowledge and problem-solving abilities. Google said the secret is Gemini’s advanced reasoning capabilities, which allow it to “think more carefully” before it answers complex questions.
Elsewhere, Google said Gemini Ultra attained a 59.4% score on the latest MMMU benchmark, which encompasses various multimodal tasks that span multiple domains and require “deliberate reasoning,” the company said.
Gemini also outperformed other models on tasks such as image recognition, and Google said the results highlight the advantages of its native multimodality. Although other LLMs claim to be multimodal, Google said this is only achieved by training separate components on different modalities before stitching them all together in one model. Gemini, on the other hand, is “natively multimodal,” according to Google, having been pretrained from the get-go on various modalities and fine-tuned with additional data to refine its effectiveness.
The company claims that makes Gemini better at understanding and reasoning for every kind of input. For instance, it can make sense of complex written and visual information, officials said, uncovering insights that other models would struggle to discern from mountains of data. However, to be clear, for now Gemini provides results of queries only in text or code, not images or voice.
Discussing its capabilities, officials said the first iteration of Gemini is especially good at coding, possessing the ability to understand, explain and also generate code in programming languages such as Python, Java, C++ and Go. They said Gemini Ultra excelled in a number of coding benchmarks, including HumanEval and Natural2Code.
Indeed, it’s so good at coding that Google believes it can function as the engine of advanced coding systems such as AlphaCode 2, which is a new system that’s designed to solve competitive programming problems that go beyond coding, involving complex math and theoretical computer science, Google said. With a specialized version of Gemini, Google said, it was able to enhance the capabilities of AlphaCode 2 and demonstrate a significant performance leap, solving twice as many problems in the same timeframe as the original version.
Other claims about Gemini involve its efficiency and ability to scale up. For instance, Google said Gemini runs much faster than earlier, smaller and less-capable LLMs when powered by its in-house Tensor Processing Units, while providing significant improvements in energy efficiency.
Google also today announced its latest TPU, v5p, which Amin Vahdat, Google Cloud’s vice president and general manager of machine learning, systems and cloud AI, said is is “most powerful, scalable and flexible AI accelerator thus far.” It’s made up of 8,960 chips alongn with what Google says is its fastest interconnect to date. Gemini was trained on the chips.
In addition, Google Cloud announced an AI Hypercomputer, which is calls a “groundbreaking supercomputer architecture that employs an integrated system of performance-optimized hardware, open software, leading ML frameworks and flexible consumption models.”
Gemini is also said to have the most comprehensive safety evaluations of any Google-made LLM to date, with robust policies and guardrails in place to prevent bias, toxicity and “hallucinations,” which is when models simply make up false answers.
Google said it has spent many hours researching the potential risks of advanced AI systems, identifying concerns such as cyber-offense, persuasion and autonomy. Armed with this research, it set about building protections in advance of Gemini’s deployment. These include “dedicated safety classifiers” that are meant to identify, label and filter out content that includes violence or negative stereotypes.
Google said Gemini 1.0 Pro is available today through its chatbot service Bard, which is using a version that’s fine-tuned for more advanced reasoning, planning and understanding. It’s said to be the “biggest upgrade to Bard,” which was previously powered by an alternative LLM known as PaLM 2, since Bard was launched earlier this year.
It’s initially being made available only in English in more than 170 countries and territories. Google does plan for Gemini to support multiple languages besides English, in addition to new modalities, but only said these will be available in the “near future.”
Besides Bard, the lightweight Gemini Nano version will be pre-installed on the new Pixel 8 Pro smartphone, where it will power features such as Summarize in the Recorder application and Smart Reply in Gboard, beginning with WhatsApp.
In the coming months, Google plans to integrate Gemini with dozens of additional products and services, including Search, Chrome, Ads and Duet AI. An experimental version of Gemini-powered search is already available via Google’s Search Generative Experience, and the company claims it has achieved a 40% reduction in latency, while delivering higher-quality responses.
For developers, the most anticipated development will be the launch of the Gemini application programming interface in Google AI Studio and Google Cloud Vertex AI. It’s set to roll out next week, and will enable developers to integrate Gemini into various third-party applications. Google AI Studio is a free web-based developer tool for prototyping and launching applications quickly using an API key, while Vertex AI is a more comprehensive AI app development platform that will allow for Gemini to be fully customized and fine-tuned with proprietary data.
In addition, Android developers will also be able to access Gemini Nano via a new system capability with Android 14 called AICore.
As for the most powerful version, Gemini Ultra, this is said to be undergoing additional trust and safety checks before it will be released. Further refinements will also be made before it’s rolled out to select customers, developers and partners for experimentation early next year.
Also next year, Google plans to launch Bard Advanced, which it says is a “new, cutting-edge AI experience” that provides access to its most advanced LLMs, beginning with Gemini Ultra.
A significant milestone for AI?
For all of the claims Google is making, there are reasons to be skeptical about how well Gemini will perform in the real world. After all, Google is clearly playing catch-up with OpenAI and by extension it’s biggest funder and partner, Microsoft Corp., and Gemini itself appears to be somewhat late, with many of its parts not expected to become available until next year.
“Gemini was delayed due to concerns about the readiness, safety concerns and especially the problems with non-English queries,” noted Constellation’s Thurai. “Given that this will be embedded into all their product lines, across the world, they didn’t want to have another fiasco of Bard. This is another reason why the live version got replaced with ‘virtual demos’ while they continue to improve. But you can’t wait and fall far behind mindshare with massive announcements from OpenAI/Microsoft and AWS on a constant basis.”
In addition, while Google executives boasted of Gemini’s ability to outperform OpenAI’s GPT-3.5, they were less forthcoming when asked about how it compares with GPT-4. And although Gemini will be integrated into an impressive number of Google’s products and services, it’s worth remembering that Microsoft has already integrated GPT-4 into many of its own offerings, such as Search, Windows, Office and various developer tools.
Still, Google insisted that the launch of Gemini is a “significant milestone” in the development of AI, representing the beginning of a “new era” for application development. Officials said they’re working around the clock, which doesn’t seem like much of an exaggeration, to improve Gemini and expand its capabilities. It added that future iterations will bring improvements in planning and memory, while enhancing its ability to process more information.
With reporting from Robert Hof
A message from John Furrier, co-founder of SiliconANGLE:
Your vote of support is important to us and it helps us keep the content FREE.
One click below supports our mission to provide free, deep, and relevant content.
Join the community that includes more than 15,000 #CubeAlumni experts, including Amazon.com CEO Andy Jassy, Dell Technologies founder and CEO Michael Dell, Intel CEO Pat Gelsinger, and many more luminaries and experts.