UPDATED 10:00 EDT / FEBRUARY 15 2024


Google announces Gemini 1.5 for developers to experiment on

Google LLC today announced Gemini 1.5, the next generation of its artificial intelligence foundation model, which the company said changes almost every part of the model's development and infrastructure to make it more efficient to train and serve.

The first version released for early testing will be Gemini 1.5 Pro, a midsized multimodal model optimized for a wide range of tasks. Starting today, it will be available in private preview to a limited number of developers and enterprise customers through AI Studio and Vertex AI.

“The 1.5 Pro model is as capable as the 1.0 Ultra model,” Oriol Vinyals, vice president of research at Google DeepMind, said at a press briefing. He added that it supplies these capabilities while being more efficient in compute due to enhancements in architecture.

Gemini 1.0 Ultra, the largest and most capable enterprise-grade AI model released by Google to date, was introduced in December. Users gained access to Ultra for the first time earlier this month through Gemini Advanced, which is part of the Google One AI Premium Plan.

Although Gemini 1.5 Pro will have a standard context window of 128,000 tokens, users will be able to test it with up to 1 million tokens, Vinyals said. For comparison, Gemini 1.0 Pro has a context window of 32,000 tokens and Anthropic PBC’s Claude 2.1 has 200,000.

At 1 million tokens, the model can ingest up to an hour of video, 11 hours of audio, more than 30,000 lines of code or more than 700,000 words. Researchers have also tested the model with up to 10 million tokens.
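To make these figures concrete, the following Python sketch estimates whether a text of a given length would fit in the standard or the extended context window. It assumes the ratio implied above (roughly 700,000 words per 1 million tokens) holds for any input, which is only an approximation: real token counts depend on the tokenizer and the content. The function names are illustrative and not part of any Google API.

```python
# Context window sizes reported for Gemini 1.5 Pro.
STANDARD_WINDOW = 128_000    # standard context window, in tokens
EXTENDED_WINDOW = 1_000_000  # extended window available to early testers

# Assumed ratio taken from the figures above: about 700,000 words
# per 1,000,000 tokens, i.e. 10 tokens for every 7 words.
TOKENS_PER_7_WORDS = 10


def estimate_tokens(word_count: int) -> int:
    """Rough token estimate from a word count (a heuristic, not a tokenizer)."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-word_count * TOKENS_PER_7_WORDS // 7)


def smallest_window(word_count: int) -> str:
    """Name the smallest window that the estimated token count fits into."""
    tokens = estimate_tokens(word_count)
    if tokens <= STANDARD_WINDOW:
        return "standard"
    if tokens <= EXTENDED_WINDOW:
        return "extended"
    return "too large"


if __name__ == "__main__":
    for words in (50_000, 500_000, 800_000):
        print(words, estimate_tokens(words), smallest_window(words))
```

Under this heuristic, a text of about 89,600 words is the largest that would fit in the standard 128,000-token window.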

New architecture for greater efficiency and reduced compute

“We all know the larger the model, the more capable it is but that comes at a cost,” said Vinyals. “Training and inference become fairly expensive. So ‘mixture of experts’ has a large amount of parameters, but only a few of them activate based on the kind of queries that we send to the model. In one way it operates much like our brain does.”

While traditional AI transformer models work as one giant neural network, “mixture of experts” models are divided into smaller modules, or multiple “expert” neural networks. When the model receives an input, it selectively activates only the expert pathways most relevant to that input, using only the compute required to complete that particular task.
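The routing idea can be illustrated with a small Python sketch of generic top-k gating. Google has not published the routing details of Gemini 1.5, so this is not the company's method. The function names are hypothetical, the gate scores are passed in by hand rather than produced by a trained router, and the "experts" are toy functions standing in for neural networks.

```python
import math


def softmax(scores):
    """Turn raw gate scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    # Renormalize so the selected experts' weights sum to 1.
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, experts, gate_scores, k=2):
    """Weighted sum of the selected experts' outputs; other experts never run."""
    return sum(w * experts[i](x) for i, w in route_top_k(gate_scores, k))


if __name__ == "__main__":
    # Four toy experts; in a real model each would be a neural network.
    experts = [lambda x, m=m: x * m for m in (1, 2, 3, 4)]
    # Hypothetical gate scores favoring experts 1 and 3 for this input.
    print(moe_forward(3.0, experts, gate_scores=[0.0, 5.0, 1.0, 5.0], k=2))
```

With k=2 and four experts, only half the experts are evaluated for this input, which is where the compute saving comes from in this simplified picture.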

The long context window enables new kinds of tasks, which Google showed off during a press demonstration. For instance, the model can ingest a 44-minute silent Buster Keaton movie and analyze its plot points and events.

That allows a user to ask questions such as, “During one point of the movie a piece of paper was removed from a pocket. What was written on it?” After about a minute of processing the model was able to cite the timestamp of the scene, note that it was a pawn ticket and cite the date on the ticket.

In another demonstration, a user made a simple line drawing of a water silo drenching an unfortunate actor and asked the model for the timestamp. After a short wait, the model returned the timestamp for the scene.

Vinyals said the long context window makes Gemini 1.5 Pro especially promising for what he called “in-context learning.” That means the model can be taught new capabilities through the prompt alone, without the need for fine-tuning.

To demonstrate this, the model was given a grammar manual and a dictionary for Kalamang, a severely endangered language with fewer than 200 fluent speakers worldwide. Afterward, the model was capable of translating from English to Kalamang and vice versa at a learner’s level.
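A minimal Python sketch of how such an in-context prompt might be assembled is shown here. It is not the prompt Google used: the function name, the section labels and the placeholder dictionary entries are all assumptions for illustration, and no model API is called. The point is only that the reference material travels inside the prompt, so no model weights change.

```python
def build_in_context_prompt(grammar_manual: str,
                            dictionary: dict[str, str],
                            sentence: str) -> str:
    """Pack reference material and the task into a single prompt string."""
    entries = "\n".join(
        f"{english} = {kalamang}"
        for english, kalamang in sorted(dictionary.items())
    )
    return (
        "Reference grammar:\n"
        f"{grammar_manual}\n\n"
        "Dictionary (English = Kalamang):\n"
        f"{entries}\n\n"
        "Using only the material above, translate into Kalamang:\n"
        f"{sentence}\n"
    )


if __name__ == "__main__":
    # Placeholder content only; these are not real Kalamang words.
    prompt = build_in_context_prompt(
        grammar_manual="<full text of the grammar manual>",
        dictionary={"house": "<kalamang word 1>", "water": "<kalamang word 2>"},
        sentence="The house is near the water.",
    )
    print(prompt)
```

In practice the grammar manual alone could run to hundreds of thousands of tokens, which is why this approach depends on a very long context window.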

Although the limited release will start with the 1.5 Pro model, there is a 1.5 Ultra model in the works, Vinyals said, since with model research and development, scaling is always incremental.

Early testers can try the 1 million-token context window at no cost during the test period. The company said it intends to introduce pricing tiers soon that start at the standard 128,000-token context window and scale up to 1 million tokens. Developers looking to get access to 1.5 Pro can sign up now in AI Studio.

Image: Google

