UPDATED 09:00 EDT / JUNE 27 2024

Google sets sights on the enterprise with wider releases of Gemini 1.5 Flash, 1.5 Pro and Imagen 3

Google LLC is continuing its steady development of enterprise-ready artificial intelligence models with an announcement today that it’s moving its low-latency model Gemini 1.5 Flash out of public preview and into general availability, along with Gemini 1.5 Pro’s 2 million-token input window.

The company also announced that its next-generation high-quality text-to-image generating model Imagen 3 is now out in preview, featuring numerous quality improvements over Imagen 2.

Gemini 1.5 Flash

Gemini 1.5 Flash arrived last month in public preview and is now generally available. The large language model combines competitive pricing and a 1 million-token context window with high-speed processing. Its context window is about 60 times larger than that of OpenAI’s GPT-3.5 Turbo, and it runs on average 40% faster than that model.
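The "60 times" comparison can be checked with simple arithmetic. The sketch below assumes a 16,385-token context window for GPT-3.5 Turbo, a figure that does not appear in the article; the 1 million-token figure for Gemini 1.5 Flash is the one quoted above.

```python
# Context-window sizes in tokens.
GEMINI_15_FLASH_CONTEXT = 1_000_000  # quoted in the article
GPT_35_TURBO_CONTEXT = 16_385        # assumption: not stated in the article


def context_ratio(larger: int, smaller: int) -> float:
    """Return how many times larger one context window is than another."""
    return larger / smaller


ratio = context_ratio(GEMINI_15_FLASH_CONTEXT, GPT_35_TURBO_CONTEXT)
print(f"Gemini 1.5 Flash holds about {ratio:.0f} times as many tokens")
```

Under that assumption the ratio comes out at roughly 61, which is consistent with the rounded figure in the text.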

Most importantly, it is designed to offer a very low input token price, which, together with its low latency, gives it a competitive edge. Google said customers such as Uber Technologies Inc. have been using Gemini 1.5 Flash for the Uber Eats food delivery service. Uber built the Eats AI assistant with it and saw close to 50% faster response times and a better customer experience.

Gemini 1.5 Pro with 2M token input window

Gemini 1.5 Pro is now available with a 2 million-token input window, allowing enterprise customers to ingest thousands of documents and extremely long videos. At this context size, 1.5 Pro can take in 2 hours of video, 22 hours of audio, more than 60,000 lines of code or 1.5 million words, and process them quickly.
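For a rough sense of scale, the capacities quoted for the 2 million-token window imply an approximate token cost per second of video, per second of audio, per line of code and per word. The sketch below only divides the article's figures; the derived rates are back-of-the-envelope estimates, not published tokenization rules, and the helper names are hypothetical.

```python
from typing import Dict

CONTEXT_TOKENS = 2_000_000  # Gemini 1.5 Pro window, from the article

# Capacities quoted in the article for one full 2M-token window.
QUOTED_CAPACITY = {
    "video_seconds": 2 * 3600,
    "audio_seconds": 22 * 3600,
    "code_lines": 60_000,   # the article says "more than", so this rate is an upper bound
    "words": 1_500_000,
}


def implied_tokens_per_unit(capacity: Dict[str, int], window: int) -> Dict[str, float]:
    """Divide the window by each quoted capacity to get tokens per unit."""
    return {unit: window / amount for unit, amount in capacity.items()}


def fits(amount: float, unit: str, rates: Dict[str, float],
         window: int = CONTEXT_TOKENS) -> bool:
    """Estimate whether `amount` of one content type fits in the window."""
    return amount * rates[unit] <= window


rates = implied_tokens_per_unit(QUOTED_CAPACITY, CONTEXT_TOKENS)
for unit, rate in rates.items():
    print(f"{unit}: about {rate:.1f} tokens each")

print(fits(90 * 60, "video_seconds", rates))   # a 90-minute video
print(fits(3 * 3600, "video_seconds", rates))  # a 3-hour video
```

By these estimates a 90-minute video fits comfortably, while a three-hour video would still have to be split.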

“We’ve had numerous companies find enormous value in this,” said Google Cloud Chief Executive Thomas Kurian. “For example, we’ve had retailers use the large context window and use cameras in the store to understand where people are during peak times to adjust their work surfaces to make the flow of people in the store more effective. We have financial institutions take all the 10-Ks and 10-Qs being generated at the end of every earnings day and ingest all of them as one corpus so that you can reason across all the announcements.”

With the larger context window, businesses have more freedom to take in larger libraries of documents at once. Previously, extremely large documents or videos had to be chopped into smaller chunks that were fed through a model, summarized, refined and then processed again. That is not only tedious but also time-consuming.
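The chunk-and-summarize workflow described above can be sketched as follows. This is a minimal illustration, not any vendor's pipeline: `placeholder_summarize` is a hypothetical stand-in for a model call that simply keeps the first few words, and chunk sizes are counted in words rather than tokens.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_words: int) -> List[str]:
    """Split text into consecutive chunks of at most `max_words` words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def placeholder_summarize(text: str, keep_words: int = 20) -> str:
    """Hypothetical stand-in for a model call: keep only the first words."""
    return " ".join(text.split()[:keep_words])


def summarize_long_document(
    text: str,
    max_words: int,
    summarize: Callable[[str], str] = placeholder_summarize,
) -> str:
    """Summarize chunk by chunk, repeating until the result fits in one chunk."""
    combined = text
    while len(combined.split()) > max_words:
        chunks = split_into_chunks(combined, max_words)
        reduced = " ".join(summarize(chunk) for chunk in chunks)
        if len(reduced.split()) >= len(combined.split()):
            break  # the summarizer made no progress; stop rather than loop forever
        combined = reduced
    return summarize(combined)


document = " ".join(f"w{i}" for i in range(1000))
print(len(split_into_chunks(document, 100)))  # number of chunks in the first pass
print(summarize_long_document(document, 100))
```

Each pass costs one model call per chunk, and detail is lost at every round of summarization. A window large enough to hold the whole document avoids both.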

“Larger context windows are great as long as we don’t suffer from high latency and costs. However, Google has demonstrated that is not the case,” Sanjeev Mohan, industry analyst and principal at SanjMo, told SiliconANGLE. “They can load two hours of video into the 2M context token window in a minute and start asking questions in natural language. The same can be done for loading, let’s say, all of an organization’s financial documents.”

Imagen 3 upgraded with better quality

Launching in preview on Vertex AI, Google Cloud’s managed AI delivery platform, Imagen 3 is Google’s latest image generation foundation model and delivers lifelike rendering from natural language prompts, with multiple improvements over Imagen 2. These include over 40% faster image generation, better prompt understanding and instruction following, and an increased capability for realistic generation of groups of people.

Imagen 3 has also been updated so users have better control over the generation and placement of text in produced images. Rendering text is often a challenge for diffusion-style text-to-image models, which can produce gibberish or completely misunderstand prompts that request text.

“The early results of Imagen 3 models have pleasantly surprised us with its quality and speed in our testing,” said Gaurav Sharma, head of AI research at Typeface, a startup that specializes in leveraging generative AI for enterprise content creation. “It brings improvements in generating details, as well as lifestyle images of humans.”

The new model also provides multi-language support and new support for multiple aspect ratios.

“Google now has two ways to generate images,” noted Mohan. “One can use either multi-modal Gemini or Diffusion-based Imagen 3 with more advanced graphic capabilities.”

Advanced grounding capabilities with enterprise truth

At its developer conference Google I/O in May, Google announced the general availability of grounding with Google Search in Vertex AI. This capability allows Gemini outputs to be augmented with fresh, high-quality real-time information from Google Search. Starting next quarter, Vertex AI will offer a new service that gives generative AI agents access to trusted third-party data for grounding in enterprise truth.

The company said it’s working with trusted sources of information, including financial data provider Moody’s Corp., multinational legal information company Thomson Reuters Corp. and commercial search engine ZoomInfo Technologies Inc. These companies will provide access to up-to-date, curated information sources that models can draw on for grounding.

For institutions that require even tighter controls and factual responses, Google is offering high-fidelity mode grounding on internal data for highly sensitive use cases such as financial services, healthcare and insurance. Announced in experimental preview, this type of grounding is powered by a version of Gemini 1.5 Flash that has been fine-tuned to use only customer-provided content and will generate answers based only on that data, ignoring the model’s world knowledge.

For example, a model set to work only from a specific set of healthcare documents on blood samples dated between 2022 and 2024 would answer questions about that material with high accuracy. However, if asked about documents from 2021, or about anything else off-topic, it would reply that the information provided does not cover the question instead of making something up.

That ensures high levels of factuality in responses and greatly reduces the chances of “hallucinations,” in which a model confidently gives a wrong answer, Google says. At the same time, the model provides a percentage score indicating how confident it is in its reply, along with a source that the user can follow back to the origin of the response.
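The behavior described above (answer only from supplied documents, decline when the question falls outside them, and return a confidence score with a source) can be illustrated with a toy sketch. This is not how Google's high-fidelity mode is implemented: the keyword-overlap score, the year check, the threshold and every name below are assumptions made purely for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

YEAR_PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")
REFUSAL = "The provided documents do not contain this information."


@dataclass
class Document:
    source: str
    year: int
    text: str


@dataclass
class GroundedAnswer:
    text: str
    confidence: float       # 0.0 to 1.0; a toy score, not a calibrated probability
    source: Optional[str]   # None when the question cannot be grounded


def _tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def answer_from_documents(question: str, documents: List[Document],
                          min_overlap: float = 0.5) -> GroundedAnswer:
    """Answer only from `documents`; decline instead of guessing."""
    asked_years = {int(y) for y in YEAR_PATTERN.findall(question)}
    # If the question names a year, only documents from that year qualify.
    candidates = [d for d in documents if not asked_years or d.year in asked_years]
    question_tokens = _tokens(question)
    best, best_score = None, 0.0
    for doc in candidates:
        score = len(question_tokens & _tokens(doc.text)) / max(len(question_tokens), 1)
        if score > best_score:
            best, best_score = doc, score
    if best is None or best_score < min_overlap:
        return GroundedAnswer(REFUSAL, 0.0, None)
    return GroundedAnswer(best.text, best_score, best.source)


docs = [
    Document("lab-2022.pdf", 2022,
             "Blood samples collected in 2022 must be labeled within ten minutes."),
    Document("lab-2024.pdf", 2024,
             "Blood samples collected in 2024 must be refrigerated within one hour."),
]

in_scope = answer_from_documents("How were blood samples labeled in 2022?", docs)
out_of_scope = answer_from_documents("How were blood samples labeled in 2021?", docs)
print(in_scope.source, round(in_scope.confidence, 2))
print(out_of_scope.text)
```

The key design choice is that a refusal is a normal return value with no source attached, so callers can tell a grounded answer from a declined one without parsing the text.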

Images: SiliconANGLE/Gemini Image Generator, Google
