UPDATED 09:00 EDT / JUNE 27 2024

Google sets sights on the enterprise with wider releases of Gemini 1.5 Flash, 1.5 Pro and Imagen 3

Google LLC is continuing its steady development of enterprise-ready artificial intelligence models, announcing today that its low-latency model Gemini 1.5 Flash is now generally available, as is the 2 million-token input window for Gemini 1.5 Pro.

The company also announced that Imagen 3, its next-generation text-to-image model, is now available in preview, with numerous quality improvements over Imagen 2.

Gemini 1.5 Flash

Gemini 1.5 Flash arrived in public preview last month and is now generally available. The large language model combines competitive pricing and a 1 million-token context window with high-speed processing. According to Google, that context window is 60 times larger than that of OpenAI’s GPT-3.5 Turbo, and the model is on average 40% faster.
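As a rough check of the 60-times figure, the arithmetic below divides the 1 million-token window by GPT-3.5 Turbo’s context window. It assumes a 16,385-token window for GPT-3.5 Turbo, the figure OpenAI lists for the gpt-3.5-turbo-0125 model; older variants had smaller windows, which would make the ratio larger. The ratio compares context sizes only and says nothing about speed.

```python
# Rough check of the "60 times bigger" context-window comparison.
# Assumption: GPT-3.5 Turbo's context window is 16,385 tokens
# (the gpt-3.5-turbo-0125 figure); earlier variants were smaller.
GEMINI_15_FLASH_CONTEXT = 1_000_000  # "1 million-token context window"
GPT35_TURBO_CONTEXT = 16_385


def context_ratio(larger: int, smaller: int) -> float:
    """Return how many times larger one context window is than another."""
    if smaller <= 0:
        raise ValueError("smaller must be positive")
    return larger / smaller


ratio = context_ratio(GEMINI_15_FLASH_CONTEXT, GPT35_TURBO_CONTEXT)
print(f"Gemini 1.5 Flash window is about {ratio:.1f}x GPT-3.5 Turbo's")
```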

Most importantly, it is designed to offer a very low price per input token, which, combined with low-latency processing, gives it a competitive edge. Google said customers such as Uber Technologies Inc. have been using Gemini 1.5 Flash in the Uber Eats food delivery service, where the company built the Eats AI assistant and saw close to 50% faster response times and a better customer experience.

Gemini 1.5 Pro with 2M token input window

Gemini 1.5 Pro is now generally available with a colossal 2 million-token input window, which lets enterprise customers ingest thousands of documents and extremely long videos. At this context size, 1.5 Pro can take in 2 hours of video, 22 hours of audio, more than 60,000 lines of code or 1.5 million words, and process it in record time.
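Those capacity figures can be turned into rough per-unit token budgets by dividing the 2 million-token window by each quantity. This is back-of-the-envelope arithmetic on the article’s own numbers, not Google’s published tokenization rates; because the code figure is “more than 60,000 lines,” its per-line result is an upper bound.

```python
# Average token cost per unit implied by the article's figures for a
# 2 million-token window. Back-of-the-envelope only: these are not
# Google's published tokenization rates.
CONTEXT_TOKENS = 2_000_000

# Capacity figures quoted in the article, converted to a base unit.
capacities = {
    "second of video": 2 * 60 * 60,    # 2 hours of video
    "second of audio": 22 * 60 * 60,   # 22 hours of audio
    "line of code": 60_000,            # "more than 60,000 lines" -> upper bound
    "word": 1_500_000,                 # 1.5 million words
}


def implied_tokens_per_unit(context_tokens: int, units: int) -> float:
    """Average tokens per unit if `units` items exactly fill the window."""
    if units <= 0:
        raise ValueError("units must be positive")
    return context_tokens / units


for unit, count in capacities.items():
    per_unit = implied_tokens_per_unit(CONTEXT_TOKENS, count)
    print(f"~{per_unit:.1f} tokens per {unit}")
```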

“We’ve had numerous companies find enormous value in this,” said Google Cloud Chief Executive Thomas Kurian. “For example, we’ve had retailers use the large context window and use cameras in the store to understand where people are during peak times to adjust their work surfaces to make the flow of people in the store more effective. We have financial institutions take all the 10-Ks and 10-Qs being generated at the end of every earnings day and ingest all of them as one corpus so that you can reason across all the announcements.”

The larger context window gives businesses more freedom to take in large libraries of documents at once. Previously, extremely large documents or videos had to be chopped into smaller chunks and fed through a model piece by piece so they could be processed, summarized, refined and then processed again. That is not only tedious but also time-consuming.
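The chunk-and-resummarize workflow that smaller context windows required can be sketched as a repeated map-reduce summary. This is a minimal illustration, not any vendor’s pipeline: `summarize` stands in for a model call, and the default hypothetical summarizer, `first_sentence`, simply keeps the first sentence of its input. Each pass costs one model call per chunk, which is the overhead a single long-context request avoids.

```python
from typing import Callable, List


def split_into_chunks(words: List[str], max_words: int) -> List[List[str]]:
    """Split a word list into consecutive chunks of at most max_words words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    return [words[i:i + max_words] for i in range(0, len(words), max_words)]


def first_sentence(text: str) -> str:
    """Hypothetical summarizer: keep text up to and including the first period."""
    end = text.find(".")
    return text if end == -1 else text[: end + 1]


def chunked_summary(text: str, max_words: int,
                    summarize: Callable[[str], str] = first_sentence) -> str:
    """Summarize each chunk, join the partial summaries, and repeat
    until the remaining text fits in a single chunk."""
    words = text.split()
    while len(words) > max_words:
        chunks = split_into_chunks(words, max_words)
        # "Map" step: one (hypothetical) model call per chunk.
        partials = [summarize(" ".join(chunk)) for chunk in chunks]
        next_words = " ".join(partials).split()
        # Guard against a summarizer that fails to shorten its input.
        if len(next_words) >= len(words):
            raise RuntimeError("summarizer did not shorten the text")
        words = next_words
    # "Reduce" step: final call on text that now fits in one chunk.
    return summarize(" ".join(words))


if __name__ == "__main__":
    report = ("Revenue rose. Costs fell. Margins improved. "
              "Guidance was raised. Shares jumped. Analysts cheered.")
    print(chunked_summary(report, max_words=4))
```

With a 4-word limit, the sample text needs two reduction passes before the final call; with a window large enough to hold the whole text, the loop never runs and only one call is made.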

“Larger context windows are great as long as we don’t suffer from high latency and costs. However, Google has demonstrated that is not the case,” Sanjeev Mohan, industry analyst and principal at SanjMo, told SiliconANGLE. “They can load two hours of video into the 2M context token window in a minute and start asking questions in natural language. The same can be done for loading, let’s say, all of an organization’s financial documents.”

Imagen 3 upgraded with better quality

Launching in preview on Vertex AI, Google Cloud’s managed AI delivery platform, Imagen 3 is Google’s latest image generation foundation model. It renders lifelike images from natural language prompts and improves on Imagen 2 in several ways, including more than 40% faster image generation, better prompt understanding and instruction following, and more realistic generation of groups of people.

Imagen 3 also gives users better control over the generation and placement of text in images. Rendering text is often a challenge for diffusion-style text-to-image models, which can produce gibberish or completely misunderstand prompts that ask for text.

“The early results of Imagen 3 models have pleasantly surprised us with its quality and speed in our testing,” said Gaurav Sharma, head of AI research at Typeface, a startup that specializes in leveraging generative AI for enterprise content creation. “It brings improvements in generating details, as well as lifestyle images of humans.”

The new model also adds support for multiple languages and multiple aspect ratios.

“Google now has two ways to generate images,” noted Mohan. “One can use either multi-modal Gemini or Diffusion-based Imagen 3 with more advanced graphic capabilities.”

Advanced grounding capabilities with enterprise truth

At its Google I/O developer conference in May, Google announced the general availability of grounding with Google Search in Vertex AI, which lets Gemini augment its outputs with fresh, high-quality, real-time information from Google Search. Starting next quarter, Vertex AI will also offer a new service that supplies trusted third-party data so that generative AI agents can be grounded in enterprise truth.

The company said it’s working with trusted information providers, including financial data provider Moody’s Corp., multinational legal information company Thomson Reuters Corp. and commercial search engine ZoomInfo Technologies Inc. These companies will provide access to up-to-date, curated information that can be used as a trusted grounding source.

For institutions that require even tighter controls and factual responses, Google is offering high-fidelity mode grounding on internal data for highly sensitive use cases such as financial services, healthcare and insurance. Announced in experimental preview, this type of grounding is powered by a version of Gemini 1.5 Flash that has been fine-tuned to use only customer-provided content and will generate answers based only on that data, ignoring the model’s world knowledge.

For example, a model set to work only from a specific set of healthcare documents on blood sample documentation from 2022 to 2024 would answer questions about those documents with high accuracy. However, if asked about documents from 2021, it would reply that the provided information contains nothing from 2021 instead of making something up, and it would respond the same way to other off-topic questions.

Google says this ensures a high level of factuality and greatly reduces the chance of “hallucinations,” in which a model confidently gives a wrong answer. The model also provides a percentage score indicating how confident it is in its reply, along with a source that users can follow back to the origin of the response.
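Google has not published how high-fidelity mode works internally. The toy sketch below only illustrates the behavior described above: answer from the supplied documents, refuse when they do not cover the question, and return a source plus a percentage confidence score. Keyword overlap is a stand-in scoring method chosen for illustration, and every name here (`answer_from_corpus`, `min_confidence`, the stop-word list, the sample documents) is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# Hypothetical stop-word list so filler words do not count as evidence.
STOP_WORDS = {"a", "an", "and", "are", "for", "how", "in", "is", "of",
              "the", "to", "was", "were", "what"}

REFUSAL = "The provided documents do not contain this information."


@dataclass
class Document:
    doc_id: str
    text: str


@dataclass
class GroundedAnswer:
    answer: str
    source: Optional[str]  # doc_id of the supporting document, or None
    confidence: float      # 0-100: share of query terms found in the source


def tokenize(text: str) -> Set[str]:
    """Lowercased words with simple punctuation stripped, minus stop words."""
    words = {w.strip(".,?!").lower() for w in text.split()}
    return {w for w in words if w and w not in STOP_WORDS}


def answer_from_corpus(question: str, corpus: List[Document],
                       min_confidence: float = 50.0) -> GroundedAnswer:
    """Answer only from the supplied documents; refuse rather than guess."""
    query_terms = tokenize(question)
    best_doc: Optional[Document] = None
    best_score = 0.0
    for doc in corpus:
        if not query_terms:
            break
        overlap = len(query_terms & tokenize(doc.text))
        score = 100.0 * overlap / len(query_terms)
        if score > best_score:
            best_doc, best_score = doc, score
    if best_doc is None or best_score < min_confidence:
        return GroundedAnswer(REFUSAL, None, best_score)
    # A real system would generate an answer; this toy returns the passage.
    return GroundedAnswer(best_doc.text, best_doc.doc_id, best_score)


if __name__ == "__main__":
    docs = [
        Document("lab-2022", "Blood samples are labeled with patient ID and collection time."),
        Document("lab-2024", "Blood samples are stored at 4 C for up to 7 days."),
    ]
    print(answer_from_corpus("How are blood samples labeled?", docs))
    print(answer_from_corpus("What was the 2021 storage policy?", docs))
```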

Images: SiliconANGLE/Gemini Image Generator, Google

