UPDATED 17:49 EDT / OCTOBER 18 2024

H2O.ai releases small language models for multimodal processing tasks

H2O.ai Inc. on Thursday introduced two small language models, Mississippi 2B and Mississippi 0.8B, that are optimized for multimodal tasks such as extracting text from scanned documents.

The models are available on Hugging Face under an open-source license.

Mountain View, California-based H2O.ai provides a suite of tools for building artificial intelligence applications. Enterprises can use the company’s software to identify the open-source language model most suitable for an application project, customize that model and check the accuracy of its output. H2O.ai also offers tooling for related tasks such as implementing retrieval-augmented generation, or RAG.

The first multimodal model that the company released this week, Mississippi 2B, features 2.1 billion parameters. It’s designed to analyze images based on natural language instructions provided by the user. Mississippi 2B can generate a high-level description of an image, elaborate on a specific detail highlighted by the user and explain data visualizations.

The model also lends itself to text extraction tasks. A company could, for example, use Mississippi 2B to extract purchase details from a scanned receipt and upload the information to a sales database. The model can optionally format the extracted text as JSON, which makes it easier to load the information into applications.
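To illustrate why JSON output simplifies that kind of workflow, here is a minimal Python sketch of how an application might validate extracted receipt data before loading it into a database. The response text, the field names and the `parse_receipt` helper are all hypothetical; H2O.ai has not published this schema, and the model's actual output format may differ.

```python
import json
from decimal import Decimal

# Hypothetical model output for a scanned receipt. The field names are
# illustrative assumptions, not a schema published by H2O.ai.
raw_response = """
{
  "merchant": "Example Store",
  "date": "2024-10-17",
  "total": "23.50",
  "items": [
    {"name": "Notebook", "price": "3.50"},
    {"name": "Backpack", "price": "20.00"}
  ]
}
"""

REQUIRED_FIELDS = ("merchant", "date", "total", "items")


def parse_receipt(text):
    """Parse JSON receipt text and sanity-check it before storage."""
    data = json.loads(text)  # raises json.JSONDecodeError on malformed output
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")

    # Decimal avoids floating-point rounding errors on currency amounts.
    items = [
        {"name": item["name"], "price": Decimal(item["price"])}
        for item in data["items"]
    ]
    total = Decimal(data["total"])
    if sum(item["price"] for item in items) != total:
        raise ValueError("line items do not add up to the stated total")

    return {
        "merchant": data["merchant"],
        "date": data["date"],
        "total": total,
        "items": items,
    }


receipt = parse_receipt(raw_response)
print(receipt["merchant"], receipt["total"])
```

Checking that the line items add up to the stated total is one inexpensive way to catch a misread digit before the record reaches the sales database.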

Mississippi 0.8B, H2O.ai’s other new model, is a scaled-down version of Mississippi 2B with 800 million parameters. It’s designed for many of the same tasks with a particular emphasis on text extraction. According to H2O.ai, the algorithm outperforms all comparable small language models at optical character recognition tasks.

The company compared Mississippi 0.8B against the competition using a benchmark assessment that comprised 300 tasks. The evaluated models had to process logos, handwritten text, digits and other types of content. H2O.ai says that its model outperformed not only comparably sized algorithms but also open-source large language models with more than 20 times as many parameters.

Mississippi 2B and Mississippi 0.8B are based on the same architecture. When the algorithms are given an image to process, they divide it into tiles that measure 448 pixels by 448 pixels. From there, a component known as an encoder turns the tiles into embeddings, mathematical structures that AI models use to hold information. Those embeddings are then analyzed to answer user questions.
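The tiling step can be sketched in a few lines of Python. This is a simplified illustration rather than H2O.ai's implementation: the `tile_boxes` function is hypothetical, and clipping the edge tiles to the image bounds is an assumption made for brevity. A production pipeline would more likely resize or pad the image so that every tile measures the full 448 pixels by 448 pixels.

```python
import math
from typing import List, Tuple

TILE = 448  # tile edge length in pixels, as described in the article


def tile_boxes(width: int, height: int, tile: int = TILE) -> List[Tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) boxes that cover the whole image."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")

    cols = math.ceil(width / tile)
    rows = math.ceil(height / tile)

    boxes = []
    for row in range(rows):
        for col in range(cols):
            left, top = col * tile, row * tile
            # Assumption: edge tiles are clipped to the image bounds.
            right = min(left + tile, width)
            bottom = min(top + tile, height)
            boxes.append((left, top, right, bottom))
    return boxes


# A 1000 x 500 pixel scan needs a grid of 3 columns by 2 rows.
for box in tile_boxes(1000, 500):
    print(box)
```

Each box would then be cropped from the image and passed to the encoder, which produces the embeddings for that tile.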

H2O.ai trained Mississippi 2B and Mississippi 0.8B in different ways. The former model’s training dataset included 17.2 million sample tasks that each comprised an image, a question about that image and an answer. Mississippi 0.8B, in turn, was developed using 19 million examples.

“We’ve designed H2OVL Mississippi models to be a high-performance yet cost-effective solution, bringing AI-powered OCR, visual understanding and Document AI to businesses,” said H2O.ai founder and Chief Executive Officer Sri Ambati.

H2O.ai envisions developers deploying its new AI model series on devices with limited processing power. According to the company, the algorithms are also useful for latency-sensitive use cases. Thanks to their considerably lower parameter counts, small language models can respond to user queries significantly faster than frontier LLMs such as GPT-4o.

Image: Unsplash

