UPDATED 11:00 EDT / FEBRUARY 26, 2025

IBM debuts new Granite 3.2 family of models that include reasoning when you want it

Continuing its mission to carve out a niche in the enterprise artificial intelligence market, IBM Corp. today introduced a new family of its Granite AI models that includes experimental reasoning, vision and forecasting capabilities.

As with every release, IBM is opening up its models under the permissive open-source Apache 2.0 license. All Granite models are now available on Hugging Face, and select models are also available on IBM watsonx.ai and additional platforms.

The new family is led by the flagship text-only large language model, Granite 3.2 Instruct, which comes in 8B and 2B versions. It can perform tasks such as summarization, problem solving and code generation, and it is designed to follow instructions, making these models well-suited to building AI assistants and agents. Both versions have been trained to use “chain of thought” reasoning similar to other industry-standard models, with a twist: IBM engineers designed them to be smaller and more performant.

The reasoning capabilities in each model can also be turned on and off programmatically. That means that instead of releasing separate “reasoning models,” IBM created a single model that can act as either a conversational model or a reasoning model. Since reasoning consumes a tremendous amount of compute during deployment, turning it off at runtime when it’s not needed can save a lot of power.
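For a sense of what that toggle looks like in practice, here is a minimal sketch using the Hugging Face Transformers library. The “thinking” chat-template flag and the model identifier follow IBM’s published Granite 3.2 model card, but both should be treated as assumptions to verify against the current documentation:

```python
# Minimal sketch: toggling Granite 3.2's reasoning mode at request time.
# The "thinking" chat-template flag and the model ID below follow IBM's
# Granite 3.2 model card; verify both against current documentation.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-3.2-8b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

chat = [{"role": "user", "content": "A train departs at 9:40 and arrives at 13:05. How long is the trip?"}]

def generate(thinking: bool) -> str:
    # The thinking flag switches the prompt template between plain
    # conversational mode and step-by-step reasoning mode.
    prompt = tokenizer.apply_chat_template(
        chat, tokenize=False, add_generation_prompt=True, thinking=thinking
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=512)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

print(generate(thinking=False))  # fast conversational answer
print(generate(thinking=True))   # slower, step-by-step reasoning
```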

“The next era of AI is about efficiency, integration and real-world impact – where enterprises can achieve powerful outcomes without excessive spend on compute,” said Sriram Raghavan, vice president of IBM AI research.

Reasoning models think through problems “step by step,” an approach commonly known as “chain of thought” in the industry. Such models have gained increasing popularity since the release of DeepSeek’s R1. Most reasoning models scan an entire reasoning space to discover the best logical “path” before generating a final answer. However, it’s not always necessary to follow a path to its end once it has been determined to be a dead end.

IBM engineers developed a novel inference scaling technique that lowers the compute cost of reasoning tasks by adding a second “process reward model” as a reward system. The reward model watches the LLM as it reasons and redirects it toward logical paths with higher-confidence outcomes. Combined with a search technique that can scan the entire logical space, the IBM researchers said, this allowed them to create a smaller, more efficient approach to reasoning compared with R1, which does everything in a single model.
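IBM has not published the implementation, but the general pattern of process-reward-guided search can be sketched as a beam search in which a reward model scores partial reasoning paths and prunes low-confidence branches early. In the sketch below, generate_steps and score_path are hypothetical stand-ins for the LLM’s next-step proposals and the process reward model’s scoring:

```python
# Illustrative sketch of process-reward-guided beam search over reasoning
# paths. generate_steps() and score_path() are hypothetical stand-ins for
# the LLM and the process reward model; neither is IBM's actual API.
from typing import Callable

def prm_guided_search(
    question: str,
    generate_steps: Callable[[str, list[str]], list[str]],  # LLM: propose next steps
    score_path: Callable[[str, list[str]], float],          # PRM: confidence in a partial path
    beam_width: int = 4,
    max_depth: int = 8,
) -> list[str]:
    """Keep only the beam_width highest-confidence partial paths at each
    depth, so low-confidence branches are abandoned early instead of
    being reasoned through to completion."""
    beams: list[list[str]] = [[]]  # start with one empty reasoning path
    for _ in range(max_depth):
        candidates = [
            path + [step]
            for path in beams
            for step in generate_steps(question, path)
        ]
        if not candidates:
            break
        # The process reward model redirects the search toward paths with
        # higher-confidence outcomes by ranking partial chains of thought.
        candidates.sort(key=lambda p: score_path(question, p), reverse=True)
        beams = candidates[:beam_width]
    return beams[0]  # highest-confidence reasoning path found
```

The compute saving comes from the pruning: branches the reward model scores poorly are dropped after a few steps rather than decoded all the way to an answer.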

“DeepSeek’s R1 release was in many ways an acknowledgment of IBM’s smaller, high-efficiency model strategy,” said Dave Vellante, chief analyst at SiliconANGLE’s sister market research firm theCUBE Research. “IBM’s briefing reinforced this notion, pointing out that DeepSeek had used mixture-of-experts and other efficiency methods as early as December 2024 but gained little market attention until the recent R1 spotlight. We believe this echoes IBM’s approach to training efficiency and specialized architectures.”

IBM said Granite 3.2 8B can be tuned to rival even larger models such as Anthropic PBC’s Claude 3.5 Sonnet and OpenAI’s GPT-4o on math reasoning benchmarks such as the AIME2024 and MATH500 tests.

New multimodal vision model and smaller guardrail model

IBM also released Granite Vision 3.2 2B, a new multimodal model with computer vision capabilities trained to help enterprises with visual document understanding.

Granite Vision can handle a wide variety of visual understanding tasks, but it is most relevant for documents. Although most vision-language models are designed for general vision tasks, few of them are good at optical character and text recognition, so IBM’s engineering team spent a great deal of time training Vision 3.2 on the unique visual characteristics of layouts, fonts, charts and infographics.

Granite Guardian 3.2 is the newest of IBM’s guardrail AI models, which are designed to detect and highlight risks in prompts and responses. The company said it delivers performance on par with Guardian 3.1, but faster and at lower cost.

A benefit of Guardian 3.2 is that it provides “verbalized confidence” when monitoring inputs and outputs. Instead of rendering a binary “yes” or “no,” it reports a confidence level of “high” or “low.” That gives developers a better indication of whether they can trust or reject an output, and a threshold to work with.
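As an illustration of how an application might act on those labels, the routing sketch below assumes the guardrail returns a risk verdict plus a “high” or “low” confidence string; the field names are hypothetical, not Guardian’s actual output schema:

```python
# Hypothetical routing logic for verbalized confidence. The dict layout
# (verdict/confidence keys) is an assumption for illustration only, not
# Granite Guardian's actual output format.
def route(guardian_result: dict) -> str:
    risky = guardian_result["verdict"] == "yes"         # risk detected?
    confident = guardian_result["confidence"] == "high"
    if risky and confident:
        return "block"            # confidently flagged: reject outright
    if risky and not confident:
        return "human_review"     # flagged, but low confidence: escalate
    if not risky and not confident:
        return "log_and_allow"    # passed, but worth auditing
    return "allow"                # confidently clean

print(route({"verdict": "yes", "confidence": "low"}))  # -> human_review
```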

Alongside the updated 8B version, IBM also released two new Guardian model sizes. The first is a slimmed-down 5-billion-parameter version that retains performance close to the original. The second is Granite 3.2 3B-A800M, which was created by fine-tuning a mixture-of-experts base model. It activates only 800 million of its 3 billion parameters at a time, delivering high performance at low cost.
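IBM hasn’t detailed Granite’s routing scheme, but the basic mixture-of-experts idea of activating only a small subset of parameters for each token can be sketched generically. The layer below is an illustrative top-k router in PyTorch, not the Granite architecture:

```python
# Generic top-k mixture-of-experts layer: each token is routed to only
# k experts, so only a fraction of total parameters are active per token.
# Illustrative only; this is not IBM's Granite architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, dim: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        weights, idx = self.router(x).topk(self.k, dim=-1)  # pick k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e
                if mask.any():  # run each expert only on its routed tokens
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

layer = TopKMoE(dim=64)
print(layer(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```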

The final models in IBM’s Granite family are the compact Granite Timeseries models, also known as Tiny Time Mixers. The newest addition, Granite-Timeseries-TTM-R2.1, expands the models’ capabilities to include daily and weekly forecasting across longer horizons of up to two years. Time series models are useful for predicting longer-term trends in industries such as finance and economics, as well as for supply chain demand forecasting and seasonal inventory planning in retail.
