UPDATED 13:00 EDT / DECEMBER 19 2024

Patronus AI releases Glider: a small, high-performance AI evaluator model for other models

Patronus AI Inc., a startup that builds tools to help companies detect and fix reliability issues in their large language models, today announced the launch of a small but mighty AI model that can evaluate and judge the accuracy of much larger models.

The model, called Glider, is a 3.8 billion-parameter open-source LLM designed to be a fast, flexible judge of AI language models. The company said it's the smallest model to date to outperform competing models such as OpenAI's GPT-4o-mini, which is commonly used as an evaluator.

Large language model evaluation is the process of assessing how well an LLM performs particular tasks, such as text generation, comprehension and question answering, by measuring accuracy, coherence and relevance against set standards. This helps AI developers and engineers understand and analyze how the model will behave in given circumstances and identify its strengths and weaknesses before it is released to the public.

“Our new model challenges the assumption that only large-scale models (30B+ parameters) can deliver robust and explainable evaluations,” said Rebecca Qian, co-founder and chief technology officer of Patronus. “By demonstrating that smaller models can achieve similar results, we’re setting a new benchmark for the community.”

Relying on proprietary LLMs such as GPT-4 to evaluate the performance of pre-trained LLMs comes with several issues, such as high cost and a lack of transparency, Patronus said. According to the company, Glider brings transparency to developers and engineers by delivering a small, explainable "LLM-as-a-judge" solution that produces real-time evaluation scores and walks through its reasoning.

Glider's small size also means it can be run on-premises or on-device, so companies do not need to send their sensitive data to any third party. This is especially important at a time when companies are increasingly aware of the potential privacy implications of cloud-hosted models.

During evaluations, Glider provides high-quality reasoning chains in addition to a score for each of its criteria, presented as readable bullet-point lists that explain its process. As a result, each score comes with an explanation of why it was given, allowing developers to understand the full context behind what caught the model's attention.

The company said the model was trained on 183 real-world evaluation criteria across 685 domains, which enables it to evaluate tasks that require factual accuracy as well as subjective, human-like metrics such as fluency and coherence. That range makes the model versatile across creative and business applications.

Its judgment system evaluates not just model outputs, but also user inputs, context, metadata and more.

“By combining speed, versatility and explainability with an open-source approach, we’re enabling organizations to deploy powerful guardrail systems without sacrificing cost-efficiency or privacy,” said co-founder and Chief Executive Anand Kannappan. “It’s a significant contribution to the AI community, proving that smaller models can drive big innovations.”

Patronus said that because Glider is open source and supports on-premises deployment, it can serve multiple evaluation use cases, including acting as an LLM guardrail that evaluates and catches bad behavior, or providing real-time subjective text analysis.

Image: SiliconANGLE/Microsoft Designer
