

Startup CalypsoAI Inc. on Wednesday launched the CalypsoAI Security Leaderboard, an index that ranks the cybersecurity of popular artificial intelligence models.
The company ranked the algorithms using its flagship product, a software toolkit called the Inference Platform. It evaluates models’ security with the help of an AI agent that carries out simulated cyberattacks.
Ireland-based CalypsoAI is backed by more than $38 million in funding. Its Inference Platform enables companies to monitor how users interact with their large language models, spot malicious prompts and filter them. The CalypsoAI Security Leaderboard was created with a component of the platform called Red-Team that simulates malicious prompts to find weak points in LLMs.
According to CalypsoAI, Red-Team includes a library of more than 10,000 prompts designed to uncover model vulnerabilities. There’s also an AI agent that can generate simulated cyberattacks tailored to a specific LLM. If the agent is given the task of testing a bank’s customer support chatbot, it might attempt to trick the algorithm into disclosing credit card numbers.
Red-Team distills its cybersecurity findings into what CalypsoAI calls a CASI score. The higher a model’s CASI score, the better its security.
CalypsoAI positions CASI as a better alternative to attack success rate, or ASR, a metric commonly used to measure LLM security. According to the company, ASR falls short because it doesn’t take into account the severity of model vulnerabilities. Two LLMs might have the same ASR score even if one leaks information from its training dataset, while the other is susceptible only to malicious prompts that cause brief latency spikes.
The CASI metric takes into account the severity of LLM vulnerabilities. It also considers other factors, including the technical sophistication of the cyberattacks to which a model is susceptible and the amount of hardware needed to carry them out.
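To make the distinction concrete, here is a toy sketch of why a severity-weighted score separates models that a plain ASR treats as equal. Everything below is hypothetical for illustration: the attack records, the weights and the aggregation formula are invented, and this is not CalypsoAI's actual CASI calculation, which the company has not published.

```python
# Illustrative only: a hypothetical severity-weighted security score,
# NOT CalypsoAI's proprietary CASI formula.
from dataclasses import dataclass

@dataclass
class AttackResult:
    succeeded: bool
    severity: float        # 0.0 (benign, e.g. a brief latency spike) .. 1.0 (critical, e.g. data leak)
    sophistication: float  # attacker effort required, 0.0 (trivial) .. 1.0 (advanced)

def attack_success_rate(results):
    """Plain ASR: the fraction of attacks that succeed, ignoring severity."""
    return sum(r.succeeded for r in results) / len(results)

def severity_weighted_score(results):
    """Toy 0-100 score: successful attacks cost more when they are severe
    and cheap to mount (low sophistication). Higher is better."""
    penalty = sum(r.severity * (1.0 - 0.5 * r.sophistication)
                  for r in results if r.succeeded)
    return 100.0 * (1.0 - penalty / len(results))

# Two models with identical ASR but very different risk profiles:
model_a = [AttackResult(True, 1.0, 0.2),   # leaks training data via a trivial prompt
           AttackResult(False, 0.0, 0.0)]
model_b = [AttackResult(True, 0.1, 0.9),   # only a latency spike, and hard to trigger
           AttackResult(False, 0.0, 0.0)]

print(attack_success_rate(model_a), attack_success_rate(model_b))      # 0.5 0.5
print(severity_weighted_score(model_a), severity_weighted_score(model_b))
```

Both models score an identical ASR of 0.5, but the weighted score ranks model_b far above model_a, which is the kind of distinction the article says ASR misses.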
The initial version of the CalypsoAI Security Leaderboard ranks a dozen popular LLMs. Claude 3.5 Sonnet, one of Anthropic PBC’s most advanced language models, took the top spot with a CASI score of 96.25. Microsoft Corp.’s open-source Phi4-14B and Claude 3.5 Haiku followed with scores of 94.25 and 93.45, respectively.
CalypsoAI observed a sharp dropoff below the top three. The fourth most secure LLM the company evaluated, OpenAI’s GPT-4o, achieved a CASI score of 75.06. All but one of the eight other models ranked in the index achieved scores above 72.
Besides CASI, CalypsoAI’s leaderboard also tracks two other LLM metrics. The first, which is known as the risk-to-performance ratio, is designed to help companies understand tradeoffs between model security and performance. A second metric called cost of security makes it easier to evaluate the potential financial impact of an LLM-related breach.
“Our Inference Red-Team product has successfully broken all the world-class GenAI models that exist today,” said CalypsoAI Chief Executive Officer Donnchadh Casey. “Many organizations are adopting AI without understanding the risks to their business and clients; moving forward, the CalypsoAI Security Leaderboard provides a benchmark for business and technology leaders to integrate AI safely and at scale.”
SiliconANGLE Executive Editor John Furrier interviewed CalypsoAI Chief Technology Officer James White this week. Here’s the full video: