

Startup CalypsoAI Inc. on Wednesday launched the CalypsoAI Security Leaderboard, an index that ranks the cybersecurity of popular artificial intelligence models.
The company ranked the algorithms using its flagship product, a software toolkit called the Inference Platform. It evaluates models’ security with the help of an AI agent that carries out simulated cyberattacks.
Ireland-based CalypsoAI is backed by more than $38 million in funding. Its Inference Platform enables companies to monitor how users interact with their large language models, spot malicious prompts and filter them. The CalypsoAI Security Leaderboard was created with a component of the platform called Red-Team that simulates malicious prompts to find weak points in LLMs.
According to CalypsoAI, Red-Team includes a library of more than 10,000 prompts designed to uncover model vulnerabilities. There’s also an AI agent that can generate simulated cyberattacks tailored to a specific LLM. If the agent is given the task of testing a bank’s customer support chatbot, it might attempt to trick the algorithm into disclosing credit card numbers.
Red-Team distills its cybersecurity findings into what CalypsoAI calls a CASI score. The higher a model’s CASI score, the better its security.
CalypsoAI positions CASI as a better alternative to attack success rate, or ASR, a metric commonly used to measure LLM security. According to the company, ASR falls short because it doesn’t take into account the severity of model vulnerabilities. Two LLMs might have the same ASR score even if one leaks information from its training dataset, while the other is susceptible only to malicious prompts that cause brief latency spikes.
The CASI metric takes into account the severity of LLM vulnerabilities. It also considers other factors, including the technical sophistication of the cyberattacks to which a model is susceptible and the amount of hardware needed to carry them out.
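To make the distinction concrete, here is a toy sketch of why a severity-weighted score separates models that a plain ASR treats as equal. Everything below is hypothetical for illustration: the attack records, the weights and the aggregation formula are invented, and this is not CalypsoAI's actual CASI calculation, which the company has not published.

```python
# Illustrative only: a hypothetical severity-weighted security score,
# NOT CalypsoAI's proprietary CASI formula.
from dataclasses import dataclass

@dataclass
class AttackResult:
    succeeded: bool
    severity: float        # 0.0 (benign, e.g. a brief latency spike) .. 1.0 (critical, e.g. data leak)
    sophistication: float  # attacker effort required, 0.0 (trivial) .. 1.0 (advanced)

def attack_success_rate(results):
    """Plain ASR: the fraction of attacks that succeed, ignoring severity."""
    return sum(r.succeeded for r in results) / len(results)

def severity_weighted_score(results):
    """Toy 0-100 score: successful attacks cost more when they are severe
    and cheap to mount (low sophistication). Higher is better."""
    penalty = sum(r.severity * (1.0 - 0.5 * r.sophistication)
                  for r in results if r.succeeded)
    return 100.0 * (1.0 - penalty / len(results))

# Two models with identical ASR but very different risk profiles:
model_a = [AttackResult(True, 1.0, 0.2),   # leaks training data via a trivial prompt
           AttackResult(False, 0.0, 0.0)]
model_b = [AttackResult(True, 0.1, 0.9),   # only a latency spike, and hard to trigger
           AttackResult(False, 0.0, 0.0)]

print(attack_success_rate(model_a), attack_success_rate(model_b))      # 0.5 0.5
print(severity_weighted_score(model_a), severity_weighted_score(model_b))
```

Both models score an identical ASR of 0.5, but the weighted score ranks model_b far above model_a, which is the kind of distinction the article says ASR misses.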
The initial version of the CalypsoAI Security Leaderboard ranks a dozen popular LLMs. Claude 3.5 Sonnet, one of Anthropic PBC’s most advanced language models, took the top spot with a CASI score of 96.25. Microsoft Corp.’s open-source Phi4-14B and Claude 3.5 Haiku followed with scores of 94.25 and 93.45, respectively.
CalypsoAI observed a sharp dropoff below the top three. The fourth most secure LLM the company evaluated, OpenAI’s GPT-4o, achieved a CASI score of 75.06. All but one of the eight other models ranked in the index achieved scores above 72.
Besides CASI, CalypsoAI’s leaderboard also tracks two other LLM metrics. The first, which is known as the risk-to-performance ratio, is designed to help companies understand tradeoffs between model security and performance. A second metric called cost of security makes it easier to evaluate the potential financial impact of an LLM-related breach.
“Our Inference Red-Team product has successfully broken all the world-class GenAI models that exist today,” said CalypsoAI Chief Executive Officer Donnchadh Casey. “Many organizations are adopting AI without understanding the risks to their business and clients; moving forward, the CalypsoAI Security Leaderboard provides a benchmark for business and technology leaders to integrate AI safely and at scale.”
SiliconANGLE Executive Editor John Furrier interviewed CalypsoAI Chief Technology Officer James White this week. Here’s the full video: