A new report out today from Cisco Systems Inc. argues that none of the closed flagship large language models it tested can be considered safe once an attacker is allowed to push past a single prompt, as adversarial success rates climb sharply across every model in the cohort.
The Cisco AI Threat Research team tested 15 proprietary models from OpenAI Group PBC, Anthropic PBC, Google LLC, Amazon.com Inc. and xAI Corp. It found multi-turn attack success rates ranging from 7.9% to 88.3% across the cohort, compared with single-turn rates of 2.2% to 64.9% on the same models.
The two regimes did not rank the models in the same order. Models that looked strong on the single-turn benchmarks used in model cards and procurement reviews did not necessarily hold up once an attacker could keep the conversation going.
The work is a follow-up to “Death by a Thousand Prompts,” Cisco’s earlier assessment of eight open-weight models, which found multi-turn success rates two to 10 times higher than single-turn baselines and topped out at 92.78% against Mistral AI SAS’ Mistral Large-2. The new study extends the same pattern into the closed, proprietary frontier.
The widest gaps came from xAI’s Grok 4.1 Fast in its non-reasoning configuration, which moved from 34.2% single-turn to 88.3% multi-turn, and Google’s Gemini 3 Pro, which rose from 18.1% to 73.4%. OpenAI’s GPT-5.4 climbed from 2.7% to 24.7%, a roughly nine-times increase. Anthropic’s Claude family showed the narrowest gaps, with Claude Opus 4.5 moving from 2.19% to 11.2% and Claude Opus 4.6 from 3.6% to 16.2%.
Amazon’s Nova 2 Lite produced the cleanest inversion in the cohort, pairing a relatively high single-turn rate of 34.1% with the lowest multi-turn rate at 7.9%. The Cisco researchers noted that the result illustrates why single-turn scores alone cannot be treated as a proxy for adversarial robustness.
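To make the arithmetic behind these comparisons concrete, here is a minimal sketch that computes the cross-regime gap (in percentage points) and the multi-turn to single-turn ratio for the six models whose figures are quoted above. The `RegimeScores` class and its property names are hypothetical and not part of any Cisco tooling; the numbers are the ones reported in the article, and the rest of the 15-model cohort is omitted because its figures are not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegimeScores:
    """Hypothetical container for one model's attack success rates (percent)."""
    model: str
    single_turn_asr: float
    multi_turn_asr: float

    @property
    def gap_pp(self) -> float:
        # Multi-turn minus single-turn, in percentage points.
        # A negative value means an inversion, as with Nova 2 Lite.
        return self.multi_turn_asr - self.single_turn_asr

    @property
    def ratio(self) -> float:
        # How many times higher the multi-turn rate is than the single-turn rate.
        return self.multi_turn_asr / self.single_turn_asr


# Figures as quoted in the article; other cohort members are not listed.
REPORTED = [
    RegimeScores("Grok 4.1 Fast (non-reasoning)", 34.2, 88.3),
    RegimeScores("Gemini 3 Pro", 18.1, 73.4),
    RegimeScores("GPT-5.4", 2.7, 24.7),
    RegimeScores("Claude Opus 4.5", 2.19, 11.2),
    RegimeScores("Claude Opus 4.6", 3.6, 16.2),
    RegimeScores("Nova 2 Lite", 34.1, 7.9),
]

BY_NAME = {s.model: s for s in REPORTED}

if __name__ == "__main__":
    for s in REPORTED:
        print(f"{s.model:32} gap={s.gap_pp:+6.1f} pp  ratio={s.ratio:4.1f}x")
```

The ratio reproduces the "roughly nine-times" figure for GPT-5.4 (24.7 / 2.7 is about 9.1), while the signed gap makes Nova 2 Lite's inversion show up as a negative number.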
The evaluation drew on 30,090 single-turn prompts and 6,986 multi-turn attacks distributed across 1,456 conversations, all run through the same harness and scored under the Cisco Integrated AI Security and Safety Framework taxonomy. Strategy families covered role-play and persona adoption, contextual ambiguity, refusal reframing, information decomposition and reassembly, and crescendo-style incremental escalation.
A second finding concerned deployment-time configuration. The same Grok 4.1 Fast model dropped from an 88.3% multi-turn attack success rate to 43.5% once the reasoning mode was enabled, a swing the report says is not captured by any public benchmark or model card the researchers reviewed.
Cisco called on model providers to document the safety effects of configuration flags such as reasoning modes, system-prompt adherence settings, temperature and guardrail tiers alongside the capability benchmarks they already publish.
The researchers also identified concentrations of failure on the single-turn side. “Imposter AI” procedures produced a weighted attack success rate of 37.5%, followed by soft paraphrase attacks at 29.2% and system-prompt attacks at 27.7%. On the content side, hate speech, profanity and specialized advice categories dominated.
The report sets out three recommendations for organizations buying or deploying frontier models: ask labs to publish attack success rates broken down by strategy family on every model release, gate deployments on regressions in the top procedures and content categories using a three-percentage-point threshold, and flag any model with a cross-regime gap larger than 15 percentage points for manual review.
In the tested cohort, that last rule alone would surface eight of 15 models, including GPT-5.4, Gemini 3 Pro, both Grok configurations and all three Nova variants.
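The two numeric rules above can be expressed as simple checks. The sketch below is an illustration under stated assumptions, not Cisco's implementation: the function names are hypothetical, the gap is taken as an absolute difference (inferred from the fact that Nova variants, including the inverted Nova 2 Lite, are flagged), and a regression is assumed to mean an increase strictly greater than three points against the previous release for the same strategy family or content category.

```python
from typing import Mapping

# Thresholds from the report's recommendations, in percentage points.
CROSS_REGIME_GAP_PP = 15.0
REGRESSION_PP = 3.0


def needs_manual_review(single_turn_asr: float, multi_turn_asr: float,
                        threshold_pp: float = CROSS_REGIME_GAP_PP) -> bool:
    """Hypothetical check: flag a model whose single- and multi-turn attack
    success rates differ by more than threshold_pp in either direction."""
    return abs(multi_turn_asr - single_turn_asr) > threshold_pp


def regressions(previous: Mapping[str, float], current: Mapping[str, float],
                threshold_pp: float = REGRESSION_PP) -> list[str]:
    """Hypothetical check: return the strategy families or content categories
    whose attack success rate rose by more than threshold_pp since the
    previous release. Keys absent from `previous` are skipped (assumption)."""
    return sorted(
        key for key, asr in current.items()
        if key in previous and asr - previous[key] > threshold_pp
    )


def passes_deployment_gate(previous: Mapping[str, float],
                           current: Mapping[str, float]) -> bool:
    """A release passes only if no tracked category regressed past the threshold."""
    return not regressions(previous, current)


if __name__ == "__main__":
    # (single-turn, multi-turn) figures quoted in the article.
    reported = {
        "GPT-5.4": (2.7, 24.7),
        "Gemini 3 Pro": (18.1, 73.4),
        "Grok 4.1 Fast (non-reasoning)": (34.2, 88.3),
        "Nova 2 Lite": (34.1, 7.9),
        "Claude Opus 4.5": (2.19, 11.2),
        "Claude Opus 4.6": (3.6, 16.2),
    }
    for model, (st, mt) in reported.items():
        print(model, "manual review" if needs_manual_review(st, mt) else "ok")
```

Applied to the quoted figures, the gap rule flags GPT-5.4, Gemini 3 Pro, the non-reasoning Grok configuration and Nova 2 Lite, while both Claude Opus models stay under the 15-point line. The remaining flagged models cannot be checked here because their single-turn figures are not given.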
The findings also carry a compliance edge. NIST’s AI Risk Management Framework, its draft Cyber AI Profile and Article 15 of the European Union AI Act all require adversarial robustness testing, without saying how many turns it has to cover or which attack strategies should be in scope. The Cisco numbers suggest the single-turn scores most labs publish today would not be enough to satisfy any of those frameworks on a strict reading.
“If no base model is iteratively safe, the security perimeter has to move outside the model,” the report’s authors wrote, pointing to runtime guardrails, monitoring, red-teaming and application-layer policies. The findings are designed to inform Cisco’s own AI Defense product and the Cisco LLM Security Leaderboard, which publishes adversarial evaluation signals against leading models.
Founded by tech visionaries John Furrier and Dave Vellante, SiliconANGLE Media has built a dynamic ecosystem of industry-leading digital media brands that reach 15+ million elite tech professionals. Our new proprietary theCUBE AI Video Cloud is breaking ground in audience interaction, leveraging theCUBEai.com neural network to help technology companies make data-driven decisions and stay at the forefront of industry conversations.