A new Cisco AI Threat Research report shows leading closed-source large language models are vulnerable to multi-turn attacks.
The study tested 15 proprietary models from developers including OpenAI, Google, and Anthropic.
Attack success rates were several times higher in conversational, multi-turn attacks than in standard single-prompt tests.
One tested model failed in 18.1% of single-turn scenarios but in 73.4% of multi-turn attacks, roughly a fourfold increase.
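The gap between those two figures can be expressed both as a multiplier and as a percentage-point difference. The short Python sketch below does that arithmetic using only the two rates quoted above. The function name `failure_rate_change` is a hypothetical helper introduced for illustration and does not come from the Cisco report.

```python
def failure_rate_change(single_turn_pct: float, multi_turn_pct: float) -> tuple[float, float]:
    """Return (multiplier, percentage-point increase) between two failure rates.

    Hypothetical helper for illustration; not part of the Cisco report.
    """
    multiplier = multi_turn_pct / single_turn_pct      # relative increase
    point_increase = multi_turn_pct - single_turn_pct  # absolute increase
    return multiplier, point_increase


# Failure rates quoted for the one model cited in the article.
multiplier, point_increase = failure_rate_change(18.1, 73.4)
print(f"{multiplier:.1f}x higher, +{point_increase:.1f} percentage points")
# prints "4.1x higher, +55.3 percentage points"
```

Reporting both numbers is useful because a multiplier alone can overstate a change when the baseline is small, while the percentage-point gap shows how many more attack attempts actually succeed.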
Cisco suggests that current safety benchmarks understate actual threats and that the security perimeter must extend beyond the models themselves.