AI Red-Teaming Gets a Large-Language-Model Upgrade

Red-teaming competitions, a staple of cybersecurity workforce development, are starting to encompass generative artificial intelligence (AI). Manually evaluating AI exploits is time consuming and hard to reproduce, leading to uneven scoring across events. To improve the state of the practice of AI red-teaming and the accuracy of AI system defenders, the SEI developed a software framework to automate evaluation of AI exploits for capture-the-flag (CTF) competitions. The auto-grader also produces critical data points for future CTF development and real AI exploit detection.

New Attacks

According to the Cybersecurity and Infrastructure Security Agency (CISA), “AI red teaming is a foundational component of the safety and security evaluations process.” In the past few years, the technology community’s premier ethical hacking conferences have introduced generative-AI CTF exercises to raise the skill level of cyber defenders.

The novelty and large attack surface of generative AI systems have led to new and nuanced red-team attacks, making judging these competitions challenging. The introduction of bias, misinformation, and disinformation into a large language model (LLM) is particularly hard to detect. In a typical AI CTF, human evaluators spend hours reviewing each competitor’s exploit, looking for model injections, persuasion, and model failure states. The process for extracting this valuable data is labor intensive and hard to reproduce for future competitions.

Faster, Fairer Scoring

CISA tasked the SEI with automating the scoring of AI CTF exploits. SEI experts in cybersecurity engineering, cyber competitions, and AI security teamed with CISA and Carnegie Mellon University’s Beibei Li, professor of information technology and management. Together they implemented existing AI red-teaming principles, such as those from MITRE’s ATT&CK knowledge base, Google’s Secure AI Framework (SAIF), and the National Institute of Standards and Technology (NIST) AI Risk Management Framework, in software for the first time.

The automatic grader takes a full user interaction with an LLM—every prompt and output from start to finish—and runs it through another, custom LLM. The grader evaluates popular prompt injection attacks, persuasion or role-playing attacks, and guardrail failures or attacker-induced hallucinations. It can also account for more nuanced attacks involving model bias and misinformation. In just seconds, the auto-grader outputs a grade for the attacker–model conversation and, as controls, grades for five closely related malicious conversations from a retrieval-augmented generation (RAG) database.

With this information, CTF organizers can quickly and fairly compare the exploits to determine rankings. The SEI also developed challenges and exploit solutions, such as adversarial attacks and noise injections, for 10 red-teaming CTFs as examples for use of the auto-grader. CISA will release the open source auto-grader and challenges on its GitHub site.

Future Competitions, Attack Detection, and Beyond

CISA and the SEI invite users of the auto-grader to extend it into new AI red-teaming use cases, inside and outside of competitions, and evaluate new threats. Novel exploits can be cycled back into the tool’s LLM to improve future evaluations and build more robust datasets for AI red-teaming.

The grader’s software framework could contribute to research in automated AI red-teaming defenses and AI security evaluation. The data it collects could also help define AI security mitigations, train more secure LLMs, and improve machine-learning-based detectors of real-world attacks.

For now, CISA encourages CTF organizers to use the auto-grader in their own events. As AI red-teaming competitions improve, so too will the skills of human AI security experts and the overall security posture.

Cyber competitions, such as capture-the-flag, are a fundamental element of developing and motivating the current and future generation of cybersecurity talent.

Chris Butera

Acting Executive Assistant Director for Cybersecurity, CISA

“The Cybersecurity and Infrastructure Security Agency has a vital role in preparing the nation's cyber defenders to address the continually evolving threats to our critical infrastructure information systems. Cyber competitions, such as capture-the-flag, are a fundamental element of developing and motivating the current and future generation of cybersecurity talent,” said acting executive assistant director for cybersecurity Chris Butera. “When released, the open source capture-the-flag auto-grader, developed through CISA’s relationship with the Software Engineering Institute, will enable organizations to incorporate AI-focused scenarios into their cybersecurity training and evaluation programs.”