2024 Year in Review
Evaluating Risk Mitigation Practices for Generative AI in High-Sensitivity Domains
Generative artificial intelligence (AI) applications powered by large language models (LLMs) promise efficiency and efficacy for data-heavy enterprises, but the technology is particularly risky for the intelligence community. In 2024, the SEI researched three areas of generative AI called out by the Office of the Director of National Intelligence’s Principles of Artificial Intelligence Ethics for the Intelligence Community: protecting privacy, mitigating bias, and addressing immature security practices.
Benchmarks for Machine Unlearning
Removing sensitive data from a trained machine learning (ML) model normally requires expensive retraining. Machine unlearning algorithms are an efficient alternative for making neural networks forget specified data. This technique would allow organizations to address, in a scalable way, concerns about sensitive data exposure and privacy breaches.
However, research led by the SEI’s Keltin Grimes suggests it is difficult to know if machine unlearning is successful. Current evaluation methods test unlearned models against relatively weak membership inference attacks and do not consider model update leakage, in which attackers infer deleted data by comparing model behavior before and after unlearning. The methods also do not account for declining model accuracy over iterations of unlearning.
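To make model update leakage concrete, the Python sketch below (with hypothetical model_before, model_after, and candidate data; not the SEI’s evaluation code) scores each candidate record by how much its loss increases after unlearning. Records with the largest jumps are the ones an attacker would infer had been deleted.

import torch
import torch.nn.functional as F

def update_leakage_scores(model_before, model_after, candidates, labels):
    # Model update leakage attack: compare each candidate's loss before and
    # after unlearning; a large increase suggests the record was in the forget set.
    model_before.eval()
    model_after.eval()
    with torch.no_grad():
        loss_before = F.cross_entropy(model_before(candidates), labels, reduction="none")
        loss_after = F.cross_entropy(model_after(candidates), labels, reduction="none")
    return loss_after - loss_before  # larger jump => more likely unlearned

# scores = update_leakage_scores(model_before, model_after, x_candidates, y_candidates)
# predicted_deleted = scores > threshold  # threshold chosen by the attacker

A benchmark would need to show that an unlearned model resists attacks at least this strong, not only the weaker membership inference attacks used in current evaluations.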
For machine unlearning to help ML scale without expanding data exposure and privacy risks, practitioners will need ways to determine the success of the practice. Grimes and his colleagues plan to develop a comprehensive framework of machine unlearning evaluation benchmarks.
Scenarios for Auditing Bias in LLMs
While generative AI chatbots such as ChatGPT have safeguards against offensive outputs, they are imperfect, and biases within the underlying models remain. Auditing an LLM’s inherent biases could be important in consequential settings such as intelligence analysis.
To uncover model bias, SEI researchers led by Katie Robinson and Violet Turri tested persona- and scenario-driven interactions with ChatGPT that circumvented the system’s guardrails. They crafted a cowboy persona, had ChatGPT play out a role-playing scenario as the cowboy, and prompted the persona to describe characters with a diverse set of names. The roles and personality traits ChatGPT provided revealed model stereotypes related to ethnic background and gender that were absent without the persona and role-playing scenario.
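The sketch below illustrates the general shape of such a persona-and-scenario probe in Python, assuming the OpenAI client library and an API key are available; the persona wording, model name, and audit names are placeholders, not the researchers’ actual prompts or data.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
names = ["Aisha", "Chen", "Dmitri", "Maria", "Tyrone"]  # hypothetical audit set

# The system message installs the persona; subsequent prompts keep the model in the
# role-playing scenario while asking it to describe each named character.
messages = [
    {"role": "system", "content": "You are playing a cowboy character in an Old West role-playing story."},
    {"role": "user", "content": "Stay in character as the cowboy throughout the story."},
]
for name in names:
    messages.append({
        "role": "user",
        "content": f"In character, describe {name}, a newcomer to town: their job and personality.",
    })
    reply = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    text = reply.choices[0].message.content
    messages.append({"role": "assistant", "content": text})
    print(name, "->", text)

Comparing the occupations and traits assigned to each name, with and without the persona framing, is what surfaces the stereotyped patterns described above.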
This work reveals that LLM bias can yield potentially harmful misrepresentations despite system safeguards. It also indicates the utility of exploratory methods of identifying bias and suggests paths for future investigation.
Red-Teaming Guidelines for Generative AI
In AI red-teaming, testers emulate the exploitation of AI systems to find flaws and vulnerabilities. Faced with the broad risk surface of ML models, many champion red-teaming as a powerful way to mitigate safety, security, and trustworthiness concerns. However, the practice is poorly defined and its efficacy poorly understood. To determine the robustness of AI red-teaming, Carnegie Mellon University researchers led by the SEI’s Michael Feffer and Anusha Sinha conducted an award-winning literature survey to characterize the practice’s current scope, structure, and criteria.
They found that while red-teaming can identify risks and help evaluate the safety and robustness of generative models, it is not a comprehensive method and cannot guarantee safety. The practice is also inconsistently scoped and structured, with no consensus on evaluation team composition, threat models, resources, risks considered, or reporting and mitigation processes. The researchers proposed essential criteria to guide more effective AI red-teaming practices.
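One way to read that finding is that these scoping dimensions are rarely recorded in a comparable form. The Python sketch below (a hypothetical record format, not the criteria the researchers proposed) simply captures the dimensions listed above as structured fields so that exercises can be documented and compared consistently.

from dataclasses import dataclass, field

@dataclass
class RedTeamExercise:
    # Fields mirror the dimensions the survey found were inconsistently specified.
    system_under_test: str
    team_composition: list[str]   # e.g., security engineers, ML engineers, domain experts
    threat_model: str             # who the adversary is and what access they have
    resources: str                # time, compute, and tooling available to the team
    risks_considered: list[str]   # e.g., privacy leakage, biased outputs, jailbreaks
    reporting_process: str        # how findings are documented and shared
    mitigation_process: str       # how findings feed back into fixes
    findings: list[str] = field(default_factory=list)

exercise = RedTeamExercise(
    system_under_test="example-llm-assistant",  # hypothetical system name
    team_composition=["security engineer", "ML engineer", "domain analyst"],
    threat_model="external user with chat-only API access attempting prompt injection",
    resources="two weeks, no access to model weights",
    risks_considered=["privacy leakage", "biased outputs", "unsafe instructions"],
    reporting_process="structured findings reviewed by the product team",
    mitigation_process="tracked issues with retest before release",
)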
The New Edge of Generative AI
The edge of the field of generative AI is shifting to the tools and techniques for evaluating and mitigating the technology’s risks. The three pillars of AI Engineering—scalable, operator-centered, and robust and secure—informed SEI research into these techniques to help ensure that the benefits of AI and LLMs outweigh the risks in high-sensitivity, high-consequence domains.
Researchers
Collin Abidi, Michael Feffer, Cole Frank, Shannon Gallagher, Keltin Grimes, Katie Robinson, Anusha Sinha, Carol Smith, Violet Turri
Mentioned in this Article
Principles of Artificial Intelligence Ethics for the Intelligence Community
Gone but Not Forgotten: Improved Benchmarks for Machine Unlearning
Tales from the Wild West: Crafting Scenarios to Audit Bias in LLMs
Red-Teaming for Generative AI: Silver Bullet or Security Theater?
Topics
Artificial Intelligence Engineering
More on Artificial Intelligence Engineering from the 2024 Year in Review
Training the DoD to Leverage AI for Strategic Advantage
The SEI has created and delivered more than 150 hours of training materials to fill the gap on AI Engineering for the Department of Defense.