Faster and More Accurate Alert Adjudication Using LASAA
As part of the project “Using LLMs to Adjudicate Static-Analysis Alerts” (“LASAA”), we at the SEI designed, implemented, and evaluated LLM-based techniques for adjudicating static-analysis alerts quickly and accurately. After analyzing the results of our evaluation, we concluded that this approach shows great promise.
The Difficulty of Using Static Analysis to Evaluate Source Code
Software vulnerabilities pose a significant risk to critical systems. During development and prior to deployment, software developers and analysts use static analysis to evaluate source code for potential vulnerabilities. Manual review of the resulting alerts is widely used and is one of the best techniques available for adjudicating static-analysis results. However, it is time-consuming and expensive because it requires significant manual effort, and the volume of alerts is often too large to review in its entirety. As a result, software analysts manually adjudicate only the highest-priority alerts, which leaves the rest as a significant unknown risk.
Large Language Models (LLMs) are a technology that shows promise for automating alert adjudication. Although older ML techniques lack easy interpretability and often pivot on irrelevant details that merely correlate with code flaws in their training data, newer LLMs are different. These new LLMs produce detailed reasoning that they use to reach their final answers, and these reasoning chains can be manually double-checked. Sometimes these reasoning chains reveal that the LLM initially made a mistake in its analysis but was able to detect and recover from the mistake, eventually reaching the correct answer.
LASAA Is Faster and More Accurate
We have developed, and continue to explore, multiple techniques for using LLMs to handle static-analysis output. Recent research indicates that LLMs, especially reasoning models such as o3, o4-mini, and all current frontier models, represent a significant step forward in automated static-analysis adjudication. In one study, researchers used LLMs to identify more than 250 types of vulnerabilities and to reduce the number of those vulnerabilities by 90 percent.
By studying the capabilities of the newer LLMs, we identified ways to build tooling around LLMs that generates better results. For example, to handle alerts whose adjudication requires analyzing multiple functions spread across the codebase, we leverage LLMs’ ability to generate function preconditions and to check those preconditions at call sites. We also found that LLMs perform much better when they are asked to adjudicate a particular issue on a particular line of code than when they are prompted to find all the errors in a function. Based on these and other findings, we developed an approach for using LLMs to adjudicate static-analysis alerts, which we implemented in our LASAA tool.
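To make the second finding concrete, here is a minimal sketch of a prompt builder that asks about one alert on one line instead of asking for every error in a function. It is an illustration only, not LASAA's actual prompt or data model: the `Alert` class, the `build_adjudication_prompt` function, the prompt wording, and the `TRUE_POSITIVE`/`FALSE_POSITIVE` labels are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    """One static-analysis alert (hypothetical structure, not LASAA's own)."""
    file_path: str
    line: int       # 1-based line number reported by the analyzer
    checker: str    # e.g., a CWE identifier
    message: str


def build_adjudication_prompt(alert: Alert, source: str, context: int = 3) -> str:
    """Build a prompt that asks about a single issue on a single line."""
    lines = source.splitlines()
    if not 1 <= alert.line <= len(lines):
        raise ValueError(f"line {alert.line} is outside the source text")
    first = max(1, alert.line - context)
    last = min(len(lines), alert.line + context)
    # Number the snippet and mark the flagged line so the question is unambiguous.
    snippet = "\n".join(
        f"{'>>' if n == alert.line else '  '} {n:4d} | {lines[n - 1]}"
        for n in range(first, last + 1)
    )
    return (
        f"A static analyzer reported {alert.checker} at "
        f"{alert.file_path}:{alert.line}: {alert.message}\n\n"
        f"{snippet}\n\n"
        f"Is this specific alert on line {alert.line} a true positive or a "
        "false positive? Explain your reasoning, then answer on the final "
        "line with exactly TRUE_POSITIVE or FALSE_POSITIVE."
    )
```

The design choice illustrated here is narrowing the question: the prompt names the checker, the location, and the flagged line, so the model has one yes-or-no decision to justify.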
We developed initial LLM tooling, tested that tooling, and studied the results. We also analyzed related work by other researchers and planned the direction of our further work on this topic. LASAA enables more complete alert adjudication, thereby reducing unknown risk and enabling the removal of vulnerabilities before software is fielded. Some LASAA techniques are based on our observation that LLMs rarely give consistently wrong answers to static-analysis adjudication questions. Instead, when the query is run multiple times, an LLM is likely to either consistently deliver the right answer or deliver inconsistent answers.
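The consistency observation suggests a simple decision rule, sketched below under stated assumptions: run the same query several times, accept the verdict only if every run agrees, and otherwise send the alert to a human. This is not LASAA's implementation. The `ask` callable is a hypothetical stand-in for an LLM call, and the verdict labels, the default of five runs, and the unanimity requirement are all choices made for this example.

```python
from collections import Counter
from typing import Callable


def adjudicate_by_consistency(
    ask: Callable[[str], str], prompt: str, runs: int = 5
) -> str:
    """Run the same query several times and accept only a unanimous verdict.

    `ask` is a hypothetical stand-in for an LLM call that returns
    "true_positive" or "false_positive" for the given prompt.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    verdicts = Counter(ask(prompt) for _ in range(runs))
    if len(verdicts) == 1:
        # Every run agreed, so treat the shared answer as the adjudication.
        return next(iter(verdicts))
    # Disagreement between runs is treated as a signal of unreliability.
    return "needs_manual_review"
```

Under this rule, inconsistent answers are never auto-adjudicated, which trades some coverage for accuracy; a less strict threshold would be another reasonable choice.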
How LASAA Works
We tested LASAA on multiple popular LLMs, including some that can be run on-premises (a possible requirement for classified content) and others that run off-site only. We were able to demonstrate that our LLM-based techniques automatically adjudicated a large percentage of alerts with high accuracy on randomly selected sets from three test suites: Juliet C/C++ v1.3, FormAI, and SVCOMP benchmarks. We also tested our techniques on multiple real-world codebases used as modules for ground and space systems, including NASA AMMOS’ Multi-Mission Time Correlation (MMTC) Java code and NASA’s Core Flight System (cFS).
These demonstrations have shown that LASAA potentially enables more secure code, supports mission effectiveness, and reduces support costs.
Looking Ahead
Looking to the future, LLMs can be used to improve the formal verification of software, an area that currently requires a huge amount of manual effort. Generating and proving loop invariants and function pre-/post-conditions is a crucial and challenging part of formal verification, and LLMs appear promising for helping with this task as well.
Learn More
LLMs to Adjudicate Static Analysis Alerts (LASAA) Assets
•Collection
This collection contains assets related to the LLMs to Adjudicate Static Analysis Alerts (LASAA) project.
LLMs to Adjudicate Static Analysis Alerts (LASAA)
•Fact Sheet
This fact sheet describes the LASAA project, which uses large language models (LLMs) to adjudicate static analysis alerts. This approach enables more complete alert adjudication, reducing unknown risk and improving software security.
Secure Code Faster at Lower Cost for Ground and Space Systems: Techniques for High-Accuracy Static-Analysis Adjudication using LLMs
•Presentation
Will Klieber and Lori Flynn presented this session at the Ground System Architectures Workshop on Tuesday, February 24, 2026.
Automated Techniques for Ground Systems Software Security
•Poster
Will Klieber and Lori Flynn presented this poster at the Ground System Architectures Workshop on Tuesday, February 24, 2026.
Using Popular LLMs for Static Analysis Alert Adjudication: For the 2025 DoW AI/ML Technical Exchange Meeting
•Presentation
On January 15, 2026, Lori Flynn and Will Klieber presented this session at the Department of War (DoW) Artificial Intelligence/Machine Learning (AI/ML) Technical Exchange Meeting, in the Security and Safety track. They discussed work developed in the Line-funded research project “Using LLMs to Adjudicate Static-Analysis Results.”
Using LLMs to Adjudicate Static-Analysis Alerts
•Conference Paper
This paper discusses techniques for using large language models to handle static analysis output.
Evaluating Static Analysis Alerts with LLMs
•Blog Post
LLMs show promising initial results in adjudicating static analysis alerts, offering possibilities for better vulnerability detection. This post discusses initial experiments using GPT-4 to evaluate static analysis alerts.
Using LLMs to Automate Static-Analysis Adjudication and Rationales
•Article
This article discusses a model for using large language models (LLMs) to handle static analysis output.