Automation Speeds Up Quality Assessments for Safety-Critical Software

Software quality assessments identify risk, allowing for mitigation strategies that ensure secure code in safety-critical defense platforms. But standard static analysis tools can miss risks or vulnerabilities hidden in complex systems. Manual assessment is often time-intensive, error-prone, and inconsistent, making it a hard-to-scale bottleneck to software delivery.

Using commercially available tools, custom analyzers, and public large language models (LLMs), the SEI developed new tools that automate structural code quality assessment in classified and unclassified environments. New on-site capabilities also allow the SEI to conduct software analysis in classified, secure settings.

Accelerating Verification and Software Standards Testing

Software quality analysis can be especially challenging in classified settings. Not only do typical static analysis tools lack the technical depth needed, but their predefined rulesets often do not align with classified security policies. Generative artificial intelligence (AI) platforms promise quick tool creation. But classified government programs have connectivity and confidentiality constraints that restrict them from accessing best-in-class, public LLMs.

In 2025, SEI experts demonstrated that they could use public generative AI platforms to rapidly develop static analysis tools that accelerate software analysis in classified environments, while keeping restricted code out of public databases.

SEI researchers input public material on software verification tasks into public LLMs. With careful prompt engineering, the researchers used the LLMs to support tool design, development, and implementation. The project created tool-creation workflows for both tool developers and software analysts. After expert evaluation, the tools were migrated to a classified environment and tested against two difficult tasks.

In the first task, critical signal analysis, analysts who used FlowFusion, an LLM-generated data visualization tool, spent a median of 39 percent less time than they did with manual processes. Their accuracy also increased by a median of 10 percent.

To address the other task, government software requirements validation, the SEI used LLMs to develop the Accelerating Verification and Software Standards Testing (AVASST) Plugin Suite. AVASST could provide a common platform for sharing custom LLM-generated plugins, enabling quick, repeatable analysis checks across software versions. Implemented in a DevSecOps pipeline, AVASST could enable broad evaluation of code quality by automating checks within the development process.
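The idea of a shared platform for small, repeatable checks can be illustrated with a minimal sketch. This is not AVASST's actual design, which the article does not describe; the `Finding` type, the `register` decorator, and both example rules are hypothetical and chosen only to show how plugin-style checks might be registered once and rerun unchanged on each software version.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Finding:
    check: str    # name of the check that produced the finding
    line: int     # 1-based line number in the analyzed source
    message: str


# Hypothetical registry: check name -> function that analyzes source text.
CHECKS: Dict[str, Callable[[str], List[Finding]]] = {}


def register(name: str):
    """Decorator that adds a check function to the shared registry."""
    def wrap(fn):
        CHECKS[name] = fn
        return fn
    return wrap


@register("no-goto")
def no_goto(source: str) -> List[Finding]:
    # Illustrative rule only: a naive text match, not a real parser.
    return [Finding("no-goto", i, "goto statement found")
            for i, line in enumerate(source.splitlines(), start=1)
            if re.search(r"\bgoto\b", line)]


@register("max-line-length")
def max_line_length(source: str, limit: int = 100) -> List[Finding]:
    # Illustrative rule only: the 100-character limit is an assumption.
    return [Finding("max-line-length", i, f"line exceeds {limit} characters")
            for i, line in enumerate(source.splitlines(), start=1)
            if len(line) > limit]


def run_checks(source: str) -> List[Finding]:
    """Run every registered check in a fixed (alphabetical) order."""
    findings: List[Finding] = []
    for name in sorted(CHECKS):
        findings.extend(CHECKS[name](source))
    return findings
```

Because the checks run in a fixed order and depend only on the source text, calling `run_checks` on two versions of a file yields results that can be compared directly, which is the kind of repeatability a pipeline step would rely on.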

The SEI researchers emphasize that the tests were small scale, and the tools should not replace human analysis. However, the results showed that LLMs can enhance productivity in software analysis, even in classified environments.

Automating Code Risk Estimation

The SEI is also automating its Code Risk Estimation Worksheet (CREW), a 480-point manual assessment of structural code quality. CREW Light automatically aggregates the output from best-in-class commercially available static analysis, architectural analysis, and organically developed analytic tools into a single spreadsheet. The sheet estimates software quality based on five key areas of risk identified within the source code: security, modifiability, clarity, reusability, and adherence to standards.
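As a rough illustration of this kind of aggregation, the sketch below tallies findings from several tools by risk area and writes the totals as CSV. It is an assumption-laden example, not CREW Light itself: the article does not describe the tool's inputs or scoring, so the `(tool, area)` input format, the simple counting, and the column names are all hypothetical. Only the five risk areas come from the article.

```python
import csv
import io
from collections import Counter

# The five risk areas named in the article ("standards" abbreviates
# "adherence to standards").
RISK_AREAS = ["security", "modifiability", "clarity", "reusability", "standards"]


def aggregate(findings):
    """Count findings per risk area.

    `findings` is an iterable of (tool_name, risk_area) pairs; this input
    shape is an assumption made for the sketch.
    """
    counts = Counter()
    for _tool, area in findings:
        if area not in RISK_AREAS:
            raise ValueError(f"unknown risk area: {area}")
        counts[area] += 1
    # Report every area, including those with zero findings.
    return {area: counts[area] for area in RISK_AREAS}


def to_csv(summary):
    """Render the per-area counts as CSV text (a stand-in for a spreadsheet)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["risk_area", "finding_count"])
    for area in RISK_AREAS:
        writer.writerow([area, summary[area]])
    return buf.getvalue()
```

A real estimate would presumably weight findings by severity and context rather than count them; the raw count here only shows how output from separate tools can be folded into one sheet.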

CREW Light integrates into DevSecOps continuous integration/continuous deployment (CI/CD) pipelines. It aims to shorten risk estimation time from months of manual labor to just hours.

Trusted Assessments for Sensitive Programs

Government stakeholders, such as NASA and Department of War components, and their industry partners trust the SEI to perform independent technical assessments of safety-critical, software-intensive systems.

“We have the experience in areas like coding standards, technical debt, and acquisition to help programs with cradle-to-grave software challenges,” said Michael Riley, the SEI’s Resilient Critical Systems team lead.

Principal Investigators

Yash Hindka, Ryan Karl (AVASST)

Alan Cohn, Michael Riley (CREW)

Researchers

John Robert, Shen Zhang (AVASST)

Mentioned in this Article

An Approach to Accelerate Verification and Software Standards Testing with LLMs

Accelerating Verification and Software Standards Testing (AVASST) with Large Language Models (LLMs)

Assessing Software Quality Using a Risk-Based Methodology
