Automated Code Repair
Created September 2017
Finding security flaws in source code is daunting; fixing them is an even greater challenge. We are creating tools that repair bugs automatically or, when a fully automatic repair is not possible, prompt developers for the additional information needed to make an effective repair.
Vast Amounts of Code Have Many Security Vulnerabilities
Reviews by the CERT Division's Source Code Analysis Laboratory (SCALe) of software from the U.S. Department of Defense (DoD) and other sources show that most software contains many vulnerabilities. Most security flaws are caused by simple coding errors. Static analysis tools, which are typically used late in the development process, produce a huge number of diagnostics. Even after false positives are excluded, the volume of true positives can overwhelm a development team's capacity to fix the code, so the team eliminates only a small percentage of the vulnerabilities. Meanwhile, the DoD's existing installed codebases now consist of billions of lines of C code that contain an unknown number of security vulnerabilities.
Most analyzers provide basic diagnostics but do not provide automated fixes or code modifications. Integrated development environments (IDEs) offer some automated code modification: some, such as Eclipse with its Quick Fix feature, can fix code that has specific compilation errors. IDEs also provide refactoring options, but refactorings are not intended to change the behavior of the code; instead, they improve some aspect of its design.
Existing techniques for addressing security problems in code often require programmers to add more information, such as annotations and attributes, that can then be post-processed. These techniques are effective when developing new code, but applying them to existing programs runs into the same practical limitation as manually addressing thousands of diagnostics. We need a better way to fix existing code.
Members of our CERT Secure Coding team are engaging with members of the DoD Software Assurance Community of Practice. We have also engaged with CERDEC for feedback and technology transition; specifically, CERDEC will evaluate the integer-overflow repair tool on DoD codebases.
Our Solution: Automated Tools Look for Vulnerabilities and Fix Them
Our experience examining code shows that many security-relevant bugs follow common patterns that tools can automatically detect. There are corresponding patterns for repairing these bugs that tools can perform using automatic program transformation. We are developing automated source-code transformation tools to remediate vulnerabilities in code that are caused by violations of rules in the CERT Secure Coding Standards.
These tools convert noncompliant code into code that complies with the CERT standards. They reduce vulnerabilities without requiring developers to manually review the thousands of diagnostics produced by static analysis tools. Sometimes our tools repair a bug completely automatically; in other cases, they prompt developers for more information when a little manual intervention can produce an effective repair.
We based our automated repair work on three premises:
- Many security bugs follow common patterns.
- By recognizing a pattern, a tool can make a reasonable guess about the developer's intention. We call this the inferred specification.
- A tool can repair the code to satisfy the inferred specification.
For example, malloc is a function that allocates a chunk of memory and returns a pointer to it. One common pattern of security bugs is a memory allocation such as “p = malloc(n * sizeof(T)),” where n is attacker-controlled. If n is too large, integer overflow occurs, and too little memory gets allocated, setting the stage for a buffer overflow. The inferred specification in the malloc case would be “Try to allocate enough memory to hold n objects of type T.” The tool inserts code to check whether overflow occurs and to simulate malloc returning NULL due to insufficient memory if overflow does occur.
To develop our automated code repair tool, we extended ROSE, a framework for source code transformation. Our goal is to reduce the number of rule violations that require manual inspection by two orders of magnitude, from thousands to tens. At that scale, a development team can mitigate all of the remaining violations. Automated code repair reduces a system's attack surface and improves its ability to withstand cyber attacks while sustaining critical functions.
November 07, 2021 Presentation
This research highlights how to increase software assurance of binary components by analyzing and repairing functions.
November 04, 2021 Video
This short video provides an introduction to a research topic presented at the SEI Research Review 2021.
June 01, 2021 Presentation
In this presentation, the authors discuss a technique for repairing C code to protect against potential violations of spatial memory safety.
November 11, 2019 Video
Watch SEI principal investigator Dr. Will Klieber discuss research to design and implement a technique that automatically repairs all potential violations of memory safety in source code so that the program is provably memory-safe.
October 28, 2019 Poster
This poster presents research to automatically repair C source code to eliminate memory-safety vulnerabilities.
June 05, 2016 Blog Post
In 2015, the National Vulnerability Database (NVD) recorded 6,488 new software vulnerabilities, and the NVD documents a total of 74,885 software vulnerabilities discovered between 1988 and 2016. Static analysis tools examine code for flaws, including those that could lead to software security...