Using Automation to Prioritize Alerts from Static Analysis Tools
Created September 2017
Validating and repairing defects discovered by static analysis tools can demand more effort from auditors and coders than organizations can spare. CERT researchers are developing a method to automatically classify and prioritize alerts to help auditors and coders address large volumes of alerts with less effort.
The Static Analysis Challenge: Sorting the Problems You Have from the Problems You Don’t Have
Federal agencies and other organizations face an overwhelming number of security challenges in their software. Static analysis (SA) tools attempt to automatically identify defects in software products, including those that could lead to security vulnerabilities. These tools define a set of conditions for a well-behaved program.
SA tools analyze the program to find violations of those conditions by examining possible data flows and control flows, without executing the program. Then they produce diagnostic messages, or alerts, about purported flaws in the source code. According to organizational priorities, human auditors then evaluate the validity of these alerts and repair confirmed flaws.
SA tool producers improve the methods their tools use to check for code flaws, devising new algorithms that analyze faster, use less memory or disk space, work more precisely, or find more true positives. Tool producers also work to increase their coverage of code flaw taxonomies such as the Common Weakness Enumeration (CWE), the SEI CERT Coding Standards, and MISRA C.
As SA tools identify more kinds of code flaws, more true flaws and false positives are reported as alerts. SA tools also exhibit false negatives, meaning they sometimes do not produce a warning when a true code flaw exists. Development organizations attempt to address this problem by running multiple SA tools on each codebase to increase the types of code flaws they can find. However, this approach compounds the problem of having too many alerts—both true and false positives.
For most large codebases, using just one or two general SA tools generates too many alerts for a team to address within a project’s budget and schedule. Auditors and coders urgently need an automated method to classify true- and false-positive alerts. They also need automated support for organizing and prioritizing SA tool alerts so they can manually evaluate and address them effectively and efficiently.
Three DoD organizations have agreed to provide sanitized alert audit data to support our research. We also work with collaborators at MITRE on mappings between the SEI CERT Coding Standards and CWEs. Dr. Claire Le Goues, from the Carnegie Mellon University Computer Science Department, serves as an advisor with expertise on assuring high-quality software systems.
Our Automated Solution: Classifying Alerts and Prioritizing Them for Action
We began with our audit archives from previous static analyses of 20 codebases by the CERT Source Code Analysis Laboratory (SCALe) code conformance service. The SCALe tool uses multiple commercial, open-source, and experimental SA tools to analyze codebases for potential flaws. By using multiple tools, SCALe detects more code defects than any single SA tool would find. To expand our data set, three DoD collaborators provided sanitized audit data from their own codebases, analyzed by an enhanced research prototype version of SCALe.
Our solution fuses alerts from different tools that report the same code flaw at the same location. Fusion requires mapping alerts from different tools to a code flaw taxonomy; in 2016, we used the SEI CERT Coding Standards. The script we created performs fusion and additional analysis, counting alerts per file, alerts per function, and the depth of affected files within the codebase.
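To illustrate the idea, here is a minimal sketch of fusion in Python. The `Alert` record and `fuse_alerts` helper are illustrative names, not SCALe’s actual data model, and each tool’s checker is assumed to have already been mapped to an SEI CERT rule ID.

```python
# Minimal sketch: alerts from different SA tools that map to the same
# coding-rule condition at the same location merge into one fused alert.
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    tool: str      # SA tool that produced the alert
    rule: str      # taxonomy condition, e.g., an SEI CERT rule ID
    filepath: str  # file containing the purported flaw
    line: int      # line number of the purported flaw


def fuse_alerts(alerts):
    """Group alerts by (rule, file, line); each group is one fused alert."""
    fused = defaultdict(list)
    for a in alerts:
        fused[(a.rule, a.filepath, a.line)].append(a)
    return fused


alerts = [
    Alert("toolA", "EXP33-C", "src/util.c", 42),
    Alert("toolB", "EXP33-C", "src/util.c", 42),  # same flaw, second tool
    Alert("toolA", "ARR30-C", "src/parse.c", 117),
]
for (rule, path, line), group in fuse_alerts(alerts).items():
    print(f"{rule} at {path}:{line} reported by {len(group)} tool(s)")
```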
CERT researchers use the results of this analysis as “features” for the classifiers. Features are the data attributes that the classifiers’ mathematical algorithms analyze. They include data gathered by code metrics tools and general SA tools about the program, file, function, and other categories relevant to each alert. These features help us develop more accurate classifiers.
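A hedged sketch of how such features might be derived for one alert follows; the dict-based alert representation and the specific feature names are assumptions for illustration, not the project’s actual feature set.

```python
# Sketch of deriving per-alert features. The alert representation (a dict
# with "rule" and "filepath" keys) and feature names are illustrative.
from collections import Counter


def derive_features(alert, all_alerts):
    """Return a feature dict for one alert, given all alerts on the codebase."""
    alerts_per_file = Counter(a["filepath"] for a in all_alerts)
    return {
        "rule": alert["rule"],                                 # taxonomy condition
        "alerts_in_file": alerts_per_file[alert["filepath"]],  # alert density in file
        "file_depth": alert["filepath"].count("/"),            # depth within the tree
    }


alerts = [
    {"rule": "EXP33-C", "filepath": "src/util.c"},
    {"rule": "ARR30-C", "filepath": "src/util.c"},
    {"rule": "INT31-C", "filepath": "src/net/parse.c"},
]
print(derive_features(alerts[0], alerts))
```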
We classify alerts into one of three categories:
- expected true positive (e-TP)
- expected false positive (e-FP)
- indeterminate (I)
We assign an alert to one of these classes using the probabilities the classifiers produce and user-specified thresholds. Using these assigned classifications, auditors could then put e-TPs into a set of code flaws to be fixed, ignore e-FPs, and prioritize I alerts for manual auditing according to level of confidence, cost to repair, and estimated risk if not repaired.
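As a concrete illustration, the partitioning might look like the following sketch, assuming a classifier that outputs a probability that an alert is a true positive. The threshold values are placeholders for the user-specified settings, not recommendations.

```python
def classify(prob_true_positive, e_tp_threshold=0.95, e_fp_threshold=0.05):
    """Partition an alert by its predicted probability of being a true positive."""
    if prob_true_positive >= e_tp_threshold:
        return "e-TP"  # expected true positive: queue for repair
    if prob_true_positive <= e_fp_threshold:
        return "e-FP"  # expected false positive: ignore
    return "I"         # indeterminate: prioritize for manual audit


for p in (0.99, 0.50, 0.02):
    print(f"P(TP) = {p:.2f} -> {classify(p)}")
```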
In 2016, we created two types of classifiers:
- all data, with the rule names used as a feature
- per rule, which uses only data with alerts mapped to that particular coding rule
(See the SEI blog post Prioritizing Security Alerts: A DoD Case Study for more details.)
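The difference between the two classifier types amounts to how the training data is assembled. The sketch below contrasts them using pandas-style filtering; the column names and the tiny audit table are assumptions for illustration only.

```python
import pandas as pd

# Toy audit archive: each row is one audited alert (assumed column names).
audits = pd.DataFrame({
    "rule":           ["EXP33-C", "EXP33-C", "ARR30-C", "INT31-C"],
    "alerts_in_file": [3, 1, 7, 2],
    "verdict":        [1, 0, 1, 0],  # 1 = audited true, 0 = audited false
})

# All-data classifier: train on every audited alert, with the rule name
# encoded (here one-hot) as a feature alongside the other features.
X_all = pd.get_dummies(audits[["rule", "alerts_in_file"]], columns=["rule"])
y_all = audits["verdict"]

# Per-rule classifier: train only on alerts mapped to one coding rule,
# so the rule name carries no information and is dropped.
exp33 = audits[audits["rule"] == "EXP33-C"]
X_rule, y_rule = exp33[["alerts_in_file"]], exp33["verdict"]
```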
Using the largest data set, the all-data classifiers achieved between 88% and 91% precision. Of the single-rule classifiers, only three had sufficient data for us to have confidence in their predictions.
Now our work focuses on rapidly increasing the number of per-condition classifiers using conditions from two taxonomies: CWE and SEI CERT Coding Standards.
We want our research to result in tools that automate the process of classifying alerts, routing alerts to appropriate work groups, and prioritizing indeterminate alerts. We plan to integrate automated classifiers with SCALe, so it filters and prioritizes alerts using the classification and prioritization scheme from this research. Our goal is more secure code and lower costs, enabled by efficient direction of human efforts for manual alert auditing and code repair.
September 21, 2018 Presentation
This presentation was given by author Lori Flynn to Raytheon’s Systems and Software Assurance Technology Interest Group.
August 14, 2018 Conference Paper
This paper was accepted by the SQUADE workshop at ICSE 2018. It describes the development of several classification models for the prioritization of alerts produced by static analysis tools and how those models were tested for accuracy.
April 30, 2018 Blog Post
Numerous tools exist to help detect flaws in code. Some of these are called flaw-finding static analysis (FFSA) tools because they identify flaws by analyzing code without running it. Typical output of an FFSA tool includes a list of alerts...
April 23, 2018 Presentation
Lori Flynn describes some of the accomplishments and challenges of the FY16-17-18 classifier research she led.
January 23, 2017 Blog Post
Federal agencies and other organizations face an overwhelming security landscape. The arsenal available to these organizations for securing software includes static analysis tools, which search code for flaws, including those that could lead to software vulnerabilities. The sheer effort required...
November 01, 2016 Presentation
In this presentation, Lori Flynn describes work toward an automated and accurate statistical classifier, intended to efficiently use analyst effort and to remove code flaws.
October 18, 2016 Poster
This poster describes CERT Division research on an automated and accurate statistical classifier.
June 06, 2016 Blog Post
In 2015, the National Vulnerability Database (NVD) recorded 6,488 new software vulnerabilities, and the NVD documents a total of 74,885 software vulnerabilities discovered between 1988 and 2016. Static analysis tools examine code for flaws, including those that could lead to software security...