Using Automation to Prioritize Alerts from Static Analysis Tools

Created September 2017

Validating and repairing defects discovered by static analysis tools can require more human effort from auditors and coders than organizations have. CERT researchers are developing a method to automatically classify and prioritize alerts to help auditors and coders address large volumes of alerts with less effort.

The Static Analysis Challenge: Sorting the Problems You Have from the Problems You Don’t Have

Federal agencies and other organizations face an overwhelming number of security challenges in their software. Static analysis (SA) tools attempt to automatically identify defects in software products, including those that could lead to security vulnerabilities. These tools define a set of conditions for a well-behaved program.

SA tools analyze the program to find violations of those conditions by examining possible data flows and control flows, without executing the program. Then they produce diagnostic messages, or alerts, about purported flaws in the source code. According to organizational priorities, human auditors then evaluate the validity of these alerts and repair confirmed flaws.

SA tool producers improve the methods their tools use to check for code flaws, devising new algorithms that analyze faster, use less memory or disk space, work more precisely, or find more true positives. Tool producers also work to increase their coverage of code flaw taxonomies such as CWE, SEI CERT Coding Standards, and MISRA C.

As software assurance tools identify more kinds of code flaws, more true flaws and false positives are reported as alerts. SA tools also exhibit false negatives, meaning they sometimes do not produce a warning when a true code flaw exists. Development organizations attempt to address this problem by running multiple SA tools on each codebase to increase the types of code flaws they can find. However, this approach compounds the problem of having too many alerts—both true and false positives.

For most large codebases, using just one or two general SA tools generates too many alerts for a team to address within a project’s budget and schedule. Auditors and coders urgently need an automated method to classify true- and false-positive alerts. They also need automated support to organize and prioritize alerts from SA tools to manually evaluate and address them effectively and efficiently.

Our Collaborators

Three DoD organizations have agreed to provide sanitized alert audit data to support our research. We also work with collaborators at MITRE on mappings between SEI CERT Coding Standards and Common Weakness Enumerations (CWEs). Dr. Claire Le Goues, from the Carnegie Mellon University Computer Science Department, serves as an advisor with expertise on assuring high-quality software systems.

Our Automated Solution: Classifying Alerts and Prioritizing Them for Action

We began with our audit archives from previous static analyses of 20 codebases by the CERT Source Code Analysis Laboratory (SCALe) code conformance service. The SCALe tool uses multiple commercial, open-source, and experimental SA tools to analyze codebases for potential flaws. By using multiple tools, SCALe finds more detected code defects than any single SA tool would find. To expand our data set, three DoD collaborators provided sanitized audit data from their own codebases, analyzed by an enhanced research prototype version of SCALe.

Our solution fuses alerts from different tools for the same code flaw at the same location. Fusion requires mapping alerts from different tools to a code flaw taxonomy; in 2016, we used SEI CERT Coding Standards. The script we created performs fusion and performs additional analysis by counting alerts per file, alerts per function, and the depth of affected files within the code.

CERT researchers use the results of this analysis as “features” for the classifiers. Features are types of data that are analyzed by the mathematical algorithms for the classifiers. They include data gathered by code metrics tools and general SA tools about the program, file, function, and other categories relevant to each alert. These features help us develop more accurate classifiers.

We classify alerts into one of three categories:

expected true positive (e-TP)
expected false positive (e-FP)
indeterminate (I)

We assign membership to one of these classes by using probabilities the classifiers produce and user-specified thresholds. Using these assigned classifications, auditors could then put e-TPs into a set of code flaws to be fixed, ignore e-FPs, and prioritize I alerts for manual auditing according to level of confidence, cost to repair, and estimated risk if not repaired.

In 2016, we created two types of classifiers:

all data, with the rule names used as a feature
per rule, which uses only data with alerts mapped to that particular coding rule

(See the SEI blog post Prioritizing Security Alerts: A DoD Case Study for more details.)

Using the largest data set, the all-data classifiers ranged between 88% and 91% precision. For the single-rule classifiers, only three had sufficient data to have confidence in the classifier predictions.

Now our work focuses on rapidly increasing the number of per-condition classifiers using conditions from two taxonomies: CWE and SEI CERT Coding Standards.

Software and Tools

SCALe Collection

August 2018

The CERT Division's Source Code Analysis Laboratory (SCALe) offers conformance testing of C and Java language software systems against the CERT C Secure Coding Standard and the CERT Oracle Secure Coding Standard for...

read

Source Code Analysis Laboratory (SCALe)

March 2012

In this report, the authors describe the CERT Program's Source Code Analysis Laboratory (SCALe), a conformance test against secure coding...

read

Looking Ahead

We want our research to result in tools that automate the process of classifying alerts, routing alerts to appropriate work groups, and prioritizing indeterminate alerts. We plan to integrate automated classifiers with SCALe, so it filters and prioritizes alerts using the classification and prioritization scheme from this research. Our goal is more secure code and lower costs, enabled by efficient direction of human efforts for manual alert auditing and code repair.