Design Pattern Recovery from Malware Binaries
Created September 2017
The U.S. DoD and industry face a wide variety of problems with malware. CERT researchers automate operational malware analysis capabilities, including those focused on malware family evolution and similarity.
Malware Is a Growth Industry
The U.S. DoD and other government agencies face many varied problems with malware. Attackers conduct cyber attacks on the networks of U.S. critical infrastructure in attempts to disrupt operations, destroy data, and steal intellectual property. The U.S. DoD Cyber Strategy, published in April 2015, reports that “the uncontrolled spread of destructive malware to hostile actors presents a significant risk” to U.S. networks and data, and that the U.S. DoD must counter this spread of malware.
Malware also affects industry on a large scale. Data breaches resulting from malware attacks have occurred in retail, health care, education, financial services, and manufacturing, and the cost of dealing with malware is rising for both businesses and consumers.
Malware analysis is a time-consuming, complex, manual process that requires specialized reverse-engineering skills. Efficiently determining whether a new malware sample resembles a known one can help in responding to cyber attacks. By automating malware analysis tasks, we speed up learning about how malware behaves. Automated binary analysis is a promising technique for grappling with the quantity and variety of malware in circulation.
Previous work on malware similarity has focused on low-level syntactic features (such as individual assembly language instructions) or semantic concepts (such as code-level functional equivalence). Our work focuses on higher-level abstractions: the design patterns that malware authors use.
Malware authors face the same software design challenges that all software authors face. They develop reusable software components to make evolving their software easier, and they modularize their code to extend its functionality and accomplish new goals. Our analysts have identified malware families with similar designs but obviously different implementations. We want to match these patterns in executables by recognizing higher-level design abstractions in low-level assembly code.
Automating Malware Analysis Helps Analysts Respond Faster to Malware Attacks
Our goal is to provide human analysts with the information they require to make design-level similarity decisions using an automated tool. This tool, called the Malware Design Matcher, dramatically reduces the number of hours of manual reverse engineering required to gather data for comparing malware design patterns.
We developed this tool using Pharos, our binary program analysis framework, which is built on the Lawrence Livermore National Laboratory's ROSE compiler infrastructure. Using ideas inspired by research on design pattern recovery from source code, we extended the Malware Design Matcher to find similar abstractions in binaries. We developed a type recovery system to automatically recover prototype declarations of malware functions and a design pattern-matching system to look for patterns in malware.
The type recovery framework: We implemented the type recovery framework in Pharos. We collected information about the type of each parameter used in various operating system calls by harvesting it from source code headers and Internet documentation. Using data flow analysis, the type recovery framework propagates these types from the operating system calls to the functions that the malware itself defines. The automatically recovered function prototypes contain detailed type information, which is a rich source of knowledge about the design and architecture of the program and is also valuable for manual reverse-engineering tasks.
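The propagation idea can be illustrated with a small sketch. This is not the Pharos implementation: the `API_PARAM_TYPES` table, the `Function` record, and `recover_prototypes` are hypothetical names, and the model is deliberately reduced to parameters that are passed unchanged into a call. The Win32 parameter types shown follow the documented prototypes of those APIs.

```python
from dataclasses import dataclass

# Hypothetical slice of an API type database. The parameter types follow the
# documented Win32 prototypes; the table layout itself is an assumption.
API_PARAM_TYPES = {
    "CloseHandle": ["HANDLE"],
    "Sleep": ["DWORD"],
    "lstrlenA": ["LPCSTR"],
}


@dataclass
class Function:
    name: str      # label for a function found in the binary
    params: list   # parameter names, in order
    calls: list    # (callee name, [argument names]) for each call site


def recover_prototypes(functions):
    """Propagate parameter types from known APIs until a fixed point is reached."""
    known = {name: list(types) for name, types in API_PARAM_TYPES.items()}
    for func in functions:
        known[func.name] = [None] * len(func.params)

    changed = True
    while changed:
        changed = False
        for func in functions:
            for callee, args in func.calls:
                for arg, arg_type in zip(args, known.get(callee, [])):
                    # Only arguments that are the caller's own parameters,
                    # passed through unchanged, receive a type here.
                    if arg_type is None or arg not in func.params:
                        continue
                    index = func.params.index(arg)
                    if known[func.name][index] is None:
                        known[func.name][index] = arg_type
                        changed = True
    return {func.name: known[func.name] for func in functions}


# Illustrative input: sub_402000 forwards its parameter to sub_401000,
# which in turn passes its parameters to two operating system calls.
functions = [
    Function("sub_402000", ["x"], [("sub_401000", ["x", "tmp"])]),
    Function("sub_401000", ["a1", "a2"], [("CloseHandle", ["a1"]), ("Sleep", ["a2"])]),
]
prototypes = recover_prototypes(functions)
```

In this toy input, the types reach `sub_402000` only on a second pass, after `sub_401000` has been typed, which is why the loop repeats until nothing changes. A real data flow analysis would also track values through registers, memory, and return values.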
The design pattern matching system: Existing research focuses on the recovery of design patterns from source code. These approaches are based on detecting a set of features that distinctively identify a particular design pattern. We applied similar techniques to binary executables. We selected features described in the literature that are available from binaries and are appropriate for detecting malware design patterns. Our tool tests for each feature in a design pattern and reports the pattern as present only if all of its features are present. We plan to use the pattern-matching system to detect binary implementations of standard template library functions, a task that is common in malware analysis but is typically not a concern when analyzing source code.
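A minimal sketch of this feature-based matching follows. The feature tests, the `COMMAND_DISPATCHER` pattern, and the layout of the `facts` dictionary are all assumptions made for illustration; the article does not list the actual features the tool uses. The only point shown is the rule that a pattern matches when every one of its features holds.

```python
from collections import Counter


def overridden_virtual_method(facts):
    """Some derived class overrides a virtual method of its base class."""
    methods = facts["virtual_methods"]
    return any(methods.get(derived, set()) & methods.get(base, set())
               for derived, base in facts["inherits"])


def several_classes_share_base(facts):
    """At least two classes derive from the same base class."""
    counts = Counter(base for _, base in facts["inherits"])
    return any(count >= 2 for count in counts.values())


def class_uses_network_api(facts):
    """Some class calls a socket send or receive API."""
    return any(calls & {"recv", "send"} for calls in facts["api_calls"].values())


# Hypothetical pattern: a name for each feature mapped to its test.
COMMAND_DISPATCHER = {
    "overridden_virtual_method": overridden_virtual_method,
    "several_classes_share_base": several_classes_share_base,
    "class_uses_network_api": class_uses_network_api,
}


def match_pattern(features, facts):
    """Return (present, missing): present is True only if no feature is missing."""
    missing = [name for name, test in features.items() if not test(facts)]
    return (not missing, missing)


# Illustrative facts that a binary analysis might report.
facts = {
    "inherits": {("cls_1", "cls_0"), ("cls_2", "cls_0")},
    "virtual_methods": {"cls_0": {"slot_1"}, "cls_1": {"slot_1"}, "cls_2": {"slot_1"}},
    "api_calls": {"cls_1": {"recv", "send"}, "cls_2": {"CreateFileA"}},
}
present, missing = match_pattern(COMMAND_DISPATCHER, facts)
```

Returning the list of missing features alongside the verdict is a design choice of this sketch: it shows an analyst why a pattern failed to match rather than giving only a yes or no answer.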
CERT analysts have observed some well-known design patterns, such as singleton and factory, in malware binaries, but the primary value of a binary design-pattern matching capability is for detecting malware-specific designs, such as a backdoor command dispatcher or a process injection component. We have successfully identified malware-specific designs, expressed them as design features, and detected these patterns automatically in malware.
Testing Pharos on a malware variant: To evaluate our tool, we conducted an experiment in which we built a gh0st/evilight malware variant from source code and manually generated a design pattern signature for the malware family using only high-level abstractions, such as the relationships between classes. Using the Pharos infrastructure, the tool automatically discovered a wide variety of features from the malware binary, including classes, methods, virtual function tables, virtual function calls, and calls to operating system APIs. After detecting these features, the Malware Design Matcher tool automatically used the signature to assign labels to each of the identified features.
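The labeling step can be pictured as finding an assignment of signature roles to detected classes that satisfies every relationship in the signature. The article does not describe the algorithm the tool uses, so the brute-force search below is only a sketch under that assumption; the role names, relation kinds, and `cls_N` identifiers are hypothetical.

```python
from itertools import permutations


def assign_labels(roles, required_relations, detected_classes, detected_relations):
    """Return a {role: detected class} mapping that satisfies every required
    relation, or None if no such mapping exists."""
    for candidate in permutations(sorted(detected_classes), len(roles)):
        mapping = dict(zip(roles, candidate))
        if all((kind, mapping[a], mapping[b]) in detected_relations
               for kind, a, b in required_relations):
            return mapping
    return None


# Hypothetical signature written in terms of roles and their relationships.
roles = ["base", "handler", "factory"]
required_relations = [
    ("inherits", "handler", "base"),
    ("creates", "factory", "handler"),
]

# Illustrative features detected in a binary, where classes have no names.
detected_classes = {"cls_0", "cls_1", "cls_2"}
detected_relations = {
    ("inherits", "cls_2", "cls_0"),
    ("creates", "cls_1", "cls_2"),
}

labels = assign_labels(roles, required_relations, detected_classes, detected_relations)
```

Trying every permutation is practical only for small signatures. It is used here because it makes the constraint, that the relationships rather than the implementations must line up, easy to see.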
The Malware Design Matcher tool also allows malware analysts to share knowledge about malware families, enabling more efficient responses to malware with similar functions. Because the tool automatically assigns semantic labels to detected features, malware analysts can significantly reduce the effort required to analyze new variants of the same malware family, since the overall design of the malware is likely to be unchanged in later variants.
Matching design patterns focuses analysts’ attention on unmatched features in new variants of malware that warrant further analysis. Additionally, the tool enables malware analysts to document the high-level abstractions present in a specific malware family in a way that allows other analysts to benefit from those insights.
This project resulted in new capabilities for our automated analysis framework and the construction of a new prototype tool called Malware Design Matcher. We will use this tool to support ongoing work on the similarity and evolution of malware families. We will continue to evolve the design pattern matching capability and conduct more experiments on pattern variation in malware. Future work will also apply the capability to recognizing complex libraries, such as the C++ standard library, in binary programs.