NEWS AT SEI
This article was originally published in News at SEI on: December 1, 2003
Substantial computer programs are often remarkably complex in their structure and behavior. When faced with this complexity, software engineers have no practical means to quickly determine the full functional behavior of programs in all circumstances of use. The result is that today we must depend on large-scale systems to function correctly, even though they are composed of programs whose full behavior and security exposures may not be reliably known. The existence of unknown functionality—and the malicious exploitation of that functionality—is the Achilles heel of software. The ability to automatically compute the behavior of software quickly and comprehensively would help to solve the problem.
While computing software behavior is a very difficult task that poses many theoretical and engineering challenges, the benefits could be substantial. For that reason, the SEI’s CERT Research team has begun early, exploratory investigations to discover the extent to which software behavior can be calculated. Of particular interest is the possibility of using behavior calculation for malicious code detection and analysis
Understanding Program Behavior
Understanding program behavior today is a haphazard, error-prone, resource-intensive process. Nevertheless, it is essential for uncovering security and reliability problems. And because attackers can make malicious modifications to programs at any time, the task of behavior discovery never ends. Many programs are difficult to understand because they contain an enormous number of execution paths, any of which might contain errors or security exposures. Faced with massive sets of possible executions, programmers can often do no more than achieve a general understanding of mainline program behavior.
Automated support for behavior discovery would be an ideal solution to this difficult problem. Members of CERT Research are exploring this strategy in an effort called the function extraction (FX) project. The objective of the project is to investigate technology to help move from an uncertain understanding of program behavior derived slowly by humans to a precise understanding quickly calculated by computers.
Treating Programs Like Equations
Traditional engineering disciplines depend on rigorous methods to evaluate the expressions that represent and manipulate their subject matter. For example, the equations that represent fluid mechanics, or electromagnetic fields, allow engineers to understand and predict the behavior of their designs with precision. As an engineering discipline, software engineering has been an exception to this rule; it traditionally has had no way to quickly and completely evaluate the expressions it produces. In this case, the “expressions” are computer programs, and “evaluation” means understanding their full behavior in every instance—behavior that is right or wrong, behavior that is the intention of the engineer or the result of malicious intervention.
The function-theoretic model of software treats programs as mplementations of mathematical functions or relations, that is, mappings from inputs (stimuli) to outputs (responses), no matter what subject matter they deal with. The key to the function-theoretic approach is the recognition that although the sequential logic of programs can contain an immense number of execution paths, this logic is also composed of a finite number of control structures, each of which implements a mathematical function or relation in the transformation of its inputs into outputs.
This finite property of program logic, viewed through the lens of function theory, suggests the possibility of automated calculation of program behavior. Every control structure in a program has a non-procedural behavior signature that defines its net functional effect. Behavior signatures can be extracted and composed with others in a stepwise process. The resulting behavior signature of an entire program represents the specifications or business rules that it implements. Initial investigation of these concepts suggests that theoretical challenges to function extraction may have acceptable engineering solutions. For example, while no general theory for loop behavior calculation can exist, pattern recognition can help provide an engineering approach.
The CERT Research team plans to investigate function extraction methods and to create a demonstration prototype. Foundations for function extraction are being developed, and a pre-prototype function extractor has been created that calculates the behavior of programs written in a small subset of Java. Participation by organizations in government, defense, and industry is welcomed, by partnering with CERT Research as a visiting scientist, or through our Affiliate Program.
Exploration at the SEI
The function extraction (FX) project is
an excellent example of the exploratory research that is done at the
SEI. In the life-cycle model used by the SEI to describe new
technologies to our collaborators, the FX project falls into the
earliest phase: exploration. In the exploration phase, the SEI
identifies potentially high-payoff approaches to DoD needs and other
pervasive problems in the software community at large. While the SEI
provides leadership during the exploration phase, we also try to
identify partners who can work with us to develop a solution. Some practitioners are well suited to joining the SEI in ventures like
the FX project, because they enjoy shaping the problem space and acting
as co-developers. If you identify yourself as an innovator for this
technology, consider joining forces with us on this (or other)