Software Engineering Institute Carnegie Mellon

A Reverse-Engineering Environment Framework

1. Introduction

Many organizations are faced with maintaining aging software systems that are constructed to run on a variety of hardware types, are programmed in obsolete languages, and suffer from the disorganization that results from prolonged maintenance. As software ages, the task of maintaining it becomes more complex and more expensive. Poor design, unstructured programming methods, and crisis-driven maintenance can contribute to poor code quality, which in turn affects understanding.

Better understanding of a program aids in common activities such as performing corrective maintenance, reengineering, and keeping documentation up to date. To minimize the likelihood of errors introduced during the change process, the software engineer must understand the system sufficiently well so that changes made to the source code have predictable consequences. But such understanding is difficult to recover from a legacy system after many years of operation.

Program understanding is a relatively immature field of study in which the terminology and focus are still evolving. The goal of program understanding is to acquire sufficient knowledge about a software system so that it can evolve in a disciplined manner. The essence of program understanding is identifying artifacts and understanding their relationships; this process is essentially pattern matching at various abstraction levels. It involves the identification, manipulation, and exploration of artifacts in a particular representation of a subject system via mental pattern recognition by the software engineer and the aggregation of these artifacts to form more abstract system representations.

1.1 Program Understanding Support Mechanisms

There are a variety of support mechanisms for aiding program understanding. They can be grouped into three categories: unaided browsing, leveraging corporate knowledge and experience, and computer-aided techniques like reverse engineering. Unaided browsing is essentially "humanware": the software engineer manually flips through source code in printed form or browses it online, perhaps using the file system as a navigation aid. This approach is almost always used in some form, but it is not really a viable approach for very large systems. A good software engineer may be able to keep track of approximately 50,000 lines of code in his or her head. If there is much more than that, then the amount of information to keep track of becomes unwieldy.

The second category of support mechanism is leveraging corporate knowledge and experience. This can be done through mentoring or by conducting informal interviews with personnel knowledgeable about the subject system. This approach can be very valuable if there are people available who have been associated with the system as it has evolved over time. They carry important information in their heads about why the system was designed the way it was, the major changes that have occurred over its life cycle, and where subsystems have proven particularly troublesome. For example, they may be able to provide guidance on where to look when carrying out a new maintenance activity if it is similar to another change that took place in the past. This approach is useful both for gaining a big-picture understanding of the system and for learning about selected subsystems in detail. Unfortunately, this type of corporate knowledge and experience is not always available. The original designers may have left the company, the software system may have been acquired from another company, or maintenance of the system may have been outsourced.

In this situation, the only recourse is the third category of support mechanisms: computer-aided reverse engineering. A reverse-engineering environment can manage the complexities of program understanding by helping the software engineer extract high-level information from low-level artifacts, such as source code. This frees software engineers from tedious, manual, and error-prone tasks such as code reading, searching, and pattern matching by inspection.
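As a minimal illustration of extracting high-level information from low-level artifacts, the sketch below recovers a simple call graph from source code. It is not taken from any environment discussed in this report. The function name `call_graph` and the sample program are hypothetical, and Python is used only for brevity; a legacy system would require a parser for its own language.

```python
import ast


def call_graph(source):
    """Map each top-level function name to the sorted names it calls directly."""
    tree = ast.parse(source)
    graph = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            callees = set()
            # Walk the function body and record calls made through a plain name.
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    callees.add(inner.func.id)
            graph[node.name] = sorted(callees)
    return graph


# Hypothetical subject system, small enough to check by inspection.
SAMPLE = '''
def load(path):
    return open(path).read()

def report(path):
    text = load(path)
    print(len(text))
'''

if __name__ == "__main__":
    for caller, callees in call_graph(SAMPLE).items():
        print(caller, "->", ", ".join(callees))
```

Even this small abstraction answers a question that is tedious to answer by inspection in a large system: which routines depend on which others.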

1.2 About the Reverse-Engineering Environment Framework

Although substantial progress has been made in tool-based environmental support for aiding program understanding, there is no satisfactory mechanism to classify the technology that is currently available. As a result, it is difficult to compare the purposes, functionality, and characteristics of different program-understanding tools and techniques. To address this need, this report provides a descriptive model that categorizes important support mechanism features based on a hierarchy of attributes.

The model can be used for characterizing an individual support mechanism, a set of which can then be compared using a common vocabulary. At present, the model organizes attributes according to the broad categories of cognitive-model support, reverse-engineering tasks, canonical activities, quality attributes, and miscellaneous characteristics.
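One way to picture the model is as a record of attributes grouped under the five categories, with a comparison that reports where two support mechanisms differ. The sketch below is only an assumption about how such a characterization might be encoded. The names `Characterization` and `compare`, the tool names, and the attribute values are hypothetical; the category names, and the extensibility and cost attributes, come from this report.

```python
from dataclasses import dataclass, field

# The five top-level categories named in the report.
CATEGORIES = (
    "cognitive-model support",
    "reverse-engineering tasks",
    "canonical activities",
    "quality attributes",
    "miscellaneous characteristics",
)


@dataclass
class Characterization:
    """Attributes recorded for one support mechanism, grouped by category."""
    name: str
    attributes: dict = field(default_factory=dict)

    def set(self, category, attribute, value):
        # Reject categories outside the common vocabulary.
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        self.attributes.setdefault(category, {})[attribute] = value


def compare(a, b):
    """Return {(category, attribute): (value_a, value_b)} where a and b differ."""
    diffs = {}
    for category in CATEGORIES:
        attrs_a = a.attributes.get(category, {})
        attrs_b = b.attributes.get(category, {})
        for attribute in sorted(set(attrs_a) | set(attrs_b)):
            va, vb = attrs_a.get(attribute), attrs_b.get(attribute)
            if va != vb:
                diffs[(category, attribute)] = (va, vb)
    return diffs


if __name__ == "__main__":
    # Hypothetical tools and values, for illustration only.
    tool_a = Characterization("Tool A")
    tool_a.set("quality attributes", "extensibility", "high")
    tool_a.set("miscellaneous characteristics", "cost", "commercial")
    tool_b = Characterization("Tool B")
    tool_b.set("quality attributes", "extensibility", "low")
    tool_b.set("miscellaneous characteristics", "cost", "commercial")
    print(compare(tool_a, tool_b))
```

Because both tools are described with the same categories and attribute names, the comparison reduces to a lookup, which is the practical benefit of a common vocabulary.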

The reverse-engineering environment framework described in this report is based on an earlier effort that was called "Towards a Framework for Program Understanding" [Tilley 96a]. This earlier work was an initial attempt to solicit feedback from the community on the structure and contents of the framework. Two special meetings were held to discuss the framework: the first during the 1996 Workshop on Program Comprehension and the second during the 1996 Software Technology Conference. One of the most important comments received was that the framework was too top-down. It was unclear whether the framework was meant to characterize research efforts or to characterize reverse-engineering environments. Since the primary use of the framework is to guide advanced practitioners on reverse-engineering options, the framework was reorganized to reflect this goal. A new effort, the Program Understanding Framework, was started to characterize current activity areas in program understanding. This latter framework is also under revision and has been used to describe the current state of the practice in program understanding [Tilley 96b, Tilley 98].

Since 1996, the reverse-engineering environment framework has received input via email based on the material on the Reengineering Center’s Web site (http://www.sei.cmu.edu/reengineering). In addition, the framework has benefited enormously from discussions with selected representatives of academia and industry. It is expected that further meetings with various members of the program understanding, reverse engineering, and reengineering communities will continue to contribute to the framework as it evolves.

1.3 Organization of This Report

The next section discusses support for different cognitive models that can greatly affect the usefulness of a reverse-engineering environment. Section 3 describes the typical reverse-engineering tasks; whether or not an environment supports these tasks can be a motivating factor for selecting one environment over another. Section 4 describes the canonical activities that are characteristic of any reverse-engineering task, no matter what environment is used. A reverse-engineering environment also exhibits certain quality attributes (the "ilities") that affect its usefulness. For example, the degree of extensibility of the environment can affect how well it can be tailored to specific reverse-engineering tasks. Section 5 explores some of these quality attributes in more detail. Section 6 discusses the miscellaneous characteristics that can be important factors in the selection of a reverse-engineering environment, such as cost. Section 7 summarizes the report.

