NEWS AT SEI
This article was originally published in News at SEI on: September 1, 1999
A documented, analyzed software architecture is a key ingredient in achieving quality in large software-intensive systems. But the system that is implemented must conform to its architecture for the qualities of the design to carry over into the implementation. To ensure that systems conform to their architectures, and remain in conformance throughout their lifetimes, we need to reconstruct architecture from source artifacts. To do this properly, a wide variety of tools that provide both static and dynamic information are needed. Thus, we advocate a workbench approach to architecture reconstruction tools.
Why Reconstruct Software Architectures?
Evaluation of an architecture’s properties is critical to successful system development. However, reasoning about a system’s intended architecture must be recognized as distinct from reasoning about its realized architecture. As design and eventually implementation of an architecture proceed, faithfulness to the principles of the intended architecture is not always easy to achieve. This is particularly true in cases where the intended architecture is not completely specified, documented or disseminated to all of the project members. In our experience this is the rule, and well-specified, documented, disseminated, and controlled architectures are the exception.
This problem is exacerbated during maintenance and evolutionary development, as architectural drift and erosion occur. However, if we wish to transfer our reasoning about the properties of a system’s intended architecture to the properties of the implemented system, we must understand to what degree the realized architecture conforms to the intended architecture.
Architectural conformance can be measured only if two architectures are available to compare: the intended architecture and the architecture as it is realized in the implemented system. The former should be documented early in a system’s lifetime and maintained throughout. The latter, however, typically exists only in artifacts such as source code and makefiles and, occasionally, designs that are directly realized as code (through, for example, the use of an architecture description language). In addition, implementation languages rarely provide explicit mechanisms for representing architectural constructs. Therefore, facilities for the reconstruction of a software architecture from these artifacts are critical in measuring architectural conformance.
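As a minimal sketch of what such a comparison might look like, the following Python fragment treats each architecture as a set of (source, target) dependencies between components and classifies every dependency. The component names and the function `compare_architectures` are hypothetical; the terms convergence, divergence, and absence are borrowed from the reflexion-model work listed in the references, and the sketch is not a description of any particular tool.

```python
def compare_architectures(intended, realized):
    """Classify component dependencies by comparing two sets of (source, target) pairs."""
    intended, realized = set(intended), set(realized)
    return {
        # present in both the intended and the realized architecture
        "convergences": sorted(intended & realized),
        # found in the implementation but never intended
        "divergences": sorted(realized - intended),
        # intended but not found in the implementation
        "absences": sorted(intended - realized),
    }


# Hypothetical three-layer system: the UI is meant to reach storage only via logic.
intended = {("ui", "logic"), ("logic", "storage")}
realized = {("ui", "logic"), ("ui", "storage")}

report = compare_architectures(intended, realized)
for kind, dependencies in report.items():
    print(kind, dependencies)
```

The comparison itself is trivial; the difficulty the rest of this article addresses is obtaining a trustworthy `realized` set in the first place.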
Beyond its importance for measuring architectural conformance, software architecture reconstruction also provides important leverage for the effective reuse of software assets. The ability to identify the architecture of an existing system that has successfully met its quality goals fosters reuse of the architecture in systems with similar goals; hence architectural reuse is the cornerstone practice of product line development.
In the remainder of this paper we will discuss several issues relevant to the successful reconstruction of software architectures. These issues include the following:
- the need to expand our horizons beyond the use of purely static information during reconstruction
- the need to support the reconstruction of software architectures based on a paucity of information, such as non-compilable code, obsolete artifacts, or the absence of architectural information
- the need to leverage existing commercial and research tools within a comprehensive reconstruction framework
Static Information Is Insufficient
A significant quantity of information may be extracted from the static artifacts of software systems, such as source code, makefiles, and design models, using techniques that include parsing and lexical analysis. Unfortunately, system models extracted using these techniques provide little information about the run-time nature of the system. The primary factor contributing to this deficiency is the widespread use of programming language features, operating system primitives, and middleware functionality that allow the specification of many aspects of the system’s topology to be deferred until run-time. All but the simplest systems use some subset of the following:
- language features such as polymorphism and first-class functions (including approximations such as those provided by C and C++)
- operating system features such as proprietary socket-based communication and message passing
- middleware layers such as CORBA
These mechanisms permit systems to be designed with low coupling and a high degree of flexibility. While these are laudable goals, they obscure the architecture reconstruction process.
In particular, static extraction techniques can provide only limited insight into the run-time nature of systems constructed using such mechanisms, because many of the details that determine actual communication and control relationships simply do not exist until run-time, and hence cannot be recognized until run-time. For example, relationships among communicating processes might be determined via an initialization file, or even dynamically, based upon the availability of processing resources. However, most existing architecture reconstruction tools depend almost exclusively on abstract syntax trees (ASTs) extracted using parsing-based (i.e., compile time) approaches, and so they cannot gather such information.
To achieve a better understanding of the run-time nature of systems that leverage mechanisms such as those described above, we must consider how we may go about extracting dynamic information from a running system. Some techniques to accomplish this include profiling and user-defined instrumentation. Profiling is a technique that is traditionally used for system performance analysis. When profiling, one typically compiles a system with a special flag that instructs the compiler to instrument the code such that it records information pertaining to function invocation during execution. The system is then exercised and the recorded information is analyzed. We may use this technique to determine actual function invocations, augmenting our statically extracted models with improved information concerning polymorphic functions and functions executed through function pointers (e.g., in C or C++).
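The idea can be illustrated with a small Python sketch. Python has no compiler flag of the kind described above, so the sketch assumes the interpreter's `sys.setprofile` hook as a stand-in; the classes, `record_calls`, and `frame_name` are hypothetical names invented for the illustration. A static reading of `total_area` shows only a call to `shape.area()`, whereas the recorded edges name the concrete methods that actually ran.

```python
import sys
from collections import Counter


def frame_name(frame):
    """Label a frame as Class.method when it has a bound `self`, else by function name."""
    owner = frame.f_locals.get("self")
    prefix = f"{type(owner).__name__}." if owner is not None else ""
    return prefix + frame.f_code.co_name


def record_calls(entry_point):
    """Run entry_point under a profiling hook and count observed caller -> callee edges."""
    edges = Counter()

    def hook(frame, event, arg):
        # Only Python-level calls are recorded; calls into C functions are ignored.
        if event == "call" and frame.f_back is not None:
            edges[(frame_name(frame.f_back), frame_name(frame))] += 1

    sys.setprofile(hook)
    try:
        entry_point()
    finally:
        sys.setprofile(None)  # always remove the hook, even if entry_point fails
    return edges


class Shape:
    def area(self):
        raise NotImplementedError


class Circle(Shape):
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return 3.14159 * self.radius ** 2


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


def total_area(shapes):
    total = 0.0
    for shape in shapes:
        total += shape.area()  # the target of this call is known only at run-time
    return total


def main():
    total_area([Circle(1.0), Square(2.0)])


edges = record_calls(main)
for (caller, callee), count in sorted(edges.items()):
    print(f"{caller} -> {callee} ({count})")
```

As with any profile, the result reflects only the paths exercised by this particular run.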
In a similar fashion, user-defined instrumentation is a technique for adding special-purpose tracing functionality to a system to allow monitoring of its operation. For example, instrumentation can be added to application code responsible for interprocess communication to determine the system’s run-time communication topology. In addition, it is sometimes possible to instrument libraries or even the operating system. This allows the possibility of instrumentation of systems without modifying any application code and requires less application-specific knowledge.
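A minimal sketch of this kind of instrumentation, in Python, follows. The `send` function is a hypothetical stand-in for whatever interprocess communication primitive an application uses, and the `routes` table stands in for the contents of an initialization file; neither is taken from a real system. Wrapping the communication routine records who talks to whom without touching any of its callers.

```python
import functools
from collections import defaultdict

# Observed communication topology: (sender, receiver) -> number of messages.
topology = defaultdict(int)


def traced(send_func):
    """Wrap a communication routine so that every message is recorded."""
    @functools.wraps(send_func)
    def wrapper(sender, receiver, payload):
        topology[(sender, receiver)] += 1
        return send_func(sender, receiver, payload)
    return wrapper


@traced
def send(sender, receiver, payload):
    """Hypothetical stand-in for real IPC (sockets, message queues, and so on)."""
    return len(payload)


# Hypothetical routing read from an initialization file at start-up; these
# connections are invisible to a purely static analysis of the code.
routes = {"sensor": "controller", "controller": "actuator"}
for source, destination in routes.items():
    send(source, destination, b"tick")
send("sensor", "controller", b"tock")

for (source, destination), count in sorted(topology.items()):
    print(f"{source} -> {destination}: {count} message(s)")
```

Applying the same wrapper inside a shared communication library, rather than in application code, corresponds to the library-level instrumentation mentioned above.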
Throughout our experience with software architecture reconstruction, we have frequently been faced with systems that cannot be compiled. Often we are provided with a complete body of application code, but we are missing the header files and/or some of the libraries that are needed to compile it. In other cases, the application code is written in a peculiar dialect of a standard language (e.g., Objective C or an obscure dialect of Fortran), or implemented on an uncommon hardware platform, and thus may only be compiled with specialized tools that are not available to us. Off-the-shelf parsers and analyzers simply do not apply in these cases. There are cases when this lack of information makes a tool-facilitated reconstruction of the architecture nearly impossible. For example, when dynamic binding of function calls or dynamic relationships between processes are used extensively in a system, we may need to compile and run the system to determine the true relationships between the components.
Fortunately, we can often reconstruct software architectures even under these difficult circumstances. To accommodate non-compilable code whenever possible, we can avoid complete reliance on parsing-based techniques and can resort to lexical analysis for information extraction. While this is not an ideal situation (you often don’t have as much information to work with in these circumstances, and lexical techniques are frequently unreliable), it is reality.
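To make the tradeoff concrete, here is a minimal sketch of a lexical extractor written in Python. The C fragment, the regular expressions, and the function `lexical_facts` are all hypothetical and deliberately crude. The fragment cannot be compiled because its header is missing, yet pattern matching still recovers include and call-like facts from it.

```python
import re

# A C fragment that cannot be compiled because engine_ctl.h is unavailable.
SOURCE = '''
#include "engine_ctl.h"
void start_engine(void) {
    if (check_fuel()) { open_valve(1); }
}
'''

INCLUDE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)
CALL_LIKE = re.compile(r"\b([A-Za-z_]\w*)\s*\(")
KEYWORDS = {"if", "while", "for", "switch", "return", "sizeof"}


def lexical_facts(text):
    """Extract included files and call-like identifiers by pattern matching alone."""
    includes = INCLUDE.findall(text)
    calls = sorted({name for name in CALL_LIKE.findall(text) if name not in KEYWORDS})
    return includes, calls


includes, calls = lexical_facts(SOURCE)
print("includes:", includes)
print("call-like names:", calls)
```

Note that `start_engine` appears among the call-like names even though it is a definition, not a call. This is the kind of imprecision that makes lexical results a starting point for investigation, not a finished model.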
Reconstructing Architecture in a Vacuum
Unfortunately, it is frequently the case that our efforts to reconstruct the software architectures of systems must contend with a complete lack of pre-existing architectural information. This often occurs when the system being analyzed is particularly old (and thus there are no longer any designers or developers who can relate architectural information) or when the system’s intended architecture was never documented. Furthermore, these are typically the situations in which we are most interested in recovering a system’s architecture. In particular, it is common for such systems to be involved in ongoing maintenance or even undergoing a more global re-engineering effort (e.g., modernization or porting).
Successful architecture reconstruction revolves around the acquisition of architectural constraints and patterns that capture the fundamental elements of the architecture. Regardless of the mode of development of a system, its evolutionary state, or its age, such constraints and patterns are always present. However, they are rarely (even when a truly architecture-based development process is followed) captured explicitly. Thus, the primary task of the reconstruction analyst is acquisition of this information by means other than investigation of documentation.
One mechanism for acquiring architectural patterns is exploration of the information extracted from the system artifacts. Such exploration will often uncover frequently-used idioms of control flow, data flow, interface usage or interaction among the architectural components of the system. An alternative to this type of ad hoc exploration is the application of an architecture analysis method, such as the Architecture Tradeoff Analysis Method (ATAM), as an elicitation device. Although ATAM was developed as a method for analyzing the tradeoffs inherent in architectural design, it may also serve as a structured way to elicit architectural information. ATAM is scenario-based: Scenarios are used to capture uses of and changes to the system being analyzed. It is the "scenario mapping" step of the method that is useful for architectural elicitation; in this step, the scenarios are traced through the architecture. For uses of the system, the participating components and their communication patterns are identified. For changes to the system, the architecture-level impacts are identified. In this way, we have a structured exercise for exploration of the architecture of the system. If the system being analyzed is indeed undergoing some level of maintenance, there will always be one or more developers who have some knowledge of the system’s operation and can participate in such an exercise.
An Architecture Reconstruction Workbench
Based upon the above observations, we realized in developing tool support for software architecture reconstruction that no particular static set of tools (extractors, visualization tools, analysis tools, etc.) would suffice for every task. At the very least, we wanted to support many implementation languages, many target execution platforms, and many techniques for architecture analysis. Thus, we wanted a workbench: an open and lightweight environment that provides an infrastructure for the opportunistic integration of a wide variety of tools and techniques. The workbench must be open to allow easy integration of additional tools. The workbench must be lightweight to ensure that no unnecessary dependencies exist among already integrated tools and data.
We realized the workbench concept in Dali, a support environment for software architecture reconstruction. Dali’s current implementation uses an SQL repository for data storage and manipulation. SQL provides the medium for the primary activities of the Dali user: architectural pattern definition, view creation, view fusion, and pattern recognition.
View fusion is a technique for combining extracted information--views--from different sources (such as code artifacts, directory structures, naming conventions, execution information) to improve the quality of the overall model of the system. Architectural patterns are the medium for a Dali user to express an understanding of a system’s architecture as structural and attribute-based relationships among its components. The patterns are most commonly expressed via SQL, but may, in principle, be expressed via any of the tools that Dali integrates.
Rigi, which provides flexible graph editing functionality and effective end-user programmability, supplies Dali’s user interaction and also acts as a vehicle for tool integration.
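A minimal sketch of view fusion in SQL follows, driven from Python's standard `sqlite3` module. The table names, columns, and data are hypothetical and are not Dali's actual schema; the point is only that once each view is a relation, fusing views reduces to ordinary set operations.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- One view extracted statically, one recorded from a running system.
    CREATE TABLE static_calls  (caller TEXT, callee TEXT);
    CREATE TABLE dynamic_calls (caller TEXT, callee TEXT);

    INSERT INTO static_calls  VALUES ('main', 'dispatch'), ('dispatch', 'log');
    INSERT INTO dynamic_calls VALUES ('main', 'dispatch'), ('dispatch', 'handle_brake');

    -- The fused view is the union of both sources, with duplicates removed.
    CREATE VIEW fused_calls AS
        SELECT caller, callee FROM static_calls
        UNION
        SELECT caller, callee FROM dynamic_calls;
""")

fused = db.execute(
    "SELECT caller, callee FROM fused_calls ORDER BY caller, callee"
).fetchall()

# Relationships that only the run-time view revealed, e.g. calls made
# through function pointers that the static extractor could not resolve.
only_dynamic = db.execute("""
    SELECT caller, callee FROM dynamic_calls
    EXCEPT
    SELECT caller, callee FROM static_calls
""").fetchall()

print("fused view:", fused)
print("seen only at run-time:", only_dynamic)
```

The fused relation contains more than either source alone: the static view contributes the `log` call that this run never exercised, and the dynamic view contributes the `handle_brake` call that parsing could not see.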
Dali, as represented in Figure 1, currently integrates several commercial and research tools for the following:
- extraction of information--rigiparse for C, Imagix for C and C++, SNiFF+ for C++ and Fortran, LSME for C++
- visualization--dot for graph layout
- architectural analysis--IAPR for architectural complexity analysis, RMTool for automatic conformance measurement
The use of commercial tools has provided a great deal of leverage during the development of Dali: These tools are typically robust, well documented, and well tested.
Dali offers an iterative, interpretive, user-driven approach to architectural reconstruction. Our view is that no system has "an" architecture. It has many views of a complex body of interrelated information and the choice of which views to extract and reconstruct is driven by the user’s information needs. It is, however, interesting to consider the implications of the user-driven nature of Dali with respect to the issues raised above.
Figure 1: The Dali Workbench
Consider the case of non-compilable code: Because of the iterative nature of Dali, we can use the information gleaned from purely lexical extractors to guide the user’s investigations. For example, in this case the user could extract some portion of the architectural information--knowing that it was incomplete--and then reconstruct just part of the architecture. This would then lead to new questions being asked and new lexical extractors written, and the process repeats with increasing fidelity.
In a similar fashion, when we extract views from dynamic information, these views are incomplete, since dynamic information, like run-time testing, only reflects the parts of the system exercised during the execution. To gain a complete picture of the system, run-time views need to be fused with other (possibly static) views to improve the overall quality of the reconstruction.
Lastly, it is again Dali’s iterative and interactive nature that makes it appropriate for supporting a process of reconstructing architecture in the absence of significant architectural information. Lightweight view extraction and flexible visualization provide an environment that fosters opportunistic exploration of the available information. These features also provide the necessary support for the scenario-mapping task that is central to the application of ATAM in this context.
It is the combination of Dali’s openness and its interactivity that helps us overcome the inherent limitations of any single extraction and reconstruction technique. Dali does not offer a monolithic solution, but rather a set of tools, some of which are heuristic. As a result, we have been able to apply Dali to a wide range of systems, written in many different languages and dialects, written for different hardware platforms and operating systems, and created with vastly different levels of architectural expertise.
Architecture Reconstruction in Practice
As part of a large-scale product-line migration effort, the Software Engineering Institute is assisting a major U.S. industrial organization ("BigTruck Inc.") in the identification of key architectural elements and their relationships in an existing real-time embedded control system called "X1." The existing legacy system supports many product instantiations. X1 is being migrated to a product line, "X2." The set of X1 components will be identified as an "X1 baseline," to identify the current operating architecture of the in-place software. It is expected that this baseline model will provide the starting point for trading off a variety of to-be-determined architectural requirements derived from existing X1 customers and BigTruck sources as well as new or changed requirements for the future X2 (and beyond) architectures. The key players in this effort are the SEI team members and the newly formed BigTruck Software Architecture Group (SAG) headed by a very competent architect, "Hugh Effert."
The SEI team and the BigTruck architect have worked together to define a reconstruction process to support the product line migration. This process was evolved in an effort to assist the SAG in gaining control over the many X1 components--control that was initially distributed among a number of orthogonal sub-projects. Further, the process is intended to support the reality that migration to product lines must usually accommodate moving-target baseline systems. In the case of BigTruck, three individual business-critical deliveries of instantiations of the X1 architecture will occur prior to the release of X2.
The process of reconstructing the X1 architecture currently underway at BigTruck is summarized in the major steps outlined below.
Task A: Develop Component Information/Identification Form
In order to reconstruct an architectural representation it is first necessary to identify what the core set of components is in the particular implementation. This identification form should provide examples and guidance to system experts to identify "what makes a component a component" in the particular system or sub-system. The form will be used in Task C when SAG members formalize the properties of components initially identified by the SAG in Task B. The form is meant to provide guidance to SAG members as they attempt to carefully specify what attributes and relationships in the implementation signify membership in (or exclusion from) a particular component. Examples of identifying attributes include naming conventions for code elements such as functions or variables and code location in terms of directories. Examples of relationships may include certain control-passing styles among components or within a particular component, or potentially the types and names of data used within a particular component.
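One way to picture such a form is as a small record per component. The Python sketch below is hypothetical: the field names, the `brk_*` naming convention, and the directory layout are invented for illustration and do not describe the form actually used on the project. It captures only the two kinds of identifying attribute named above, naming conventions and code location.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class ComponentForm:
    """Hypothetical record of what makes a code element part of one component."""
    component: str
    name_patterns: list[str] = field(default_factory=list)  # naming conventions, e.g. "brk_*"
    directories: list[str] = field(default_factory=list)    # code locations, e.g. "src/brake/"

    def claims(self, function_name, file_path):
        """Return True if either a naming convention or a code location matches."""
        by_name = any(fnmatchcase(function_name, p) for p in self.name_patterns)
        by_location = any(file_path.startswith(d) for d in self.directories)
        return by_name or by_location


brake = ComponentForm("Brake", name_patterns=["brk_*"], directories=["src/brake/"])

print(brake.claims("brk_apply", "src/misc/util.c"))     # matched by naming convention
print(brake.claims("helper", "src/brake/helper.c"))     # matched by directory
print(brake.claims("eng_start", "src/engine/start.c"))  # matched by neither
```

Relationship-based criteria, such as control-passing styles among components, would need richer fields than this sketch provides.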
Task B: Identify Components and Layers
As a first step, the SAG will attempt to determine major responsibilities and architectural layout by naming the major architecture components and attempting to assign them to layers according to current beliefs (hopes?) about the layered structure of the existing X1 code base. Certainly it is currently believed that the X1/X2 (new) architecture is intended to reflect the layers represented as closely as is practicable. Following completion of this initial SAG effort, the individual components will be characterized according to properties laid out in Task C.
Task C: Populate Information Forms
Using the forms created in Task A, the SAG member responsible for each component will specify the defining characteristics of the component. This process of expert identification of architectural signatures is critical to the task of extracting the "as-implemented" architecture as this information is the primary source for the specification, in Task D, of rules for clustering software into architectural components. The process of populating the form developed in Task A may result in the modification of the form itself as unanticipated features characterizing architectural components are discovered by SAG members.
Task D: Develop Architectural Rules
From the information forms filled in by the SAG members, the SEI will work with Hugh to represent the information describing the architectural components as Dali SQL queries, which will be used to match (Task F) the components against the extracted system information database built in Task E. The SEI team can commence planning the rule descriptions as soon as the information forms begin arriving from the SAG team members.
Task E: System Extraction to Database
The SEI will, once the BigTruck source code and associated makefiles have been received, parse the system into an intermediate representation stored in a Dali relational database. While there is a standard set of parsing tools the SEI typically uses in such extractions, the complexity of the identifying features used in component description in Tasks C and D determines the degree of detail required in the parse-and-represent process.
Task F: Apply Rules to Database
Following the population of the system database and the development of the architectural rules (Tasks D and E), the SEI team will utilize Dali to match the rules against the extracted system. The resulting clusters can be visualized and further manipulated using Dali. This work will be done in conjunction with the SAG in order to facilitate the transfer of the skills required to BigTruck in a timely manner. The actual rule application (and possibly refinement) may determine new constraints that require additional (or repeated) extraction work, possibly with additional tools, and may also impact the rules defined and potentially even the nature of the information forms used to determine the component identity features.
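The interplay between rules and the extracted database can be sketched as follows, using Python's standard `sqlite3` module. The schema, the function names, and the two rules are hypothetical; they are not the BigTruck rules or Dali's actual tables. Each rule is an SQL query that selects the code elements belonging to one component, and whatever remains unassigned afterwards signals that a rule, or the extraction itself, needs another iteration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Hypothetical result of extraction: one row per function found in the source.
    CREATE TABLE functions (name TEXT, file TEXT);
    INSERT INTO functions VALUES
        ('brk_apply',   'src/brake/apply.c'),
        ('brk_release', 'src/brake/release.c'),
        ('eng_start',   'src/engine/start.c'),
        ('log_msg',     'src/util/log.c');

    CREATE TABLE membership (element TEXT, component TEXT);
""")

# Hypothetical architectural rules: one by naming convention, one by code location.
RULES = {
    "Brake":  "SELECT name FROM functions WHERE name GLOB 'brk_*'",
    "Engine": "SELECT name FROM functions WHERE file LIKE 'src/engine/%'",
}

# Apply every rule, recording which component each matched element falls into.
for component, query in RULES.items():
    db.execute(f"INSERT INTO membership SELECT name, ? FROM ({query})", (component,))

clusters = db.execute(
    "SELECT component, element FROM membership ORDER BY component, element"
).fetchall()

# Elements that no rule claimed: candidates for rule refinement.
unassigned = [row[0] for row in db.execute("""
    SELECT name FROM functions
    WHERE name NOT IN (SELECT element FROM membership)
    ORDER BY name
""")]

print("clusters:", clusters)
print("unassigned:", unassigned)
```

Here `log_msg` is left over, which would prompt either a new rule for a utility component or a revision of the information forms.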
Task G: Map Models to Architectural Views
Using Dali’s query, visualization, and manipulation tools, the extracted models of the X1 system will be mapped to a standard UML template (previously developed by the SAG). The architectural views will be constructed showing the "as-implemented" architecture given the component definitions and rules developed earlier. The mapping process will be undertaken jointly by the SEI and BigTruck’s Hugh Effert.
Task H: Evaluate Architectural Views
The views built during Task G will be evaluated for use in the ongoing development of the X2 architecture. This evaluation will verify that the views are able to support the types of design and analysis that the X2 architecture development process will require. The sub-processes in Tasks D, E, F, and G are presumed to be iterative as required.
The BigTruck Big Picture
The BigTruck effort is a tremendous example of the role software architecture reconstruction can (and should) play in the context of a product-line migration. The migration itself depends upon reliable views of the existing software architecture in addition to a clear understanding of the new architecture. The SEI has assisted BigTruck in generating its new X2 architecture by mapping BigTruck’s new requirements against its existing X1 architecture representation. Evaluation of the new architecture is being conducted through architectural reviews and the use of the SEI’s Architecture Tradeoff Analysis Method. Migration plans are being made in an incremental manner, allowing BigTruck to continue delivering X1 instantiations during the X2 evolution. Ultimately, the mapping at the code level to the X1 architectural representation will support a reliable source migration to the X2 architecture. Since BigTruck plans to generate performance models from its new X2 architecture and make performance tradeoff decisions based on these models, the accuracy of the X2-to-source mappings must be trustworthy.
About the Authors
Steven Woods is a member of the technical staff at Carnegie Mellon University's Software Engineering Institute, where he is a member of both the Reengineering Center and the Architectural Tradeoff Analysis (ATA) Initiative. The Reengineering Center has a mandate to identify, enhance, and transition best practice in software reengineering in a disciplined manner, while the goals of the ATA Initiative are to refine the set of techniques for analyzing architectures with respect to various quality attributes, to develop a method for understanding how these attributes interact, and to offer the software development community mature architecture evaluation practices.
Steven's particular expertise lies with analyzing, applying, integrating and extending existing and emerging toolsets for software reengineering, understanding and architectural analysis. In particular, the SEI wishes to leverage these tools as a crucial part of the process of developing and evolving software systems into product-line assets.
S. Jeromy Carrière is a member of the technical staff at the SEI. Before joining the SEI, Carrière was a software engineer with Nortel (Northern Telecom). His primary interests are related to software architecture: recovery, re-engineering, representation, analysis, and tool support. He is the author of several papers in software engineering, information visualization, and computer graphics. Carrière received a B. Math from the University of Waterloo and is a member of the Association for Computing Machinery.
Rick Kazman is a senior member of the technical staff at the SEI, where he is a technical lead in the Architecture Tradeoff Analysis Initiative. He is also an adjunct professor at the Universities of Waterloo and Toronto. His primary research interests within software engineering are software architecture, design tools, and software visualization. He is the author of more than 50 papers and co-author of several books, including a book recently published by Addison-Wesley entitled Software Architecture in Practice. Kazman received a BA and MMath from the University of Waterloo, an MA from York University, and a PhD from Carnegie Mellon University.
References
L. Bass, P. Clements, R. Kazman, Software Architecture in Practice, Addison-Wesley, 1997.
G. Murphy, D. Notkin, "Lightweight Lexical Source Model Extraction," ACM Transactions on Software Engineering and Methodology, 5(3) (July 1996), 262-292.
R. Kazman, M. Klein, M. Barbacci, T. Longstaff, H. Lipson, S. J. Carrière, "The Architecture Tradeoff Analysis Method," Proceedings of ICECCS (Monterey, CA, July 1998).
R. Kazman, S. J. Carrière, "Playing Detective: Reconstructing Software Architecture from Available Evidence," Journal of Automated Software Engineering, 6(2) (April 1999), 107-138.
R. Kazman, S. J. Carrière, "View Extraction and View Fusion in Architectural Understanding," Proceedings of the 5th International Conference on Software Reuse (Victoria, BC, Canada, June 1998), 290-299.
K. Wong, S. Tilley, H. Müller, M. Storey, "Programmable Reverse Engineering," International Journal of Software Engineering and Knowledge Engineering, 4(4) (December 1994), 501-520.
Imagix Corporation
TakeFive Software
R. Kazman, M. Burth, "Assessing Architectural Complexity," Proceedings of 2nd Euromicro Working Conference on Software Maintenance and Reengineering (Florence, Italy, March 1998), 104-112.
G. Murphy, D. Notkin, K. Sullivan, "Software Reflexion Models: Bridging the Gap between Source and High-Level Models," Proceedings of the Third ACM SIGSOFT Symposium on the Foundations of Software Engineering (Washington, D.C., October 1995).
The views expressed in this article are the author's only and do not represent directly or imply any official position or view of the Software Engineering Institute or Carnegie Mellon University. This article is intended to stimulate further discussion about this topic.