A Framework for Software Product Line Practice, Version 5.0
Mining Existing Assets
Mining existing assets refers to resurrecting and rehabilitating a piece of an old system to serve in a new system for which it was not originally intended. Often it simply refers to finding useful legacy code in an organization's existing systems portfolio and reusing it within a new application. However, this code-only view misses the big picture. We have known for years that, in the grand scheme of things, code accounts for a small share of a system's cost, because writing code is not the hard part of system and software development. Rich candidates for mining include a wide range of assets besides code, assets that will pay lucrative dividends. Business models, rule bases, requirements specifications, schedules, budgets, test plans, test cases, coding standards, algorithms, process definitions, performance models, and the like are all wonderful assets for reuse. The only reason so-called "code reuse" pays at all is that the designs, algorithms, and interfaces come along with the code [Clements 2002c, p. 99].
For example, whole or partial architectures and the design decisions they embody (captured by the documented rationale) are especially valuable. If a mined architecture is suitable, the components that originally populated it can most likely be migrated along with it. But to determine the fitness for reuse of either the architecture or its components, you must have a thorough architectural understanding of the legacy system. And, of course, the architect may be long gone. If good documentation does not exist, you might need to reconstruct the architecture to reveal the interactions and relations among the architecture's components. Reconstruction will illuminate constraints for how the components, if mined, can interact within the architecture of the new or updated software. It can also help you understand the tradeoff options available for reusing components in a new or improved way [Kazman 2002a, O'Brien 2002a]. Once the architecture has been extracted, it can be evaluated for suitability using the techniques described in the "Architecture Evaluation" practice area.
Documentation is an asset that is often overlooked and may have significant reuse potential. Much of the corporate knowledge about the software assets may be captured in existing legacy documentation, which makes that documentation a highly desirable candidate for mining and rehabilitation. That is especially true when the associated software assets are also being mined and rehabilitated and the documentation closely corresponds to them.
Mining involves understanding what is available, what is needed, and how rehabilitation works. That understanding requires support from analysts who are familiar with both the legacy system and the new system. For software assets, rehabilitation usually requires the support of the new system's architect, who will direct how the assets will be integrated into the new architecture.
For software assets, focus first on large-grained assets that can be wrapped or that will require only interface changes rather than changes in large chunks of their underlying algorithms or code. Determine how the candidate asset can fit into the architecture of the targeted new system. Don't forget to consider the requirements for performance, modifiability, reliability, and other nonbehavioral qualities. In addition, be sure to include all the non-software assets associated with the software: requirements, design, test, and management artifacts.
Once the existing assets have been organized and understood and candidate assets for mining have been identified, the rehabilitation of these assets can begin. In many ways, a mining initiative that involves the extensive rehabilitation of assets can resemble a reengineering project [Seacord 2003a, Sneed 2001a, Ulrich 2002a] or a development project in its own right. Technical planning (as described in the "Technical Planning" practice area) can help in planning and coordinating the effort.
Normally, mining refers to legacy assets previously developed by the organization doing the mining. However, the rehabilitation activities associated with mining also apply to externally available software such as open source software that has entered the organization from the outside.
Aspects Peculiar to Product Lines
Mined assets for a product line must have the same qualities as newly developed core assets. Mined assets must be (re)packaged with reuse in mind and meet the product line requirements. Accordingly, the mined assets must align with the product line architecture and meet the quality goals consistent with those of the product line. Product lines must focus on the strategic, large-grained reuse of the mined assets. The primary issues that motivate large-scale reuse for a product line are schedule, cost, and quality. The mined and rehabilitated assets must meet the needs of the plurality of systems in the product line. Since a product line accommodates a longer and wider view of future system change, any mined asset must be robust enough to accommodate such change gracefully.
When mining an asset (software or otherwise) for a software product line, an analyst should consider
- its alignment with the requirements for immediate products in terms of both common features and variation points
- its appropriateness for potential future products
- whether it will be used as a core asset or for a specific product
- the amount of effort required to make its interface conform to the constraints of the product line architecture
- its extensibility with respect to its potential future (based on the architecture's expected evolution)
- its maintenance history
- other assets (for example, script and data files) that may be required from the legacy system
- its projected long-term cost
When mining software assets for single systems, we look for components that perform specific functions well. However, for product line systems, quality attributes such as maintainability and suitability become more important over time. Thus, we might accept mined assets for product lines that are suboptimal in fulfilling specific tasks if they meet the product line's critical quality attribute goals. An asset's total cost of ownership across the products for which it will be used should be lower than the sum of the costs of similar assets mined separately for one-time use.
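To make that cost comparison concrete, here is a minimal arithmetic sketch. The cost model (a one-time rehabilitation cost, an extra cost to generalize the asset for reuse, and a recurring per-product cost) and all the figures are hypothetical assumptions for illustration, not data from the framework.

```python
# Hypothetical cost model, in staff-weeks. Figures are illustrative only.

def one_time_use_total(rehab_cost: float, per_product_cost: float, n_products: int) -> float:
    """Each product mines and rehabilitates its own copy of a similar asset."""
    return n_products * (rehab_cost + per_product_cost)

def core_asset_total(rehab_cost: float, per_product_cost: float, n_products: int,
                     generalization_cost: float) -> float:
    """The asset is rehabilitated once, generalized for reuse, and shared by all products."""
    return rehab_cost + generalization_cost + n_products * per_product_cost

shared = core_asset_total(rehab_cost=10, per_product_cost=2, n_products=5, generalization_cost=8)
separate = one_time_use_total(rehab_cost=10, per_product_cost=2, n_products=5)
print(shared, separate)  # 28 60
```

Under these assumed numbers the shared asset costs more than a one-time mined asset for a single product (20 versus 12) but already breaks even at two products (22 versus 24), which is why the comparison has to be made across the products for which the asset will actually be used.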
Application to Core Asset Development
The process of mining existing assets is largely about finding suitable candidates to be core assets of the product line. Software assets that are well structured and well documented and have been used effectively over long periods of time can sometimes be included as product line core assets with little or no change. Software assets that can be wrapped to satisfy new interoperability requirements are also desirable. Assets that don't satisfy these requirements are undesirable and may have higher maintenance costs over the long term. Depending on the legacy inventory and its quality, an assortment of candidate assets is possible, from architectures to small pieces of code.
An existing architecture should be analyzed carefully before being accepted as the pivotal core asset: the product line architecture. See the "Architecture Evaluation" practice area for a discussion of what that analysis should entail.
Candidate software assets must align with the product line architecture, meet specified component behavior requirements, and accommodate any specified variation points. In some cases, a mined component may represent a potentially valuable core asset but won't fit directly into the product line architecture. Usually, the component will need to be changed to accommodate the constraints of the architecture. Sometimes a change in the architecture might be easier, but it will have implications for other components, for the satisfaction of quality goals, and for the support of the products in the product line.
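As a first screen of whether a mined component fits an interface prescribed by the product line architecture, a structural interface check can list what is missing. The sketch below uses Python's `typing.Protocol`; the `Tracker` interface and the `LegacyTracker` and `ConformingTracker` components are hypothetical. Note that `isinstance` against a `runtime_checkable` protocol only checks that the named methods exist, not their signatures, behavior, or quality attributes, so this is a quick filter rather than a conformance proof.

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class Tracker(Protocol):
    """Hypothetical interface required by the product line architecture."""
    def start(self) -> None: ...
    def position(self, target_id: str) -> tuple[float, float]: ...

class LegacyTracker:
    """Hypothetical mined component whose interface only partly matches."""
    def start(self) -> None:
        pass

    def get_pos(self, target_id: str) -> tuple[float, float]:
        return (0.0, 0.0)

class ConformingTracker:
    """Hypothetical component that already provides the required methods."""
    def start(self) -> None:
        pass

    def position(self, target_id: str) -> tuple[float, float]:
        return (1.0, 2.0)

def missing_members(component: object, interface: type) -> list[str]:
    """List the interface's public methods that the component does not provide."""
    required = [name for name in vars(interface)
                if not name.startswith("_") and callable(getattr(interface, name))]
    return [name for name in required if not callable(getattr(component, name, None))]

print(isinstance(LegacyTracker(), Tracker))       # False
print(missing_members(LegacyTracker(), Tracker))  # ['position']
print(isinstance(ConformingTracker(), Tracker))   # True
```

The list of missing members gives a rough feel for whether changing the component (or wrapping it) is likely to be cheaper than changing the architecture.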
Once in the core asset base for the product line, mined assets are treated in the same way as newly developed assets. Some assets mined for expediency in fielding a new product, however, may be less than robust and may be earmarked for future replacement through new development.
Application to Product Development
Although it is reasonable to use mined assets for components that are unique to a single product in the product line, doing so will make the mining activity indistinguishable from mining in the non-product-line case. The same issues discussed above (paying attention to quality attributes, architecture, cost, and time to market) will still apply. And it will be worth taking a long, hard look at whether the mined component really is unique to a single product or could be used in other products as well, thus making the cost of its rehabilitation more palatable. In that case, the team responsible for mining would be wise to look for places where variability could be installed in the future, should the asset in question ever turn out to be useful in a group of products.
Example Practices
SEI Options Analysis for Reengineering (OAR): OAR is a method that can be used to evaluate the feasibility and economy of mining existing components for a product line. OAR operates like a funnel in which a large set of potential assets is screened down to a smaller set, so that the effort can focus on the assets that will most effectively meet the technical and programmatic needs of the product line. OAR prescribes the following steps [Bergey 2001a, Bergey 2002a, Bergey 2003a]:
- Establish the mining context: First, capture your organization's product line approach, legacy base, and expectations for mining components. Establish the programmatic and technical drivers for the effort, catalogue the documentation available from the legacy systems, and identify a broad set of candidate components for mining. This task establishes the needs of the mining effort and begins to illuminate the types of assets that will be most relevant for mining. It also identifies the available documentation and artifacts and enables focused efforts to close any gaps in the existing documentation.
- Inventory components: Next, identify the legacy system components that can potentially be mined for use in a core asset base for the product line. During this activity, identify the required characteristics of the components (such as functionality, language, infrastructure support, and interfaces) in the context of the product line architecture. This activity creates an inventory of candidate legacy components together with a list of the relevant characteristics of those components. It also creates a list of needs that cannot be satisfied through the mining effort.
- Analyze candidate components: Next, analyze the candidate set of legacy components in more detail to evaluate their potential use as product line components. Screen them on the basis of how well they match the required characteristics. This activity provides a list of candidate components, together with estimates of the cost of rehabilitating those components and the effort required.
- Analyze mining options: Then, assemble different aggregations of components and analyze the feasibility and viability of mining them based on their cost, effort, and risk.
- Select a mining option: Finally, select the mining option that can best satisfy the organization's mining goals by balancing the programmatic and technical considerations of the organization. To do that, establish drivers for making a final decision, such as cost, schedule, risks, and difficulty. (This activity might also help you determine the tradeoffs.) Evaluate each mining option (component aggregation) on the basis of how well it satisfies the most critical driver. After selecting an option, write a final report to communicate the results of the OAR process.
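The funnel shape of these steps can be sketched as a chain of filters. This is a toy illustration under assumed data: the component names, the required language, the needs, the budget, and the choice of "needs covered, then cost" as the most critical driver are all hypothetical. The real method relies on expert analysis and stakeholder judgment at each step rather than on automated scoring.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Candidate:
    name: str
    language: str
    functions: frozenset[str]  # services the legacy component provides
    rehab_cost: int            # estimated rehabilitation effort (staff-weeks)

# Inventory components (hypothetical legacy inventory).
inventory = [
    Candidate("orbit_calc", "C++", frozenset({"orbit-model"}), 6),
    Candidate("ui_console", "Ada", frozenset({"operator-ui"}), 12),
    Candidate("telemetry_io", "C++", frozenset({"telemetry"}), 4),
    Candidate("db_layer", "C++", frozenset({"persistence"}), 9),
]

# Required characteristics from the product line architecture (assumed).
REQUIRED_LANGUAGE = "C++"
NEEDS = {"orbit-model", "telemetry", "persistence", "operator-ui"}
BUDGET = 12  # assumed programmatic limit on total rehabilitation effort

def provided(option) -> set[str]:
    """All services offered by an aggregation of components."""
    return set().union(*(c.functions for c in option))

# Analyze candidate components: screen out those lacking required characteristics.
screened = [c for c in inventory if c.language == REQUIRED_LANGUAGE]

# Needs that no screened component can satisfy through mining.
unmet = NEEDS - provided(screened)

# Analyze mining options: every aggregation of screened components within budget.
options = [combo
           for size in range(1, len(screened) + 1)
           for combo in combinations(screened, size)
           if sum(c.rehab_cost for c in combo) <= BUDGET]

# Select a mining option: most needs covered first, then lowest rehabilitation cost.
best = min(options, key=lambda o: (-len(NEEDS & provided(o)), sum(c.rehab_cost for c in o)))

print(sorted(c.name for c in best))  # ['orbit_calc', 'telemetry_io']
print(sorted(unmet))                 # ['operator-ui']
```

The `unmet` set plays the role of the list of needs that cannot be satisfied through mining, which would then have to be met by new development or acquisition.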
OAR has been used to make decisions on mining components for a satellite-tracking system [Bergey 2001a]. The process has also been used to evaluate (1) the extent to which components proposed by suppliers for reuse in a product line would meet the product line's stated needs and (2) the types of changes that would be required to fit the component into the product line [Bergey 2003a, Muller 2003a]. OAR is in the process of being extended to handle other asset types such as unit test cases and documentation.
Architecture recovery/reconstruction tools: Some tools that are available to assist in the architecture reconstruction process include Rigi [Muller 1988a], the Software Bookshelf [Finnegan 1997a], DISCOVER [Tilley 1998a], the Dali workbench [Kazman 1998a], and the SEI ARMIN tool [O'Brien 2003a].
The ARMIN tool is a flexible, lightweight tool for architecture reconstruction. It uses information extracted by other tools to generate architectural views. Using ARMIN involves five steps:
- information extraction: which uses tools such as parsers to extract information from existing design and implementation artifacts such as the source code
- database construction: which stores the extracted information in a database for future analysis. This step may involve changing the format of the data.
- view fusion: which augments the extracted information by combining information to generate a set of low-level views of the software
- architecture view composition: which generates a set of architecture views through abstraction, visualizes them, and then enables the user to explore and manipulate them
- architecture analysis: which evaluates the resultant architecture and, in some cases, evaluates the conformance of the as-built architecture obtained from reconstruction to an as-designed architecture
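The five steps form a pipeline, which the following toy analogue illustrates. It is not ARMIN and does not use ARMIN's interfaces; it assumes the legacy code is Python, extracts import relations with the standard `ast` module, stores them in an in-memory SQLite database, fuses them into a module-dependency view, composes a layer view from an assumed module-to-layer mapping, and checks the result against an assumed as-designed rule that dependencies may only point downward. All module names are hypothetical.

```python
import ast
import sqlite3

# Hypothetical legacy sources: module name -> source text.
SOURCES = {
    "ui.console": "import core.orbit\nimport hw.telemetry\n",
    "core.orbit": "import math\nimport hw.telemetry\n",
    "hw.telemetry": "import ui.console\n",  # an upward dependency, probably unintended
}

# 1. Information extraction: parse each module and emit (importer, imported) facts.
def extract(sources):
    for module, text in sources.items():
        for node in ast.walk(ast.parse(text)):
            if isinstance(node, ast.Import):
                for alias in node.names:
                    yield module, alias.name

# 2. Database construction: store the facts, plus the list of system modules.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE imports (src TEXT, dst TEXT)")
db.execute("CREATE TABLE modules (name TEXT)")
db.executemany("INSERT INTO imports VALUES (?, ?)", extract(SOURCES))
db.executemany("INSERT INTO modules VALUES (?)", [(m,) for m in SOURCES])

# 3. View fusion: join the two fact sets into a module-dependency view (drops stdlib imports).
module_view = set(db.execute(
    "SELECT i.src, i.dst FROM imports i JOIN modules m ON i.dst = m.name"))

# 4. Architecture view composition: abstract modules into layers (assumed mapping).
LAYER_OF = {"ui": 2, "core": 1, "hw": 0}

def layer(module: str) -> int:
    return LAYER_OF[module.split(".")[0]]

layer_view = {(layer(s), layer(d)) for s, d in module_view if layer(s) != layer(d)}

# 5. Architecture analysis: the assumed as-designed rule allows only downward dependencies.
violations = sorted((s, d) for s, d in module_view if layer(d) > layer(s))
print(sorted(layer_view))  # [(0, 2), (1, 0), (2, 0), (2, 1)]
print(violations)          # [('hw.telemetry', 'ui.console')]
```

Keeping the extracted facts in a database, rather than in the extractor's memory, is what lets later fusion and composition steps be rerun with different queries or mappings without re-parsing the sources.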
Tool support makes mining undocumented software assets more effective and significantly less cumbersome by reducing the time it takes to ascertain what a piece of software does and how it interacts with other parts of the system. Tools can be brought to bear that automatically chart interconnections of various kinds among software elements. More valuable than tools, however, are the people who worked on and are knowledgeable about the legacy software. Find them if you can. They can tell you the strengths and weaknesses of the software that weren't written down, and they can give you the "inside story" that no tool can hope to recover.
Mining architectures: In some cases, the software architecture of an existing system can become the product line architecture. The SEI Mining Architectures for Product Lines (MAP) method determines whether the architectures of existing systems are similar and whether the corresponding systems have the potential to become a software product line [O'Brien 2001a]. The MAP method combines techniques for architecture reconstruction and product line analysis to analyze the architectural patterns and attributes of a set of systems. This analysis determines whether the systems have similar components and connections between their components and examines their commonalities and variabilities. MAP has been used in the development of a prototype product line architecture for a sunroof system. MAP and OAR can also be used together effectively: MAP supports decision making for reusing architectures, while OAR supports decision making for identifying components that fit within the constraints of the architecture.
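The commonality and variability part of such an analysis compares the reconstructed component-and-connector structures of several systems. A minimal sketch, assuming each system's reconstructed architecture has already been reduced to a set of named components and a set of connections; the system and component names are hypothetical, and MAP itself considers much richer architectural patterns and attributes than set overlap.

```python
# Hypothetical reconstructed architectures: components and connections per system.
systems = {
    "sunroof_A": {"components": {"motor_ctl", "position_sensor", "pinch_guard", "can_bus"},
                  "connections": {("pinch_guard", "motor_ctl"), ("position_sensor", "motor_ctl"),
                                  ("motor_ctl", "can_bus")}},
    "sunroof_B": {"components": {"motor_ctl", "position_sensor", "pinch_guard", "lin_bus"},
                  "connections": {("pinch_guard", "motor_ctl"), ("position_sensor", "motor_ctl"),
                                  ("motor_ctl", "lin_bus")}},
    "sunroof_C": {"components": {"motor_ctl", "position_sensor", "rain_sensor", "can_bus"},
                  "connections": {("position_sensor", "motor_ctl"), ("rain_sensor", "motor_ctl"),
                                  ("motor_ctl", "can_bus")}},
}

def jaccard(a: set, b: set) -> float:
    """Similarity of two sets: size of the intersection over size of the union."""
    return len(a & b) / len(a | b) if a | b else 1.0

all_components = [s["components"] for s in systems.values()]
common = set.intersection(*all_components)        # candidates for the common core
variable = set.union(*all_components) - common    # candidates for variation points

print(sorted(common))    # ['motor_ctl', 'position_sensor']
print(sorted(variable))  # ['can_bus', 'lin_bus', 'pinch_guard', 'rain_sensor']
print(jaccard(systems["sunroof_A"]["connections"], systems["sunroof_B"]["connections"]))  # 0.5
```

A high overlap in both components and connections suggests the systems share an architectural backbone; the leftover components (here the bus variants and the optional sensors) hint at where variation points would be needed.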
Requirements reuse and feature interaction management: Developers realize that complex applications are often built best by using a number of different components, each performing a specialized set of services. However, those components, each embodying different requirements in different service domains, can interact in unpredictable ways. For this reason, designing components to minimize or at least manage interaction is an issue. This problem of interaction becomes even more significant when reusing requirements. Interactions must be detected and resolved in the absence of a specific implementation framework. Shehata, Eberlein, and Hoover stress that understanding interaction management is key to understanding how to reuse requirements. They describe a conceptual process framework for formulating and reusing requirements [Shehata 2002a]. Reusable requirements are classified into three levels of abstraction for software requirements: domain-specific requirements, generic requirements, and domain-requirements frameworks. This classification is used as the basis for a reusability plan that underscores the importance of interaction management.
Wrapping: Wrapping involves changing the interface of a component to comply with a new architecture without changing the component's other internals. In fact, pure wrapping involves no change whatsoever in the component; rather, a new thin layer of software is interposed between the original component and its clients. That layer provides the new interface by translating to and from the old. There are enormous advantages to reusing existing assets with little or no internal modification through wrapping. As soon as any modification takes place, the associated documentation changes, the test cases change, and a ripple effect takes place that influences other associated software. Wrapping prevents that ripple effect and allows the "as-is" reuse of many of the assets associated with the software component, such as its test cases and internal design documentation. The idea is to translate the "as-is" interface to the "to-be" interface. Weiderman and colleagues discuss some of the available wrapping techniques [Weiderman 1997a]. Seacord and colleagues discuss a case study that applied several wrapping techniques [Seacord 2001a].
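A minimal sketch of pure wrapping: the legacy component is left untouched, and a thin layer translates the new "to-be" interface into calls on the old "as-is" one and translates the results back. The legacy class, its fixed-width record format, and the new request type are all hypothetical.

```python
from dataclasses import dataclass

class LegacyPayroll:
    """As-is component, left unchanged: takes and returns fixed-width text records."""
    def CALCPAY(self, record: str) -> str:  # legacy naming kept as-is
        emp_id, hours, rate_cents = record[:6], int(record[6:10]), int(record[10:16])
        return f"{emp_id}{hours * rate_cents:010d}"

@dataclass
class PayRequest:
    employee_id: str
    hours: int
    hourly_rate_cents: int

class PayrollService:
    """Thin wrapper: provides the to-be interface by translating to and from the old one."""
    def __init__(self, legacy: LegacyPayroll) -> None:
        self._legacy = legacy

    def gross_pay_cents(self, req: PayRequest) -> int:
        # Translate the structured request into the legacy fixed-width record.
        record = f"{req.employee_id:<6}{req.hours:04d}{req.hourly_rate_cents:06d}"
        reply = self._legacy.CALCPAY(record)
        # Translate the legacy reply (6-char id + 10-digit amount) back to an int.
        return int(reply[6:])

service = PayrollService(LegacyPayroll())
print(service.gross_pay_cents(PayRequest("E42", 40, 2550)))  # 102000
```

Because `LegacyPayroll` is not modified, its existing test cases and internal design documentation remain valid; only the thin wrapper needs new tests.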
Adapting components: Software components that are being used in a context other than the one for which they were originally developed often do not exactly fit their assigned roles. One technique that can accommodate these differences is the Adapter design pattern [Gamma 1995a], which interposes an intermediary between two components. The adapter can compensate for mismatches in the number or types of parameters within a service signature, provide synchronization in a multithreaded interaction, and adjust for many other types of incompatibilities. Scripting languages can often be used to implement the adapter.
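A sketch of an adapter that compensates for two of the mismatches just mentioned: a service signature whose parameters differ in number and form, and a legacy component that is not safe to call from several threads. The `LegacyGeocoder` component and the `distance_km` interface expected by clients are hypothetical; Python is used here as one example of a scripting language.

```python
import math
import threading

class LegacyGeocoder:
    """Mined component: not thread-safe; takes five separate numbers including a radius."""
    def __init__(self) -> None:
        self.calls = 0  # unsynchronized shared state

    def great_circle(self, lat1, lon1, lat2, lon2, radius):
        self.calls += 1
        p1, p2 = math.radians(lat1), math.radians(lat2)
        dlon = math.radians(lon2 - lon1)
        cos_c = math.sin(p1) * math.sin(p2) + math.cos(p1) * math.cos(p2) * math.cos(dlon)
        return radius * math.acos(max(-1.0, min(1.0, cos_c)))  # spherical law of cosines

class DistanceAdapter:
    """Adapter: clients call distance_km(a, b) with (lat, lon) tuples in degrees."""
    EARTH_RADIUS_KM = 6371.0

    def __init__(self, adaptee: LegacyGeocoder) -> None:
        self._adaptee = adaptee
        self._lock = threading.Lock()  # serializes access to the non-thread-safe adaptee

    def distance_km(self, a: tuple[float, float], b: tuple[float, float]) -> float:
        with self._lock:
            return self._adaptee.great_circle(a[0], a[1], b[0], b[1], self.EARTH_RADIUS_KM)

legacy = LegacyGeocoder()
adapter = DistanceAdapter(legacy)
threads = [threading.Thread(target=adapter.distance_km, args=((0.0, 0.0), (0.0, 90.0)))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(legacy.calls)  # 8
```

The adapter supplies the missing radius argument, unpacks the tuples into the legacy parameter list, and adds the locking the legacy code never had, without changing the legacy class itself.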
Domain-specific reuse: An approach by Ganesan and Knodel looks at domain-specific reuse [Ganesan 2005a]. The goal of their approach is to reduce the amount of source code that a human expert has to explore to identify domain-specific software components. They outline a 10-step process for component extraction. The basic premise of this approach is that reusable classes have certain quality attributes (functional usefulness, readability, testability, and so forth) that are mapped onto metrics. The approach classifies the domain-specific classes based on the metrics derived. An expert has to validate only a small number of proposed candidates that, if accepted, become the foundations for the reusable components.
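The premise of mapping quality attributes onto metrics, so that an expert reviews only the top-ranked classes, can be illustrated with a small ranking step. This is not Ganesan and Knodel's 10-step process; the three metrics, their weights, and the sample classes are assumptions chosen only to show the idea.

```python
from dataclasses import dataclass

@dataclass
class ClassMetrics:
    name: str
    fan_in: int           # classes that use this one (assumed proxy for functional usefulness)
    comment_ratio: float  # comment lines / code lines (assumed proxy for readability)
    cyclomatic: int       # summed cyclomatic complexity (assumed to hurt testability)

def reuse_score(m: ClassMetrics) -> float:
    """Hypothetical weighted score in [0, 1]; higher means a more promising candidate."""
    usefulness = min(m.fan_in / 10, 1.0)
    readability = min(m.comment_ratio, 1.0)
    testability = 1 / (1 + m.cyclomatic / 10)
    return 0.5 * usefulness + 0.3 * readability + 0.2 * testability

def propose_candidates(classes: list[ClassMetrics], top_n: int) -> list[str]:
    """Rank all classes and return only the few an expert should validate."""
    ranked = sorted(classes, key=reuse_score, reverse=True)
    return [m.name for m in ranked[:top_n]]

classes = [
    ClassMetrics("OrbitPropagator", fan_in=12, comment_ratio=0.4, cyclomatic=15),
    ClassMetrics("MainWindow", fan_in=1, comment_ratio=0.05, cyclomatic=80),
    ClassMetrics("TelemetryDecoder", fan_in=7, comment_ratio=0.3, cyclomatic=10),
    ClassMetrics("TempFixes", fan_in=2, comment_ratio=0.0, cyclomatic=40),
]
print(propose_candidates(classes, top_n=2))  # ['OrbitPropagator', 'TelemetryDecoder']
```

Whatever metrics are used, the output is only a shortlist; the expert's validation step is what turns a high-scoring class into an accepted reusable component.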
Analyzing product line adequacy: The Fraunhofer Product Line Software Engineering (PuLSE) and Fraunhofer Architecture- and Domain-Oriented Reengineering (ADORE) are methods that can be used to analyze existing components to determine their adequacy for use within a product line. Knodel and Muthig report on the application of these methods in an industrial case study [Knodel 2005a]. In that study, several techniques were used to answer a set of questions about the existing components that determine their adequacy. The techniques applied included static architecture evaluation, variability analysis, clone detection, metric computation, and naming and decomposition analysis, as well as review of code comments.
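Clone detection, one of the techniques listed, can reveal code that was copied and modified across products, which often signals latent commonality. The sketch below is a deliberately naive line-window detector, assuming that whitespace and `#` comments should be ignored and that three consecutive normalized lines form a meaningful unit; practical clone detectors use token- or syntax-tree-based matching and are far more robust. The file names and contents are hypothetical.

```python
import hashlib
from collections import defaultdict

def normalize(line: str) -> str:
    """Drop '#' comments and all whitespace so cosmetic edits do not hide clones."""
    return "".join(line.split("#", 1)[0].split())

def find_clones(files: dict[str, str], window: int = 3) -> dict[str, list[tuple[str, int]]]:
    """Map a hash of each window of normalized lines to where it occurs (file, 1-based line)."""
    seen: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for path, text in files.items():
        numbered = [(n, normalize(raw)) for n, raw in enumerate(text.splitlines(), start=1)]
        numbered = [(n, norm) for n, norm in numbered if norm]  # skip blank/comment-only lines
        for i in range(len(numbered) - window + 1):
            chunk = "\n".join(norm for _, norm in numbered[i:i + window])
            digest = hashlib.sha1(chunk.encode()).hexdigest()
            seen[digest].append((path, numbered[i][0]))
    return {h: locs for h, locs in seen.items() if len(locs) > 1}

files = {
    "product_a/filter.py": "def smooth(xs):\n    out = []\n    for x in xs:\n"
                           "        out.append(x * 0.5)\n    return out\n",
    "product_b/filter.py": "# copied from product A\ndef smooth(xs):  # tweaked\n  out = []\n"
                           "  for x in xs:\n      out.append(x * 0.5)\n  return out\n",
}
clones = find_clones(files)
print(len(clones))  # 3
```

Each reported window points to the same logic in both products despite different indentation and comments, which is the kind of evidence a product line adequacy analysis would follow up on.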
Practice Risks
The major risks associated with mining are (1) failure to find the right assets and (2) choosing the wrong assets. Both will result in schedule slippage and wasted staff time. A secondary risk is inadequate support for the mining operation, which will result in a failed operation and the (misguided) impression that mining is not a viable option.
Specific risks associated with mining operations include
- a flawed search: The search for reusable assets may be fruitless, resulting in a waste of time and resources. Or, relevant assets may be overlooked, resulting in a waste of time and resources spent duplicating what already exists. A special case of the latter is when noncode assets are shortsightedly ignored. To minimize both of these risks, build a catalogue of your reusable assets (including noncode assets) and treat it as a core asset of the product line. Doing so will save time and effort next time.
- an overly successful search: There may be too many similar assets, resulting in too much effort spent on analysis.
- fuzzy criteria: The criteria for what to search for should be specific enough to avoid an overly successful search, yet be general enough to include all the viable candidates.
- failure to search for non-software assets: Failure to consider non-software assets in your search (such as specifications, test suites, procedures, budgets, work plans, requirements, and design rationale) will reduce the overall effectiveness of any mining operation.
- inappropriate assets: Assets recovered from a search may appear, at first, to be usable but later turn out to be of inferior quality or unable to accommodate the scope of variation required.
- bad rehabilitation estimates: Initial estimates of the cost of rehabilitation may be inadequate, leading to escalating and unpredictable costs.
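The recommended catalogue of reusable assets, code and noncode alike, also offers a way to notice a flawed or overly successful search early. A minimal sketch, assuming a hypothetical catalogue schema (asset kind plus keyword tags) and arbitrary thresholds for "too few" and "too many" hits:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(frozen=True)
class Asset:
    name: str
    kind: str  # e.g. "code", "test plan", "requirements", "budget"
    tags: frozenset[str] = field(default_factory=frozenset)

# Hypothetical catalogue entries; noncode assets are catalogued alongside code.
CATALOGUE = [
    Asset("orbit_calc", "code", frozenset({"orbit", "tracking"})),
    Asset("orbit_calc_tests", "test plan", frozenset({"orbit", "tracking"})),
    Asset("tracking_srs", "requirements", frozenset({"tracking"})),
    Asset("ground_station_budget", "budget", frozenset({"ground", "tracking"})),
]

def search(catalogue, required_tags: set[str], kinds: Optional[set[str]] = None,
           min_hits: int = 1, max_hits: int = 3):
    """Return matching assets plus a warning when the criteria look too narrow or too broad."""
    hits = [a for a in catalogue
            if required_tags <= a.tags and (kinds is None or a.kind in kinds)]
    if len(hits) < min_hits:
        warning = "too few hits: criteria may be too specific"
    elif len(hits) > max_hits:
        warning = "too many hits: criteria may be too general"
    else:
        warning = None
    return hits, warning

hits, warning = search(CATALOGUE, {"orbit"})
print([a.name for a in hits], warning)  # ['orbit_calc', 'orbit_calc_tests'] None
print(search(CATALOGUE, {"tracking"})[1])  # too many hits: criteria may be too general
```

Leaving `kinds` unrestricted by default keeps noncode assets in every search; passing `kinds={"code"}` would reproduce exactly the shortsighted search that the risks above warn against.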
Organizational issues leading to mining risks include
- a lack of corporate memory: Corporate memory may not be able to provide sufficient data to analyze or use the software asset effectively.
- inappropriate methods: The wrong reengineering methods and tools may be selected, leading to flawed results and schedule and cost overruns.
- a lack of tools: Tools required for the mining effort may not be integrated to the extent necessary, leading to risky and expensive workarounds.
- turf conflicts: Potential turf conflicts may undermine the decision process when selecting among similar candidate assets. Or, a repository of assets may be off-limits for political or organizational reasons.
- an inability to tap the needed resources: The organization might be unable to free the resources needed to rehabilitate or renovate the asset. Those resources must be freed from the group that originally created the asset.
Further Reading
[Ganesan 2005a] This paper outlines a 10-step process for component extraction that classifies candidate components by computing a set of metrics for the classes that underlie them.
[Knodel 2005a] This paper describes a case study in applying a variety of techniques to determine the adequacy of existing components for use within a product line. Several techniques were used to answer a set of questions about the components that determine their adequacy.
[Seacord 2003a] This book on modernizing legacy systems by Seacord, Plakosh, and Lewis provides guidance on how to implement a successful modernization strategy and specifically describes a risk-managed, incremental approach that encompasses changes in software technologies, engineering processes, and business practices.