A Framework for Software Product Line Practice, Version 5.0
Testing has two main functions: (1) helping to identify faults that lead to failures so they can be repaired and (2) determining whether the software under test can perform as specified by its requirements. In certain domains and styles of development, testing has been performed to estimate the reliability of software.
Since it is almost always impractical to exercise a program against all possible inputs, testing is really a search process. During development, the developer tests by using those inputs that are most likely to result in failures. After the software under test has reached some stage of completion, the system tester searches for those failures the user is most likely to encounter. Not all failures have the same impact on the user. The amount of effort that is expended in searching for these failures should be proportionate to the impact on the quality of the program.
Testing is a continuous activity that cuts across all phases of the software development process. It is also a labor-intensive activity: Estimates of the resources expended writing code for testing purposes range from 40% to as high as 300% or even 500% of the amount expended for all other effort on the application under development [Pressman 1998a, p. 595]. This high cost makes testing an attractive target for improvement.
Different types of testing, such as unit, integration, and system testing, are carried out during the development process. Regardless of the type of testing, each task involved is organized around three basic activities:
analysis: The material to be tested is examined using specific strategies to identify appropriate test cases. Analysis techniques that involve structured artifacts such as architecture description languages and programming languages can be automated to reduce the test resources needed for a project. Performing test analysis will actually detect some defects such as poor testability. The output from this activity is a detailed test plan.
construction: The artifacts needed to execute the tests specified in the test plan are built. These artifacts usually include test drivers, test data sets, and the software that implements the actual tests. Commonalities among products and product parts support the development of frameworks and of harnesses that simplify test construction.
execution and evaluation: The tests are conducted, and the results are analyzed. The software is judged to have passed or failed each test. This information guides decisions about what the next step will be in the development process. The reuse of test cases across pieces of software amortizes the often high cost of test oracles, the pieces of software that determine whether a test has passed or failed.
All testing activities should be carried out under the following desiderata:
- Testing is objective. The process by which criteria are determined should be guided only by the satisfaction of the asset's requirements.
- Testing is systematic. Test criteria are selected according to an algorithm that prescribes a reason for selecting each criterion.
- Testing is thorough. The criteria used should achieve some logical closure that can be viewed as complete by some definition such as touching every line of code or executing every decision point.
- Testing is integral to the development process. Before the software under test is produced, plans should be made as to how best to assess it.
Testing activities produce four major types of artifacts:
- test cases: Selecting test cases is the fundamental test activity. Test cases are designed by setting a goal, such as achieving a certain level of test coverage, and then analyzing the item being tested to determine how to achieve that goal. The test approach (e.g., find the highest risk or most likely defects) will direct the tester as test cases are selected. For each test case, the test context, input, and expected result are captured in a test plan, test data sets, and ultimately test software.
- test documents: Test plans and test reports are the two primary types of test documents.
- test data sets: The data needed for a test include all the inputs required to establish the preconditions for a test case and the actual test step. The construction and verification of these data sets require a significant resource investment.
- test software: Test harnesses can be as complicated as the production software. For example, timing a component's response may be necessary to determine whether that component has met its real-time requirements. Or, it may be necessary to populate a large database, execute a test case, and then restore the database to its initial state for the next test case.
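The populate, execute, and restore pattern described for test software can be sketched with Python's standard unittest and sqlite3 modules. This is a minimal illustration, not a prescribed harness: the customers table, the count_customers function, and the run_tests helper are hypothetical, and an in-memory database stands in for the large database mentioned above.

```python
import sqlite3
import unittest


def count_customers(conn: sqlite3.Connection) -> int:
    """Hypothetical production query under test."""
    return conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]


class CustomerQueryTest(unittest.TestCase):
    def setUp(self):
        # Establish the preconditions: a freshly populated database per test.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE customers (name TEXT)")
        self.conn.executemany(
            "INSERT INTO customers VALUES (?)", [("Ada",), ("Grace",)]
        )

    def tearDown(self):
        # Discard the database so the next test starts from the initial state.
        self.conn.close()

    def test_count_matches_seed_data(self):
        self.assertEqual(count_customers(self.conn), 2)

    def test_insert_does_not_leak_into_other_tests(self):
        self.conn.execute("INSERT INTO customers VALUES ('Edsger')")
        self.assertEqual(count_customers(self.conn), 3)


def run_tests() -> unittest.TestResult:
    """Run the suite programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CustomerQueryTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Because setUp and tearDown run around every test method, the second test's insert cannot affect the first, whatever order the tests run in.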
The remainder of this overview summarizes different kinds of testing. The first four types serve as exit gates for project phases.
Design model validation: Each phase in the development process that creates a model of the product or some portion of it should include testing activities that verify the syntax of the model and validate it against the required system. The test can serve as the exit criteria for that phase. We use "model" in a broad sense, to refer to non-software assets that represent a product for the purpose of either making predictions about the product implementation or prescribing constraints for other assets. A business case for a product line is a model; it predicts how profitable the product line will be. Software designs are models; they predict behavior and also impose constraints on implementations. Preeminent among the models is the software architecture, and its validation is so important that "Architecture Evaluation" is its own practice area.
Unit testing: Testing for implementation defects begins with the most basic unit of code development. This unit may be a function, class, or component. This kind of testing occurs during coding; therefore, the intention is to direct the testing search to those portions of the code that are most likely to contain faults (complex control structures, for example). As each unit is constructed, it is tested to ensure that it (1) does everything that its specification claims and (2) does not do anything it should not. A test case associates a set of input values with the result that should be produced by a correctly functioning system. The functional testing strategy uses the specification of the unit to determine which inputs to use in the testing. This strategy provides evidence that the unit does everything it is supposed to. A second strategy, termed structural testing, selects test inputs on the basis of the structure of the code that implements the functionality of the unit. This strategy provides evidence that the unit does not do anything it is not supposed to.
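The difference between the functional and structural strategies can be illustrated with a small sketch. The shipping_fee unit, its specification, and the helper names are all hypothetical; the point is only that the first set of inputs comes from the specification, while the second comes from reading the code.

```python
def shipping_fee(weight_kg: float) -> float:
    """Hypothetical unit under test.

    Specification: parcels up to 1 kg cost 5.0; heavier parcels cost 5.0 plus
    2.0 per additional kg; non-positive weights are rejected.
    """
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    if weight_kg <= 1:
        return 5.0
    return 5.0 + 2.0 * (weight_kg - 1)


# Functional cases: inputs chosen from the specification alone.
FUNCTIONAL_CASES = [(0.5, 5.0), (1.0, 5.0), (3.0, 9.0)]

# Structural cases: inputs chosen by reading the code, one per return branch.
STRUCTURAL_CASES = [(1.0, 5.0), (1.5, 6.0)]


def run_cases(cases) -> bool:
    """Return True when every (input, expected result) pair passes."""
    return all(shipping_fee(weight) == expected for weight, expected in cases)


def error_branch_is_covered() -> bool:
    """Structural check for the remaining branch: the guard must raise."""
    try:
        shipping_fee(0)
    except ValueError:
        return True
    return False
```

Each test case pairs an input with the result a correct unit should produce, as described above; the two lists differ only in how their inputs were selected.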
Subsystem integration testing: The integration of basic units, even those that have been adequately unit tested, may produce failures resulting from the interaction of the units. Timing discrepancies and type/subtype relationships can be the source of these errors. The tests are constructed from the use cases used to represent the full product's requirements. The integration test plan should describe tests that have been systematically selected from the interactions among the units being integrated. Protocol descriptions between pairs of units or flows through sets of units that implement a specific pattern of behavior can be used to select the test cases. Test cases should include instances in which the error-handling capability of the units is evaluated, such as when one unit throws an exception that should be caught by another unit.
System integration testing: When some critical mass of subsystems has been fully developed and tested, the focus shifts to representative tests of the completed application as a whole to determine whether a product does what it is supposed to do. These representative tests are selected to cover the complete specification for the portion of functionality that has been produced. The amount of testing a specific function receives is based either on its frequency of use (operational profiles) or on the criticality of the function (risk-based testing). Special forms of system testing include load testing (to determine if the software can handle the anticipated amount of work), stress testing (to determine if the software can handle an unanticipated amount of work), and performance testing (to determine if the software can handle the anticipated amount of work in the required time).
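As a rough sketch of frequency-based allocation, the function below splits a fixed budget of system tests across functions in proportion to an operational profile. The function name, the profile values, and the largest-remainder rounding rule are illustrative assumptions, not part of the framework.

```python
def allocate_tests(profile: dict, budget: int) -> dict:
    """Split `budget` tests across functions in proportion to usage counts.

    profile maps a function name to its relative frequency of use.
    Fractional shares are rounded with the largest-remainder method so that
    the allocated counts always add up to the budget.
    """
    total = sum(profile.values())
    exact = {name: budget * weight / total for name, weight in profile.items()}
    counts = {name: int(share) for name, share in exact.items()}
    leftover = budget - sum(counts.values())
    by_remainder = sorted(
        exact, key=lambda name: exact[name] - counts[name], reverse=True
    )
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


# Hypothetical operational profile: observed uses per 100 sessions.
PROFILE = {"search": 60, "checkout": 30, "export": 10}
```

With this profile, allocate_tests(PROFILE, 50) assigns 30 tests to search, 15 to checkout, and 5 to export. A risk-based scheme could reuse the same function with criticality scores in place of usage counts.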
In addition to testing as the exit criteria for process phases, the next five types of tests described are applied to verify certain product properties.
Regression testing: Regression testing is used to ascertain that the software under test that exhibited the expected behavior prior to a change continues to exhibit that behavior after the change. Regression tests are constructed, and periodically applied, to determine whether the software under test remains correct and consistent over time. Regression testing is triggered by changes that affect a predefined scope of assets or that affect certain critical assets. The actual test cases used in regression testing are no different from any other test cases. The regression test suite is a sample of the functional tests from the original test suites administered prior to any changes.
Conformance testing: Conformance testing determines whether the software under test can be used in a specific role in an application. The conformance test set should cover all the required interactions between all the components that will participate in the application.
Acceptance testing: To validate the claims of the manufacturer or provider, the consumer performs acceptance testing. The acceptance test is more realistic than the system test, since the application being tested is sited in the consumer's actual environment.
Deployment testing: Deployment testing is conducted by the development organization prior to releasing the software to customers for acceptance testing. Where acceptance testing focuses on the functionality of the delivered product, deployment testing covers all the unique system configurations on which the product is to be deployed. This testing focuses on the interaction between the product and platform-specific libraries, device drivers, and operating systems. During the deployment testing phase, the application's ability to deploy or install itself is also tested.
Reliability models: Testing is used to estimate the reliability of a software component or system [Musa 1999a]; however, establishing the reliability of a piece of software through testing is a costly process. The test cases are selected based on the expected frequency of use of each product feature.
Aspects Peculiar to Product Lines
Testing in a product line organization examines the core asset software, the product-specific software, the interactions between them, and ultimately the completed products. Responsibilities for testing in a product line organization may be distributed differently than in a single-system development effort [Clements 2002c, p. 130].
Also, unlike single-system development projects, testing is an activity (and producer of artifacts) whose output is reused across a multitude of products. Planning is necessary to take full advantage of the benefits that reuse brings. The following guidelines should help.
Structure the set of testing processes to test each artifact as early as possible: While the product line architects and implementers can focus on one variation point at a time, much of the testing work cuts across multiple variation points resulting in a potential combinatorial explosion of test cases. This potential can be mitigated by testing every artifact in the product line as early as possible in as isolated a context as possible. Doing so reduces the range of defects that must be searched for at each test point, thus greatly reducing the possible combinations.
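The size of the combinatorial explosion can be illustrated with a small counting sketch. It assumes, for illustration only, that the variation points are independent and that each variant can be exercised in isolation; the function names and the example numbers are hypothetical.

```python
from math import prod


def combined_configurations(variants_per_point) -> int:
    """Configurations to cover if every combination is tested together."""
    return prod(variants_per_point)


def isolated_configurations(variants_per_point) -> int:
    """Configurations to cover if each variant is tested in isolation."""
    return sum(variants_per_point)


# Hypothetical product line with four variation points.
VARIANTS = [3, 4, 2, 5]
```

For these four variation points, testing every combination means 3 x 4 x 2 x 5 = 120 configurations, while testing each variant in isolation means 3 + 4 + 2 + 5 = 14. Interaction tests are still needed later, but they can then concentrate on interaction defects alone.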
Structure test artifacts to accommodate the product line variation: The test artifacts (test cases, test plans, and test harnesses) should be as variable as the software that implements the product. The key to this variability is designing in the necessary variation so that the test artifact can be made to cover the complete range of product variability. Research evidence supports using the same variation mechanism used in the product implementation to implement variation in the test software.
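One way to picture this guideline is a test that varies through the same parameter the product code uses. In the sketch below, the ProductConfig class, the price_label function, and the two products are hypothetical; parameterization is assumed as the variation mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductConfig:
    """Hypothetical variation point: each product sets a currency and tax rate."""
    currency: str
    tax_percent: int


def price_label(net_cents: int, config: ProductConfig) -> str:
    """Product code: its behavior varies only through the config parameter."""
    gross_cents = net_cents * (100 + config.tax_percent) // 100
    return f"{gross_cents // 100}.{gross_cents % 100:02d} {config.currency}"


# Each product binds the variation point; the test data set varies the same way.
PRODUCTS = {
    "eu_shop": (ProductConfig("EUR", 20), "12.00 EUR"),
    "us_shop": (ProductConfig("USD", 0), "10.00 USD"),
}


def run_variant_tests() -> dict:
    """Apply one test case to every product by binding the same parameter."""
    return {
        name: price_label(1000, config) == expected
        for name, (config, expected) in PRODUCTS.items()
    }
```

Adding a product then means adding one entry to the table, and the test case itself does not change.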
Maintain the test artifacts: Structuring the test software to be used in multiple products reduces the cost of maintaining the test software, since it is easier to identify where to make changes. The development environment already contains tools that work with the application's units and can just as easily be applied to the units of test software. In iterative, incremental development, and in fact any development process that corrects its mistakes, the test code will be executed many times over the span of development.
Structure the testing software for traceability: The structure of the test software should support traceability from the test code itself to the code being tested. As changes are made to the product code, corresponding changes may be required to the test code. So, to maximize traceability, the test software should reflect the product line architecture where possible. Grouping the test code for a software unit from the application in a single unit of test software creates the mapping from test code to source code. For example, in an object-oriented development effort, the test software for a class is grouped within a single class. Where two parts of the product line architecture have a particular relationship, the test code for each of those parts is also related. For example, where one class in the application software inherits from another, the test class for that class inherits from the test class for the parent application class. In object-oriented software, the inheritance relation defines a hierarchy of definitions from abstract to specific in which each subclass adds more specific information to what it inherits. Test plans should be established at each of these levels, and test cases should be designed to have increasing specificity at each level of the hierarchy. The abstract test cases are not applied directly any more than abstract class definitions are used directly; however, they provide support for the reuse of definitions.
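The parallel hierarchy of application classes and test classes can be sketched as follows. The Account and SavingsAccount classes, the make_account factory method, and the run_tests helper are hypothetical names chosen for illustration.

```python
import unittest


class Account:
    def __init__(self):
        self.balance = 0

    def deposit(self, amount: int) -> None:
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.balance += amount


class SavingsAccount(Account):
    def add_interest(self, rate_percent: int) -> None:
        self.balance += self.balance * rate_percent // 100


class AccountTest(unittest.TestCase):
    """Test class for Account; subclasses override the factory method."""

    def make_account(self):
        return Account()

    def test_deposit_increases_balance(self):
        account = self.make_account()
        account.deposit(100)
        self.assertEqual(account.balance, 100)

    def test_rejects_nonpositive_deposit(self):
        with self.assertRaises(ValueError):
            self.make_account().deposit(0)


class SavingsAccountTest(AccountTest):
    """Mirrors the inheritance in the product code and adds specific cases."""

    def make_account(self):
        return SavingsAccount()

    def test_add_interest(self):
        account = self.make_account()
        account.deposit(200)
        account.add_interest(5)
        self.assertEqual(account.balance, 210)


def run_tests(test_class) -> unittest.TestResult:
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(test_class)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Running SavingsAccountTest executes three tests: the two inherited from AccountTest, now applied to a SavingsAccount, plus the one added for the subclass.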
Reuse product line assets for system integration testing: The "Requirements Engineering" and "Architecture Definition" practice areas discuss various approaches to producing use cases and scenarios that describe how the system is intended to work. You should select test cases and inputs for system integration testing based on these descriptions.
Automate regression testing: Both the sampling of the original test sets to create regression test suites and the execution of those suites should be automated to encourage frequent retesting. The sampling algorithm should be weighted to test variation points more heavily than regions of commonality, because the variation points are where the most changes, and probably the most errors, will occur.
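A weighted sampler of this kind might look like the sketch below. The weight of 3.0 for tests that touch a variation point, the test names, and the choice of key-based weighted sampling without replacement are all assumptions made for illustration.

```python
import random


def sample_regression_suite(test_cases, size, variation_weight=3.0, seed=0):
    """Pick `size` distinct tests, favoring those that exercise variation points.

    test_cases: list of (name, touches_variation_point) pairs.
    Uses weighted sampling without replacement: each test draws the key
    u ** (1 / weight), and the tests with the largest keys are kept.
    """
    rng = random.Random(seed)  # a fixed seed keeps the suite reproducible
    keyed = []
    for name, touches_variation_point in test_cases:
        weight = variation_weight if touches_variation_point else 1.0
        keyed.append((rng.random() ** (1.0 / weight), name))
    keyed.sort(reverse=True)
    return [name for _, name in keyed[:size]]


# Hypothetical original suite: names flagged by whether they hit a variation point.
ORIGINAL_SUITE = [
    ("test_login", False),
    ("test_report_layout", False),
    ("test_payment_plugin", True),
    ("test_locale_switch", True),
    ("test_export_format", True),
]
regression_suite = sample_regression_suite(ORIGINAL_SUITE, size=3)
```

Tests in regions of commonality can still be selected, only less often, so repeated runs with different seeds will eventually exercise them too.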
Expand the testing portfolio: In a product line environment, testing is more cost-effective for the development organization, so more types of tests can be run. Stress, performance, and other types of narrowly focused tests can be added cost-effectively to the intended test suite, thereby reducing the possibility of failures in the field.
Application to Core Asset Development
Testing concepts apply to core asset development in two ways. First, testing itself produces core assets. The four categories of artifacts (test cases, test documents, test data sets, and test software) mentioned earlier are core assets of the product line. They will be used for testing multiple products and pieces of those products.
The second way in which testing concepts are applied to core asset development is testing other core assets; this is a key activity in satisfying the quality and reuse goals in a product line effort. Building a component to be reusable is widely recognized to be more costly than a one-off implementation targeted at a specific application, because the component must be designed to handle a wide range of inputs and encompass a more complete set of states. (This applies to non-software assets as well.) This results in additional tests being necessary to achieve adequate test coverage; however, the scale of reuse in a product line effort keeps this testing cost-effective.
Testing non-software core assets: Every asset should be validated as it is created. This includes the business case, scope definition, and requirements model and carries on to the analysis, architecture, and detailed design models. To make the validation of these models more effective, the activity can be structured as a testing activity that embodies the criteria of being objective, systematic, thorough, and integral to the development process. To ensure objective testing, those who created the model should not be the testers.
For models built using a nonexecutable notation, the process described above is a review process. First, take full advantage of syntax checkers and other tools that provide some automation to validate certain aspects of the model. Then use the scenario-driven approach in which reviewers manually (with the help of those who created the model) trace through the model to determine the answer that would result if the model could be executed. An alternative is to actually build an executable version of the model.
Testing software core assets: Software core assets include components, or even complete applications, that are intended to be integrated to produce products. Subject each component to a rigorous test during its construction. Examine the component against its specification, but also examine the behavior of the component in integrated situations. Other software core assets are development tools. If they are built or significantly modified by the product line organization, they must be tested. Product line core assets have variation points, implemented perhaps by providing a parameterization mechanism or multiple implementations of an interface. The test plan for an individual component is divided into functional and structural test suites. Functional tests can be used for all the variations. The structural tests must be modified for each different variation as shown in the following figure. For example, in object-oriented systems, the multiple implementations are related to some abstract definition via an inheritance relationship.
[Figure: Multiple Implementations Lead to Multiple Test Suites]
An acceptance test is performed on all the assets being obtained from outside the organization. Designing the acceptance test includes identifying the desired attributes, defining acceptable levels of those attributes, and evaluating the asset to see if it possesses those attributes. Assets such as compilers, other modeling tools from which code will be generated, and component libraries should be tested for accuracy and compatibility before being deployed to the technical staff.
Application to Product Development
Testing is used in two fundamental ways for products: (1) between phases of the development process to verify that what was produced in the last phase is correct and suitable as input to the next phase and (2) to validate a product against its requirements. Validation tests are intended to evaluate correctness relative to requirements.
In a product line effort, the main product development activity is assembling products from core assets. The majority of each product's specification will be defined in documentation generic to the product line. Define a complete set of functional tests for that specification. Some portion of each product's implementation will also be created at the product line level. Even if the functionality has been tested via some mechanism at the product line level, when it is integrated into a specific product, interactions with product-specific functionality can lead to failures. Define a set of interaction tests that will ensure that the additions made by the product developers do not cause failures. The tests used at the product level can be derived from the functionality tests, or possibly test templates, created at the product line level. The mapping between product core assets and testing assets facilitates the reuse of these test assets. The mapping associates the test cases, as well as the test drivers and test data sets, with those requirements that are common across the product line. By taking advantage of this commonality, the amount of effort associated with testing and retesting a product is reduced.
There is a tradeoff between saving resources through the reuse of testing assets and improving quality by expending some of the saved resources on additional testing. By devoting more resources to those products created early in the product line effort, the quality of all deliverables is improved. Since these are the assets that will be reused, this improved quality will be propagated to future versions of the product being constructed and to other products in the product line.
Architecture evaluation: Evaluating the product line architecture is a kind of testing, under the broad sense of the term used in this practice area. Architecture evaluation is covered in its own practice area.
Build support for testing into components: The product line architecture can provide support for testing by levying requirements on the systems' components. This support takes forms such as special test interfaces that allow self-test functionality to be invoked and special access to certain state types that are stored (maintained internally) by the program. These types of interfaces and functionality are often too resource intensive to be provided in a one-off system but are cost-effective in a product line environment.
The self-test functionality provides the system user the capability to determine whether the system is currently operational. This capability is particularly useful in systems that are configurable, systems that dynamically incorporate resources into the product, and systems that have a significant hardware component. The basic support for self-testing can be defined in the product line architecture and then elaborated by specific products. The self-test functionality can run a set of regression test cases that are designed to exercise those parts of the program that dynamically load and link functionality and those parts that rely on information from configuration files and other external resources.
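A self-test interface levied by the architecture could take a form like the following sketch. The Component base class, the ConfiguredLogger component, its JSON configuration, and the system_self_test function are hypothetical; the sketch shows only the idea of a common interface that each product elaborates.

```python
import json
from abc import ABC, abstractmethod


class Component(ABC):
    """Hypothetical architecture-level base class: every component self-tests."""

    @abstractmethod
    def self_test(self) -> list:
        """Return a list of problems found; an empty list means operational."""


class ConfiguredLogger(Component):
    """Example component that depends on an external configuration text."""

    def __init__(self, config_text: str):
        self.config_text = config_text

    def self_test(self) -> list:
        try:
            config = json.loads(self.config_text)
        except json.JSONDecodeError as error:
            return [f"configuration is not valid JSON: {error.msg}"]
        if not isinstance(config, dict) or "level" not in config:
            return ["configuration is missing 'level'"]
        return []


def system_self_test(components) -> dict:
    """Invoke the self-test of every component and collect the reports.

    Reports are keyed by class name, so this sketch assumes at most one
    component per class.
    """
    return {type(c).__name__: c.self_test() for c in components}
```

A logger configured with '{"level": "info"}' reports no problems, while one configured with '{}' reports that the level setting is missing. A product could extend system_self_test to cover its own dynamically loaded parts.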
Guided inspection: For analysis and model reviews, guided inspection is a technique that combines the checklist of an inspection with the thoroughness of testing [McGregor 1999a]. The inspection process is "guided" by the test cases. Other methods for guided inspections are discussed in the "Architecture Evaluation" practice area.
Test-Driven Development (TDD): TDD is one practice of the agile development community. Its goal is "clean code that works" [Beck 2002a]. In this practice, developers define requirements for the piece they are assigned to construct by maintaining close communication with the developers who are constructing related units and writing test cases that serve as the specification for the unit. The developer then writes and revises code until the unit passes all the tests. The rhythm of TDD is very short cycles of these steps:
- Define a new test.
- Execute all tests.
- Write code to fix tests that fail.
- Execute all tests.
TDD has the advantage that the test code is always in synch with the product code, because the test code defines the product code. The disadvantage of TDD is that there is not a good method for determining whether the set of test cases is complete, since the completeness of a test set is usually determined by comparing it to the specification.
TDD is applicable to product line organizations provided it is applied to units that are first defined in the context of the product line architecture. TDD does not provide tools and techniques for balancing the diverse quality attributes usually present in a product line. TDD can be successful if applied to units that a small group of developers, often a two-person or pair programming team, can produce in a timely manner. The range of variability for the unit should also be sufficiently narrow to allow for timely completion. The success of TDD depends on the availability of tools, such as JUnit, to assist with development and automate testing.
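One pass through the TDD cycle can be sketched in a single script. The slugify unit and the small run_all_tests runner are hypothetical and stand in for a tool such as JUnit; the first run fails because the production code does not exist yet.

```python
# Step 1: define a new test before the production code exists.
def test_slug_lowercases_and_joins_words():
    assert slugify("Product Line Testing") == "product-line-testing"


TESTS = [test_slug_lowercases_and_joins_words]


def run_all_tests() -> list:
    """Execute all tests and return the names of those that fail."""
    failures = []
    for test in TESTS:
        try:
            test()
        except (AssertionError, NameError):
            failures.append(test.__name__)
    return failures


# Step 2: execute all tests; the new test fails because slugify is undefined.
first_run = run_all_tests()


# Step 3: write just enough code to fix the failing test.
def slugify(title: str) -> str:
    return "-".join(title.lower().split())


# Step 4: execute all tests again; the list of failures is now empty.
second_run = run_all_tests()
```

Here the test is the only specification of slugify, which illustrates both points made above: the test and the code cannot drift apart, and nothing in the cycle itself indicates whether one test is enough.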
The major risk in testing is not doing enough of it and not doing it in high-payoff ways. Inadequate testing will result in low software quality, which will undermine the success of the product line. Inadequate validation of non-software artifacts will result in a loss of trust in those artifacts and a decaying of the process-based or documentation-based practices they were intended to support. Specific testing risks include
inadequate unit testing: Component quality will be low if the unit-level testing is inadequate. Often technical staff will decide to "save time" by performing little or no unit-level testing. This may actually take more time, because it will require an unexpected amount of integration and system testing. Further time may be lost, because it is well established that repairing errors found late in the development cycle is more costly than repairing those found early. The probability that this risk will occur is lower in a product line environment that fosters a culture of reuse. However, if the risk becomes a problem, the cost will be far greater. The increased cost is due directly to the propagation of poor-quality components across the larger number of reuse sites. A well-defined software development process that specifies a unit test activity and defines a level of adequate coverage mitigates this risk.
inadequate unit testing due to inadequate tool support: The automated unit testing will be inadequate if it is conducted on application program interfaces (APIs) only. Few of the automated testing tools work on APIs, and those that do usually require some amount of custom programming or comprehensive specifications. The risk is that, if adequate tools are not available, more resources will be needed to achieve an acceptable level of coverage. The probability that this risk will occur is less in a product line environment where the cost of building special tools can be amortized over multiple products. However, if the risk becomes a problem, the cost will be far greater due to the propagation of poor-quality components across the larger number of reuse sites. A tools group at the product line level that provides testing support to all products mitigates this risk.
inadequate specifications: The testability of components will be low if inadequate specifications make it impossible to design tests. The probability that this risk will occur is about the same in a product line environment as it is in a single-system one. However, if the risk becomes a problem, the cost will be far greater in a product line environment, a cost that includes increased resource requirements for ensuring adequate quality. Training developers to write complete, consistent, and correct specifications mitigates this risk.
insufficient integration testing: The flow of products will be slower than expected if sufficient integration testing is not conducted. The probability that this risk will occur will be higher in a product line environment if the product line team and the product teams are not linked in a feedback loop. Internally, the product line team should use the product line architecture as a blueprint for communication links between component development teams to ensure that the interactions between components will be complete and correct.
testing too late: The leverage gained from testing a product line is at its peak when applied early. The later the tests are applied, the more combinations there are to test. By testing early and applying incremental integration tests, far fewer tests will be needed.
testing too much: The test plan must have clear stopping criteria. Experience will yield the usual defect densities (number of defects per line of code). When testing and repair have achieved that level of defect density, further effort may not have a positive return on investment (ROI). In the absence of experience, establish test coverage levels and stop when they are achieved.
inadequate test infrastructure: The anticipated high level of test asset reuse will not be realized unless sufficient resources are devoted to the test infrastructure. If developers are allowed to test in ad hoc ways or the test software architecture is not maintained properly, new tests will not be derived from existing ones. The resulting loss may be a reduction in quality and available resources.
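The stopping criteria described under "testing too much" can be expressed as a small decision function. This sketch rests on one reading of that risk, namely that testing stops once the defects found and repaired per thousand lines of code reach the organization's usual density; the function name, parameters, and the 0.9 coverage target are assumptions.

```python
from typing import Optional


def should_stop_testing(
    defects_repaired: int,
    lines_of_code: int,
    coverage: float,
    usual_defects_per_kloc: Optional[float] = None,
    coverage_target: float = 0.9,
) -> bool:
    """Decide whether the test plan's stopping criterion has been met.

    With historical data, stop once the repaired defects per 1000 lines
    reach the usual density. Without it, fall back to a coverage target.
    """
    if usual_defects_per_kloc is not None:
        found_per_kloc = defects_repaired / (lines_of_code / 1000)
        return found_per_kloc >= usual_defects_per_kloc
    return coverage >= coverage_target
```

For example, 42 repaired defects in 20,000 lines is 2.1 defects per thousand lines, so with a usual density of 2.0 the function returns True. Without historical data, a coverage of 0.85 against the assumed 0.9 target returns False.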
Beck's book introduces TDD.
Beizer provides a comprehensive survey of testing techniques applied at the unit, integration, and system levels. He describes basic techniques regardless of the process or development paradigm. His book serves as good general background.
McGregor provides a jump-start for personnel charged with establishing the testing process for a product line environment. He presents techniques for taking advantage of the personnel organization and software architectures in order to reduce the effort required for adequate testing. The techniques organize unit-level testing assets in a manner that directly reflects the architecture of the product software. The techniques also associate the requirements, in the form of use cases, with the system test cases. The proceedings of the Software Product Line Testing (SPLiT) workshops also provide insight into the research directions in this area [SPLiT 2004a, SPLiT 2005a].
Musa ties the amount of testing to measures of reliability. He describes the computation required to determine the levels of tests that are required to "prove" specific levels of reliability.