The demand for increased efficiency and effectiveness of our
software processes places measurement demands on the software
engineering community beyond those traditionally practiced.
Statistical and process thinking principles lead to the use of
statistical process control methods to determine the consistency and
capability of the many processes used to develop software.
Over the past decade, the concepts, methods, and practices
associated with process management and continual improvement have
gained wide acceptance in the software community. These concepts,
methods, and practices embody a way of thinking, a way of acting, and
a way of understanding the data generated by processes that
collectively result in improved quality, increased productivity, and
competitive products. The acceptance of this "process thinking"
approach has motivated many to start measuring software processes
that are responsive to questions relating to process performance
[Florac 99]. In that vein, traditional software measurement and
analysis methods of measuring "planned versus actual" are not
sufficient for measuring process performance or for predicting
process performance. The time has come to marry, if you will,
"process thinking" with "statistical thinking."
"Statistical thinking" [Britz 97] embraces three
principles:
- all work occurs in a system of interconnected processes
- variation exists in all processes
- understanding and reducing variation are keys to success
If we examine the basis for "process thinking" and "statistical
thinking," we find that they are founded on the
principles of statistical process control. These principles hold that
by establishing and sustaining stable levels of variability,
processes will yield predictable results. We can then say that the
processes are under statistical control. Controlled processes are
stable processes, and stable processes enable you to predict results.
This in turn enables you to prepare achievable plans, meet cost
estimates and scheduling commitments, and deliver required product
functionality and quality with acceptable and reasonable consistency.
If a controlled process is not capable of meeting customer
requirements or other business objectives, the process must be
improved or retargeted.
When we relate these notions of process and statistical thinking
to the operational level, we realize a key concern of process
management is that of process performance: how is the process
performing now (effectiveness, efficiency), and how can it be
expected to perform in the future? In the context of obtaining
quantified answers to these questions, we can address this issue by
deconstructing the question of process performance into three
parts.
First we should examine process performance in terms of
compliance. For example, is the process being executed properly? Are
the personnel trained? Are the right tools available? If the process
is not in compliance, we know there is little chance of it performing
satisfactorily.
If a process is compliant, the next question is: Is the process
performance (execution) reasonably consistent over time? Are the
effort, cost, elapsed time, delivery, and quality consumed and
produced by executing the process consistent? Realizing that
variation exists in all processes, is the variation in process
performance predictable?
Finally, if the process performance is consistent, we ask the
question: Is the process performing satisfactorily? Is it meeting the
needs of interdependent processes and/or the needs of the
customers? Is it effective and efficient?
Historically, software organizations have addressed the question
of compliance by conducting assessments, such as comparing the
organization's software process against a standard (e.g., the CMM).
Such an assessment provides a picture of the process status at a
point in time and indicates the organization's capacity to execute
various software processes according to the standard's criteria.
However, it does not follow that the process is executed consistently
or efficiently merely because the assessment results satisfied all
the criteria.
The questions of process consistency, effectiveness, and
efficiency require a measurement of process behavior as it is
executed over time. Other disciplines have addressed this issue by
using statistical process control methods, specifically using
Shewhart control charts. They have concluded that control charts
provide the basis for making process decisions and predicting process
behavior.
Successful use of control charts by other disciplines suggests it
is time to examine how statistical process control techniques can
help to address our software process issues. In so doing, we find
that Shewhart's control charts provide a statistical method for
distinguishing between variation caused by normal process operation
and variation caused by anomalies in the process. Additionally,
Shewhart's control charts provide an operational definition for
determining process stability or consistency and predictability as
well as quantitatively establishing process capability to meet
criteria for process effectiveness and efficiency.
We use the term software process to refer not just to an
organization's overall software process, but to any process or
subprocess used by a software project or organization. In fact, a
good case can be made that it is only at subprocess levels that true
process management and improvement can take place. Thus, we view the
concept of software process as applying to any identifiable activity
that is undertaken to produce or support a software product or
service. This includes planning, estimating, designing, coding,
testing, inspecting, reviewing, measuring, and controlling, as well
as the subtasks and activities that comprise these undertakings.
Process Performance Variation
The basis for control charts is recognition of two types of
variation: common cause variation and assignable cause variation.
Common cause variation is variation in process performance due to
normal or inherent interaction among the process components (people,
machines, material, environment, and methods). Common cause variation
of process performance is characterized by a stable and consistent
pattern over time, as illustrated in Figure 1. Variation in process
performance due to common cause is thus random, but will vary within
predictable bounds. When a process is stable, the random variations
that we see all come from a constant system of chance causes. The
variation in process performance is predictable, and unexpected
results are extremely rare.

Figure 1: The Concept of Controlled Variation
The key word in the paragraph above is "predictable." Predictable
is synonymous with "in control."
The other type of variation in process performance is due to
assignable causes. Assignable cause variation has marked impacts on
product characteristics and other measures of process performance.
These impacts create significant changes in the patterns of
variation. This is illustrated in Figure 2, which we have adapted
from Wheeler and Chambers [Wheeler 92]. Assignable cause variations
arise from events that are
not part of the normal process. They represent sudden or persistent
abnormal changes to one or more of the process components. These
changes can be in things such as inputs to the process, the
environment, the process steps themselves, or the way in which the
process steps are executed. Examples of assignable causes of
variation include shifts in the quality of raw materials,
inadequately trained people, changes to work environments, tool
failures, altered methods, failures to follow the process, and so
forth.

Figure 2: The Concept of Uncontrolled or Assignable Cause
Variation
When all assignable causes have been removed and prevented from
reoccurring in the future so that only a single, constant system of
chance causes remains, we have a stable and predictable process.
Stability of a process with respect to any given attribute is
determined by measuring the attribute and tracking the results over
time. If one or more measurements fall outside the range of chance
variation, or if systematic patterns are apparent, the process may
not be stable. We must then look for the causes of deviation, and
remove any that we find, if we want to achieve a stable and
predictable state of operation.
When a process is stable, more than 99% of the observed process
performance values will fall within 3 sigma of the mean.
When the process variation falls outside of the 3 sigma limits, the
variation is very likely caused by an anomaly in the process.
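To see where a figure such as "99+%" can come from, the short sketch below computes the fraction of values that lie within k sigma of the mean. It assumes normally distributed data, which is our simplifying assumption and not something the discussion above requires; the function name fraction_within is ours.

```python
import math


def fraction_within(k_sigma: float) -> float:
    """Fraction of a normal distribution within k_sigma standard deviations of its mean."""
    # For a normal distribution, P(|X - mean| <= k * sigma) = erf(k / sqrt(2)).
    return math.erf(k_sigma / math.sqrt(2.0))


for k in (1, 2, 3):
    print(f"within {k} sigma: {fraction_within(k):.4%}")
```

Under that assumption, roughly 99.73% of values fall inside the 3 sigma limits, so a point outside them is rare enough to be worth investigating.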
When a process is stable, or nearly so, the 3 sigma limits
determine the amount of variation that is normal or natural to the
process. This is the "voice of the process" or the process telling us
what it is capable of doing. This may or may not be satisfactory to
the customer: if it is, it is "capable"; if it is not, the process
must be changed since we know that the remaining variation is due to
the process itself.
Three Important Factors
Before we look at an example, there are three important notions
that should be discussed:
- the importance of operational definitions
- homogeneity
- issues of rational subgrouping
The need for operational definitions is fundamental to any
measurement activity. It is not enough to identify measures. Measures
must be defined in such a way as to tell others exactly how each
measure is obtained so that they can collect and interpret the values
correctly.
The primary issue is not whether a definition for a measure is
correct, but that everyone understands, completely, what the measured
values represent. Only then can people be expected to collect values
consistently and have others interpret and apply the results to reach
valid conclusions.
Communicating clear and unambiguous definitions is not easy.
Having structured methods for identifying all the rules that are used
to make and record measurements can be very helpful in ensuring that
important information does not go unmentioned. When designing methods
for defining measures, one should keep in mind that things that do
not matter to one user are often important to another. This means
that measurement definitions (and structures for recording the
definitions) often become larger and more encompassing than the
definitions most organizations have traditionally used. This is all
the more reason to have a well-organized approach. Definition focuses
on details, and structured methods help ensure that all details get
identified, addressed, and recorded. They also help in dealing with
people who believe that attention to detail is no longer their
responsibility.
Operational definitions must satisfy two important criteria
[Park 92]:
- communication. If someone uses the definition as a basis for
measuring or describing a measurement result, will others know
precisely what has been measured, how it was measured, and what
has been included and excluded?
- repeatability. Could others, armed with the definition, repeat
the measurements and get the same results?
These criteria are closely related. In fact, if you can't
communicate exactly what was done to collect a set of data, you are
in no position to tell someone else how to do it. Far too many
organizations propose measurement definitions without first
determining what users of the data will need to know about the
measured values in order to use them intelligently. It is no
surprise, then, that measurements are often collected inconsistently
and at odds with users' needs. When it comes to implementation, rules
such as, "Count all noncomment, nonblank source statements" or "Count
open problems" are open to far too many interpretations to provide
repeatable results.
Although communicating measurement definitions in clear,
unambiguous terms requires effort, there is good news as well. When
someone can exactly describe what has been collected, it is easy to
turn the process around and say, "Please do that again." Moreover,
you can give the description to someone else and say, "Please use
this as your definition, but with these changes." In short, when we
can communicate clearly what we have measured, we have little trouble
creating repeatable rules for collecting future data.
Next, the notions of homogeneity and rational subgrouping need to
be understood and addressed. Homogeneity and rational subgrouping go
hand in hand. Because of the non-repetitive nature of software
products and processes, some believe it is difficult to achieve
homogeneity with software data. The idea is to understand the
theoretical issues and at the same time, work within some practical
guidelines. We need to understand what conditions are necessary to
consider the data homogeneous. When two or more data values are
placed in a subgroup, we are making a judgement that these values are
measurements taken under essentially the same conditions, and that
any difference between them is due to natural or common variation.
The primary purpose of homogeneity is to limit the amount of
variability within the subgroup data. One way to satisfy the
homogeneity principle is to measure the subgroup variables within a
short time period. Since we are not talking about producing widgets
but software products, the issue of homogeneity of subgroup data is a
judgement call that must be made by one with extensive knowledge of
the process being measured.
The principle of homogeneously subgrouped data is important when
we consider the idea of rational subgrouping. That is, when we want
to estimate process variability, we try to group the data so that
assignable causes are more likely to occur between subgroups than
within them. When subgroups contain non-homogeneous data, control
limits become wider and control charts less sensitive to assignable
causes.
Creating rational subgroups that minimize variation within subgroups
always takes precedence over issues of subgroup size.
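The effect of subgrouping on chart sensitivity can be illustrated with a small sketch. The data below are made up for illustration (two hypothetical teams whose results differ by an assignable cause), and the helper names are ours. The sketch uses the standard X-bar chart constant A2 = 0.729 for subgroups of size four.

```python
from statistics import mean

A2_N4 = 0.729  # standard X-bar chart constant for subgroups of size 4


def xbar_limits(subgroups):
    """Return (lcl, center, ucl) for an X-bar chart built from size-4 subgroups."""
    averages = [mean(g) for g in subgroups]
    ranges = [max(g) - min(g) for g in subgroups]
    center = mean(averages)
    half_width = A2_N4 * mean(ranges)  # limits scale with the average within-subgroup range
    return center - half_width, center, center + half_width


def signals(subgroups):
    """Indexes of subgroups whose average falls outside the control limits."""
    lcl, _, ucl = xbar_limits(subgroups)
    return [i for i, g in enumerate(subgroups) if not lcl <= mean(g) <= ucl]


# Hypothetical measurements from two teams working under different conditions.
team_a = [10, 11, 9, 10, 12, 10, 11, 9]
team_b = [20, 21, 19, 20, 22, 20, 21, 19]

# Rational subgroups: each subgroup holds values from one team only.
rational = [team_a[:4], team_a[4:], team_b[:4], team_b[4:]]
# Non-homogeneous subgroups: each subgroup mixes both teams.
mixed = [team_a[i:i + 2] + team_b[i:i + 2] for i in range(0, 8, 2)]

print("rational limits:", xbar_limits(rational), "signals:", signals(rational))
print("mixed limits:   ", xbar_limits(mixed), "signals:", signals(mixed))
```

With the rational subgroups, the difference between the teams appears between subgroups and every subgroup average falls outside the narrow limits. With the mixed subgroups, the same difference inflates the within-subgroup ranges, the limits widen, and no signal appears.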
Using Control Charts
Now let's examine how control charts can be used to investigate
process stability and lead to process improvement. There are a number
of different kinds of control charts (please see [Florac 99] for a
more detailed discussion of this and other topics). In software
environments, measurements often occur only as individual values. As
a result, there may be a preference for using individuals and moving
range (XmR) charts to examine the time-sequenced behavior of process
data.
For example, the figure below shows an XmR control chart for the
number of reported but unresolved problems backlogged over the first
30 weeks of system testing. The chart indicates that the problem
resolution process is stable, and that it is averaging about 20
backlogged problems (the center line, CL, equals 20.4), with an
average change in backlog of 4.35 problems from week to week. The
upper control limit (UCL) for backlogged problems is about 32, and
the lower control limit (LCL) is about 8. If future backlogs were to
exceed these limits or show other forms of nonrandom behavior, it
would be likely that the process has become unstable. The causes
should then be investigated. For instance, if the upper limit is
exceeded at any point, this could be a signal that there are problems
in the problem-resolution process. Perhaps a particularly thorny
defect is consuming resources, causing problems to pile up. If so,
corrective action must be taken if the process is to be returned to
its original (characteristic) behavior.

Figure 3: Control Chart for the Backlog of Unresolved
Problems
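The limits on an XmR chart follow from the center line and the average moving range. The sketch below uses the standard XmR scaling constants (2.660 for the individuals limits and 3.268 for the upper moving range limit) and applies them to the summary values quoted above. The weekly backlog counts behind Figure 3 are not reproduced here, so the function and key names are ours, and the second helper is shown only to indicate how the same limits would be derived from raw data.

```python
from statistics import mean

# Standard XmR chart constants for two-point moving ranges.
E2 = 2.660  # converts the average moving range into 3 sigma limits for individual values
D4 = 3.268  # converts the average moving range into the upper limit for moving ranges


def xmr_limits(center, avg_moving_range):
    """Compute XmR chart limits from a center line and an average moving range."""
    return {
        "x_lcl": center - E2 * avg_moving_range,
        "x_cl": center,
        "x_ucl": center + E2 * avg_moving_range,
        "mr_cl": avg_moving_range,
        "mr_ucl": D4 * avg_moving_range,
    }


def xmr_from_data(values):
    """Compute XmR chart limits from a time-ordered list of individual values."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    return xmr_limits(mean(values), mean(moving_ranges))


# Summary values quoted in the text for the backlog example.
limits = xmr_limits(center=20.4, avg_moving_range=4.35)
for name, value in limits.items():
    print(f"{name}: {value:.2f}")
```

With the quoted center line and average moving range, the computed limits for individual values are close to the approximate upper and lower limits given in the text.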
We must be careful not to misinterpret the limits on the
individual observations and moving ranges that are shown in the
control chart. These limits are estimates for the limits of the
process, based on measurements of the process performance. The
process limits together with the center lines are sometimes referred
to as the "voice of the process."
The performance indicated by the voice of the process is not
necessarily the performance that needs to be provided to meet the
customer's requirements. If the variability and location of the
measured results are such that the process does not meet the customer
requirement or specification (e.g., produces too many nonconforming
products), the process must be improved. This means reducing the
process performance variability, moving the average, or both.
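Comparing the voice of the process with the customer's requirement can be sketched as a simple check of the natural process limits against specification limits. The specification values below are hypothetical and chosen only for illustration; the natural limits are the approximate values quoted for the backlog example. The comparison is meaningful only for a process that is already stable.

```python
def is_capable(natural_lcl, natural_ucl, spec_lower=None, spec_upper=None):
    """Return True if the natural process limits lie within the specification limits.

    Either specification limit may be omitted (None) for a one-sided requirement.
    """
    if spec_lower is not None and natural_lcl < spec_lower:
        return False
    if spec_upper is not None and natural_ucl > spec_upper:
        return False
    return True


# Approximate natural limits quoted in the text for the backlog of unresolved problems.
backlog_lcl, backlog_ucl = 8, 32

# Hypothetical customer requirements: the backlog must never exceed the given value.
print(is_capable(backlog_lcl, backlog_ucl, spec_upper=30))  # requirement tighter than the process
print(is_capable(backlog_lcl, backlog_ucl, spec_upper=35))  # requirement the process can meet
```

When the check fails for a stable process, the remedy is the one described above: reduce the variability, move the average, or both.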
1. When analyzing process performance data, all sources of
variation in the process must be identified. If a conscious effort is
not made to account for the potential sources of variation,
variations that could help to improve the process might inadvertently
be hidden or obscured. Even worse, it could lead to a faulty
analysis. When data are aggregated, the results will be particularly
susceptible to overlooked or hidden sources of variation. Overly
aggregated data come about in many ways, but the most common causes
are
- inadequately formulated operational definitions of product and
process measures
- inadequate description and recording of context
information
- lack of traceability from data back to the context from where
it originated
- working with data whose elements are combinations (mixtures)
of values from non-homogeneous sources or different cause
systems
Overly aggregated data easily lead to:
- difficulty in identifying instabilities in process
performance
- difficulty in tracking instabilities to assignable causes
- using results from unstable processes to draw inferences or
make predictions about capability or performance
- anomalous process behavior patterns
2. When measured values of continuous variables have insufficient
granularity (i.e., are coarse and imprecise), the discreteness that
results can mask the underlying process variation. Computations for
the average and sigma can then be affected, and individual values that are
rounded or truncated in the direction of the nearest control limit
can easily give false out-of-control signals.
There are four main causes of coarse data: inadequate measurement
instruments, imprecise reading of the instruments, rounding, and
taking measurements at intervals that are too short to permit
detectable variation to occur. When measurements are not obtained and
recorded with sufficient precision to describe the underlying
variability, digits that contain useful information will be lost. If
the truncation or rounding reduces the precision in recorded results
to only one or two digits that change, the running record of measured
values will show only a few levels of possible outcomes.
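The loss of information from coarse recording can be seen in a few lines. The measurements below are made up for illustration, and the helper name is ours. The sketch records the same values at decreasing precision and reports how many distinct levels remain and what happens to the average moving range, which is the quantity an XmR chart uses to estimate sigma.

```python
from statistics import mean


def avg_moving_range(values):
    """Average absolute difference between successive values."""
    return mean(abs(b - a) for a, b in zip(values, values[1:]))


# Hypothetical measurements recorded to two decimal places.
measured = [4.62, 4.71, 4.58, 4.66, 4.74, 4.69, 4.60, 4.73]

for digits in (2, 1, 0):
    recorded = [round(v, digits) for v in measured]
    levels = len(set(recorded))
    print(f"{digits} decimals: {levels} distinct levels, "
          f"average moving range {avg_moving_range(recorded):.3f}")
```

As precision drops, the running record collapses to a handful of levels. In the extreme case every recorded value is identical, the average moving range is zero, and the chart can no longer describe the process variation.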
3. Control charts can be used to serve many different purposes.
Control charts can be helpful for monitoring processes from release
to release to compare overall performance. They can be used for
making process adjustments to ensure that stability is maintained for
a process on a daily or weekly basis. Most importantly, control charts
may be used for continuous improvement of a process that is stable
and capable. It is important to keep in mind, however, that the
control charts provide the most value to the people or team where the
process knowledge resides.
Management can also set an example by using control charts
appropriately. While control charts can be used to improve
personal performance, management should not misuse this tool or the
data. Management has to remember that the old saw "we will continue
the beatings until morale improves" comes into play whenever
measurements are used as part of the "beating." Clearly,
dysfunctional behavior is likely to occur if employees perceive that
measurements are being used in this way.
There is evidence that Shewhart's control charts can play a
significant role in measuring process performance consistency, and
process predictability. Successful implementers of this process
recognize the importance of (1) understanding the concepts of variation,
data homogeneity, common cause systems, and rational subgrouping, and
(2) fully understanding the process and subprocesses being measured.
Furthermore, they have used the control charts to measure process
performance at the subprocess (and lower) level realizing that there
is far too much variation in the overall process to be helpful in
identifying possible actions for improvement.
These software organizations have come to appreciate the value
added when control charts are used to provide engineers and managers
with quantitative insights into the behavior of their software
development processes. In many ways the control chart is a form of
instrumentation. Much like an oscilloscope, a temperature probe, or a
pressure gauge, it provides data to guide decisions and judgements by
process knowledgeable engineers and managers.
While SPC is not a new technology (the technique has been
applied in manufacturing for years), it has only recently been applied
to software engineering improvement. Organizations are
starting to become aware of SPC, getting appropriate training, and
starting to apply SPC. To get started, many organizations are
analyzing inspection data using SPC.
[Austin 96] Robert D. Austin, Measuring and Managing Performance in
Organizations, Dorset House Publishing, ISBN 0-932633-36-6, New York,
NY, 1996.
[Basili 92] V.R. Basili, Software Modeling and Measurement: The
Goal/Question/Metric Paradigm, University of Maryland, CS-TR-2956,
UMIACS-TR-92-96, 1992.
[Brassard 94] Michael Brassard and Diane Ritter, The Memory Jogger
II, GOAL/QPC, Methuen, MA, 1994.
[Burr 96] Adrian Burr and Mal Owen, Statistical Methods for Software
Quality, ISBN 1-85032-171-X, International Thomson Computer Press,
Boston, MA, 1996.
[Deming 86] W. Edwards Deming, Out of the Crisis, MIT Center for
Advanced Engineering Study, Cambridge, MA, 1986.
[Florac 99] William A. Florac and Anita D. Carleton, Measuring the
Software Process: Statistical Process Control for Software Process
Improvement, Addison-Wesley, 1999.
[Hare 95] Lynne B. Hare, Roger W. Hoerl, John D. Hromi, and Ronald D.
Snee, The Role of Statistical Thinking in Management, ASQC Quality
Progress, Vol. 28, No. 2, February 1995, pp. 53-60.
[Humphrey 95] Watts S. Humphrey, A Discipline for Software
Engineering, ISBN 0-201-54610-8, Addison-Wesley Publishing Company,
Reading, MA, 1995.
[Ishikawa 86] K. Ishikawa, Guide to Quality Control, Asian
Productivity Organization, Tokyo, Japan (available from Unipub - Kraus
International Publications, White Plains, NY), 1986.
[Paulk 95] Carnegie Mellon University, Software Engineering Institute
(Principal Contributors and Editors: Mark C. Paulk, Charles V. Weber,
Bill Curtis, and Mary Beth Chrissis), The Capability Maturity Model:
Guidelines for Improving the Software Process, ISBN 0-201-54664-7,
Addison-Wesley Publishing Company, Reading, MA, 1995.
[Wheeler 92] Donald J. Wheeler and David S. Chambers, Understanding
Statistical Process Control, Second Edition, SPC Press, Knoxville, TN,
1992.
[Wheeler 98] Donald J. Wheeler and Sheila R. Poling, Building
Continual Improvement: A Guide for Business, SPC Press, Knoxville, TN,
1998.
Anita Carleton, SEI
Bill Florac, SEI
February 28, 2001: Original
The Software Engineering Institute (SEI) is a federally funded
research and development center sponsored by the U.S. Department of
Defense and operated by Carnegie Mellon University.
Copyright 2007 by Carnegie Mellon University
URL: http://www.sei.cmu.edu/str/descriptions/spc_body.html
Last Modified: 11 January 2007