NEWS AT SEI
This article was originally published in News at SEI on March 1, 1999.
In an earlier
column, I compared the roles of a software architect and a traditional building
architect. I suggested that this often-used analogy may not be entirely
accurate, and that there may be a more precise analogy available to us in the
emerging field of architectural engineering.
In a subsequent release of SEI
Interactive, Rick Kazman continued contrasting software architecture and
“traditional” forms of architecture and engineering by describing the specific
needs for representing software and system architectures. As Kazman noted:
Engineering representations in any field serve to support
analysis. Structural engineers, for example, might analyze the strength and
bending properties of the materials that they use to construct a building, to
determine if a roof will withstand an anticipated snow load, or if the walls
will crumble in an earthquake.
A (software) system architecture must describe the system's
components, their connections, their interactions, and the nature of the
interactions between the system and its environment. Evaluating a system design
before it is built is good engineering practice. A technique that allows the
assessment of candidate architectures before the system is built has great
value. As Winnie-the-Pooh would have it, “it would be a very good thing.”
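To make this concrete, here is a minimal sketch in Python of how components, connectors, and their interactions might be recorded so that there is something definite to analyze. The class names and the example system are invented for this column, not taken from any particular architecture description language.

```python
# Minimal, hypothetical sketch of an architecture description:
# components, their connections, and the nature of their interactions.
from dataclasses import dataclass, field

@dataclass
class Component:
    name: str
    provides: list = field(default_factory=list)  # services offered
    requires: list = field(default_factory=list)  # services needed

@dataclass
class Connector:
    source: str       # component supplying the interaction
    target: str       # component consuming it
    interaction: str  # e.g., "procedure call", "message", "event"

@dataclass
class Architecture:
    components: dict = field(default_factory=dict)
    connectors: list = field(default_factory=list)

    def add(self, component: Component):
        self.components[component.name] = component

    def connect(self, source, target, interaction):
        # Even this trivial consistency check is a (very small) analysis:
        # a connector may not reference an undeclared component.
        if source not in self.components or target not in self.components:
            raise ValueError("connector references an undeclared component")
        self.connectors.append(Connector(source, target, interaction))

arch = Architecture()
arch.add(Component("sensor", provides=["readings"]))
arch.add(Component("controller", requires=["readings"]))
arch.connect("sensor", "controller", "message")
```

Once the components and interactions are explicit in some such form, attribute models (performance, dependability, and so on) have a concrete description to work from.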
The architecture should include the factors or parameters of
interest for each attribute model. Parameters that are common to more than one
attribute model influence multiple attributes and can be used to trade off
between attributes. For each of the attribute models, we must
identify those parameters that have a major effect on the results for that
model. A sensitive parameter is one that has a great impact on the model.
(Variations in the parameter correlate strongly with variations in the modeled
or measured value.) A sensitive parameter that appears in only one attribute model's set may simply not have been considered by the experts for the other models, or it may not be relevant or sensitive for those models. Sensitive parameters that affect more than one attribute can be positively correlated [i.e., a change in one direction has positive effects on all attributes (win-win)] or negatively correlated [i.e., an improvement in one attribute may result in negative effects on another attribute (win-lose)].
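As a rough illustration of how shared factors expose tradeoffs, the following sketch (Python; the parameter names, the attributes they touch, and the +1/-1 directions are all invented for the example) tabulates which attribute models each parameter influences and flags win-win versus win-lose parameters:

```python
# Hypothetical sketch: record how architectural parameters affect
# attribute models, then flag shared parameters as win-win or win-lose.
# +1 means increasing the parameter helps the attribute, -1 hurts it.
EFFECTS = {
    # parameter:        {attribute: direction}
    "replication":      {"dependability": +1, "performance": -1},
    "cpu_budget":       {"performance": +1},
    "message_logging":  {"dependability": +1, "security": +1},
}

for param, effects in EFFECTS.items():
    if len(effects) < 2:
        continue  # sensitive to only one model; no cross-attribute tradeoff
    directions = set(effects.values())
    kind = "win-win" if len(directions) == 1 else "win-lose (tradeoff)"
    print(f"{param}: affects {sorted(effects)} -> {kind}")
```

Here "replication" would be reported as a tradeoff parameter, while "message_logging" improves both of the attributes it touches.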
A mature software engineering practice would allow a designer
to predict these attributes through changes to the factors found in the
architecture before the system is built. Unfortunately, in contrast to building
architectures, we have yet to agree on what the appropriate software structures
and views should be and how to represent them. One of the reasons for the lack
of consensus on structures, views, and representations is that software quality
attributes have matured (or are maturing) within separate communities, each
with their own vernacular and points of view. For example, we studied the
different schools/traditions concerning the properties of critical systems and
the best methods to develop them [Barbacci 95]:
- performance -- from the tradition of hard real-time systems and capacity planning
- dependability -- from the tradition of ultra-reliable, fault-tolerant systems
- security -- from the traditions of the government, banking, and academic communities
- safety -- from the tradition of hazard analysis and system safety engineering
Systems often fail to meet user needs (i.e., lack quality) when
designers narrowly focus on meeting some requirements without considering the effect on other requirements, or when they take other requirements into account too late in the development process. For example, it might not be possible to meet
dependability and performance requirements simultaneously:
- Replicating communication and computation to achieve dependability might conflict with performance requirements (e.g., not enough time).
- Co-locating critical processes to achieve performance might conflict with dependability requirements (e.g., a single point of failure).

This is not a new problem, and software developers have been trying to deal with it for a long time, as illustrated by Boehm [Boehm 78]:
Finally, we concluded that calculating and understanding the
value of a single overall metric for software quality may be more trouble than
it is worth. The major problem is that many of the individual characteristics
of quality are in conflict; added efficiency is often purchased at the price of
portability, accuracy, understandability, and maintainability; added accuracy
often conflicts with portability via dependence on word size; conciseness can
conflict with legibility. Users generally find it difficult to quantify their
preferences in such conflict situations.
We should not look for a single, universal metric, but rather
for quantification of individual attributes and for tradeoffs between different
metrics. As we shall see later, identifying shared factors and methods gives us
a good handle on the problem, provided we keep the relationships between
attributes straight, which is not always easy.
Relationships between attributes
Each of the attributes examined has evolved within its own community. This has resulted in inconsistencies among the various points of view.
Dependability vis-a-vis safety
The dependability tradition tries to capture all system
properties (e.g., security, safety) in terms of dependability concerns—i.e.,
defining failure as “not meeting requirements”
[Laprie 92]. It can be argued that this is too narrow because
requirements could be wrong or incomplete and might well be the source of
undesired consequences. A system could allow breaches in security or safety and
still be called “dependable.”
The safety-engineering approach explicitly considers the system
context. This is important because software considered on its own might not
reveal the potential for mishaps or accidents. For example, a particular
software error may cause a mishap or accident only if there is a simultaneous
human and/or hardware failure. Alternatively, it may require an environment
failure to cause the software fault to manifest itself.
For example [Rushby 93], a mishap in an air-traffic control
system is a mid-air collision. A mid-air collision depends on several factors:
- The planes are too close.
- The pilots are unaware that the planes are too close.
- Or the pilots are aware that the planes are too close, but fail to take effective evading action or are unable to take effective evading action.

The air-traffic control system cannot be responsible for the state of alertness or skill of the pilots; all it can do is attempt to ensure that the planes do not get too close together in the first place.
Thus, the hazard (i.e., the erroneous system state that leads
to an accident) that must be controlled by the air-traffic control system is,
say, “planes getting within two miles horizontally or 1,000 feet vertically of each other.”
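Stated this precisely, the hazard can be checked mechanically. The sketch below (Python; only the two thresholds come from the hazard definition, everything else is invented) treats separation as maintained when either minimum holds, so the hazard occurs only when both are violated:

```python
import math

# Separation minima taken from the hazard definition above.
MIN_HORIZONTAL_MILES = 2.0
MIN_VERTICAL_FEET = 1000.0

def separation_violated(x1, y1, alt1_ft, x2, y2, alt2_ft):
    """Return True if two aircraft are inside the hazard region.

    (x, y) are horizontal coordinates in miles; altitudes are in feet.
    Separation is maintained if EITHER minimum holds, so the hazard
    requires both to be violated at once: planes that are laterally
    close but vertically well separated are still safe.
    """
    horizontal = math.hypot(x2 - x1, y2 - y1)
    vertical = abs(alt2_ft - alt1_ft)
    return horizontal < MIN_HORIZONTAL_MILES and vertical < MIN_VERTICAL_FEET

# Laterally close (1 mile) but 2,000 feet apart vertically: no hazard.
assert not separation_violated(0, 0, 31000, 1, 0, 33000)
# Close in both dimensions: hazard.
assert separation_violated(0, 0, 31000, 1, 0, 31500)
```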
Precedence of approaches
Safe software is always secure and reliable: Neumann presents a hierarchy of reliability, security, and safety [Neumann 86]. Security depends on reliability (an attribute of dependability), and safety depends on security and, hence, also on reliability. A secure system might need to be reliable
because a failure might compromise the system's security (e.g., assumptions
about atomicity of actions might be violated when a component fails). The safety-critical components of a system need
to be secure to prevent accidental or intentional alteration of code or data
that were analyzed and shown to be safe. Finally, safety depends on reliability when the
system requires the software to be operational to prevent mishaps.
Enhancing reliability is desirable, and perhaps necessary, but
it is not sufficient to ensure safety. As Rushby notes [Rushby 93], the
relationships are more complex than a strict hierarchy:
- Fault-tolerant techniques can detect security violations: a virus might be detected through N-version programming, intrusions detected automatically as latent errors, and denial of service detected as omission or crash failures.
- Fault containment can enhance safety by ensuring
that the consequences of a fault do not spread and contaminate other components
of a system.
- Security techniques can provide fault containment through memory protection, control of communications, and process separation.
- A security kernel can enforce safety using runtime lock-in mechanisms for “secure” states and interlocks to enforce some order of activities. Kernelization and system interlocks are primarily mechanisms for avoiding certain kinds of failures and do very little to ensure that required behaviors actually occur.
- A kernel can achieve influence over higher
levels of the system only through the facilities it does not provide--if a kernel provides no mechanism for achieving
certain behaviors, and if no other mechanisms are available, then no layers
above the kernel can achieve those behaviors.
- The kinds of behaviors that can be controlled in
this way are primarily those concerning communication, or the lack thereof.
Thus, kernelization can be used to ensure that certain processes are isolated
from each other, or that only certain interprocess communication paths are
available, or that certain sequencing constraints are satisfied.
- Kernelization can be effective in avoiding certain faults of commission (doing what is not allowed) but not faults of omission (failing to do what is required); that is, a security kernel cannot ensure that the processes correctly perform the tasks required of them.
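This "control through omission" idea lends itself to a small illustration. In the following sketch (Python; the channel table, process names, and ToyKernel class are invented for this column, not drawn from any of the cited work), a toy kernel delivers messages only along channels it explicitly provides, so any communication path the kernel omits simply cannot occur at the layers above:

```python
# Toy sketch of kernel-mediated communication: processes may exchange
# messages only over channels the kernel provides. Paths the kernel
# omits are unreachable from above -- control through omission.

class KernelError(Exception):
    pass

class ToyKernel:
    def __init__(self, allowed_channels):
        # allowed_channels: set of (sender, receiver) pairs the kernel provides
        self.allowed = set(allowed_channels)
        self.mailboxes = {}

    def send(self, sender, receiver, message):
        if (sender, receiver) not in self.allowed:
            # The kernel provides no mechanism for this path, so the
            # layers above it cannot achieve this behavior at all.
            raise KernelError(f"no channel from {sender} to {receiver}")
        self.mailboxes.setdefault(receiver, []).append((sender, message))

kernel = ToyKernel(allowed_channels={("sensor", "controller")})
kernel.send("sensor", "controller", "reading: 42")   # provided path: delivered
try:
    kernel.send("sensor", "logger", "reading: 42")   # omitted path: impossible
except KernelError as e:
    print(e)

# Note the limit the text observes: the kernel cannot force "sensor" to
# send anything at all -- faults of omission are beyond its reach.
```

Applicability of approaches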
The methods and mindset associated with each of the attributes
that we examined [Barbacci 95] have evolved from separate schools of thought.
Yet there appear to be common underpinnings that can serve as a basis for a
more unified approach for designing critical systems. For example:
- Safety and dependability are concerned with
detecting error states (errors in dependability and hazards in safety) and
preventing error states from causing undesirable behavior (failures in
dependability and mishaps in safety).
- Security and performance are concerned with
resource management (protection of resources in security and timely use of
resources in performance).
The applicability of methods developed for one attribute to
another attribute suggests that differences between attributes might be as much
a matter of sociology as technology. Nevertheless, an attribute-specific
mindset might be appropriate under certain circumstances. Examples include the following:
- The dependability approach is more attractive in circumstances for which there is no safe alternative to normal service -- a service must be provided.
- The safety approach is more attractive where there are specific undesired events -- an accident must be prevented (e.g., a nuclear power plant).
- The security approach is more attractive when dealing with faults of commission rather than omission -- service must not be denied, and information must not be disclosed.
This is not to suggest that other attributes could be ignored.
Regardless of what approach is chosen, we still need a coordinated methodology
to look at all of these attributes together in the context of a specific
design. For example, all the attributes that we examined [Barbacci 95] seem to
share classes of factors. There are events (generated internally or coming from
the environment) to which the system responds by changing its state. These
state changes have future effects on the behavior of the system (causing
internal events or responses to the environment).
The “environment” of a system is an enclosing “system,” and
this definition applies recursively, up and down the hierarchy. For example, varying arrival patterns (events) cause system overload (a state) that leads to jitter (an event); faults (events) cause errors (states) that lead to failures (events); hazards (events) cause safety errors (states) that lead to mishaps (events); and intrusions (events) cause security errors (states) that lead to security breaches (events).
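A minimal sketch of this shared event-state-event structure (Python; the table merely restates the chains in the paragraph above, and the function name is invented) might look like this:

```python
# Shared pattern across attributes: an event moves the system into a
# degraded state, and that state later produces a visible event.
CHAINS = {
    "performance":   ("varying arrival pattern", "overload",       "jitter"),
    "dependability": ("fault",                   "error",          "failure"),
    "safety":        ("hazard",                  "safety error",   "mishap"),
    "security":      ("intrusion",               "security error", "breach"),
}

def propagate(attribute):
    cause, state, effect = CHAINS[attribute]
    return f"{cause} (event) -> {state} (state) -> {effect} (event)"

for attr in CHAINS:
    print(f"{attr}: {propagate(attr)}")
```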
Architecture patterns are the building blocks of software architectures.
Examples of patterns include pipes and filters, clients and servers, token
rings, blackboards, etc. The architecture of a complex system is likely to
include instances of more than one of these patterns, composed in arbitrary
ways. Collections of architecture patterns should be evaluated in terms of
quality factors and concerns, in anticipation of their use. That is, it is
conceivable that architecture patterns could be “pre-scored” to gain a sense of their relative suitability to meet quality requirements should they be used in a design.
In addition to evaluating individual patterns, it is necessary
to evaluate compositions of patterns that might be used in an architecture.
Identifying patterns that do not “compose” well (i.e., the result is difficult
to analyze or the quality factors of the result are in conflict with each
other) should steer a designer away from “difficult” architectures, toward
architectures made of well-behaved compositions of patterns.
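As a rough illustration of pre-scoring and composition checks (Python; the patterns, the scores, and the conflict rule are all invented for the example, not a published scoring scheme), one might record per-pattern scores and flag compositions whose quality factors pull in opposite directions:

```python
# Hypothetical sketch: patterns "pre-scored" per quality attribute
# (+1 favorable, 0 neutral, -1 unfavorable), plus a crude composition
# check that flags attributes on which two patterns pull opposite ways.
SCORES = {
    "pipes-and-filters": {"performance": -1, "modifiability": +1},
    "client-server":     {"performance": +1, "modifiability":  0},
    "blackboard":        {"performance": -1, "modifiability": +1},
}

def composition_conflicts(pattern_a, pattern_b):
    a, b = SCORES[pattern_a], SCORES[pattern_b]
    # An attribute conflicts if one pattern helps it while the other hurts it.
    return [attr for attr in a.keys() & b.keys() if a[attr] * b[attr] < 0]

conflicts = composition_conflicts("pipes-and-filters", "client-server")
print(conflicts or "no conflicting quality factors")   # ['performance']
```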
In the end, we will need both quantitative and qualitative
techniques for evaluating patterns and architectures. Quantitative techniques
include various modeling and analysis techniques, including formal methods.
Scenarios are rough, qualitative evaluations of an architecture; scenarios are
necessary but not sufficient to predict and control quality attributes and have
to be supplemented with other evaluation techniques (e.g., queuing models,
schedulability analysis). Architecture evaluations using scenarios will be the
subject of a future column.
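To give a flavor of the quantitative techniques mentioned above, the sketch below (Python; the workload numbers are invented) applies two standard results: the mean response time of an M/M/1 queue and the rate-monotonic utilization bound used in schedulability analysis:

```python
# M/M/1 queue: mean time in system W = 1 / (mu - lambda),
# valid only when the arrival rate is below the service rate.
def mm1_response_time(arrival_rate, service_rate):
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrival rate >= service rate")
    return 1.0 / (service_rate - arrival_rate)

# Rate-monotonic schedulability (sufficient condition): n periodic tasks
# are schedulable if total utilization <= n * (2**(1/n) - 1).
def rm_schedulable(utilizations):
    n = len(utilizations)
    return sum(utilizations) <= n * (2 ** (1.0 / n) - 1)

print(mm1_response_time(arrival_rate=80, service_rate=100))  # 0.05 time units
print(rm_schedulable([0.2, 0.3, 0.2]))  # 0.7 <= ~0.78: True
```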
References

[Barbacci 95] Barbacci, M. R.; Klein, M. H.; Longstaff, T. A.; & Weinstock, C. B. Quality Attributes (CMU/SEI-95-TR-021, ADA307888). Pittsburgh, Pa.: Software Engineering Institute, Carnegie Mellon University, 1995.

[Boehm 78] Boehm, B. et al. Characteristics of Software Quality. New York: American Elsevier, 1978.

[Laprie 92] Laprie, J. C. (ed.). Dependable Computing and Fault-Tolerant Systems. Vol. 5, Dependability: Basic Concepts and Terminology in English, French, German, Italian, and Japanese. New York: Springer-Verlag, 1992.

[Neumann 86] Neumann, P. G. “On Hierarchical Design of Computer Systems for Critical Applications.” IEEE Transactions on Software Engineering 12, 9 (September 1986): 905-920.

[Rushby 93] Rushby, J. Critical System Properties: Survey and Taxonomy (Technical Report CSL-93-01). Menlo Park, Ca.: Computer Science Laboratory, SRI International, 1993.
About the author
Mario Barbacci is a senior member of the technical staff at the
SEI. He was one of the founders of the SEI, where he has served in several
technical and managerial positions, including project leader (Distributed
Systems), program director (Real-Time Distributed Systems, Product Attribute
Engineering), and associate director (Technology Exploration Department).
Before coming to the SEI, he was a member of the faculty in the School of
Computer Science at Carnegie Mellon.