Using Quality Attribute Workshops to Evaluate Early-Stage Architecture Design Decisions

NEWS AT SEI

Author

Mario R. Barbacci

This library item is related to the following area(s) of work:

Software Architecture

This article was originally published in News at SEI on: June 1, 2001

The architectural decisions made in the beginning of the design process, by and large, determine the software's quality attributes in the end. These architectural design decisions are the hardest to change after the system has been implemented. Therefore, they are the most important to get right. To help architects and other stakeholders evaluate the implications of early-stage design decisions, the Software Engineering Institute has developed the Quality Attribute Workshop (QAW) method. This approach uses test cases to examine an architecture's ability to achieve desired attributes. It represents a cost-effective and efficient means of exploring the impact of architectural decisions before they are made.

In our March 2000 column we presented our ideas for the QAW method and our early experiences from conducting one workshop. Since then we have modified the method based on our findings from several workshops. This column presents a more refined view of the QAW method.

Quality attributes are interdependent: performance affects modifiability, availability affects safety, security affects performance, and everything affects cost. Design evaluations can help architects explore the impact of decisions on the quality attributes of their software systems. Ideally, these assessments should take place as early in the architecture-development stage as possible. The SEI has developed the Quality Attribute Workshop (QAW) method, which uses test cases to assess an architecture for quality attributes early in the design process. (For a more complete definition of the QAW method, and how it differs from the Architecture Tradeoff Analysis Method, see the sidebar ATAM and QAW: How They Differ.)

The QAW process can be applied before the architecture has been completely designed. It can be used on a wide range of architectural representations. It also requires relatively little time and effort on the part of QAW participants. Yet the QAW method can help designers ensure that the resulting architecture will deliver the quality attributes that the system and its stakeholders require.

The QAW Process

The QAW process is composed of four steps. As shown in Figure 1, these steps are: (1) scenario generation, where we generate, prioritize, and refine scenarios; (2) test case development; (3) test case analysis, in which test cases are analyzed against the architecture; and (4) results presentation, in which results are presented to other stakeholders.

Figure 1: The QAW Process

To understand the power and the potential of the QAW, let's examine each step in detail.

Step 1: Scenario Generation Workshop

At the initial meeting, SEI facilitators and stakeholder representatives conduct a brainstorming session. Participants suggest scenarios or ask questions about the way in which the architecture will respond to various situations. They are encouraged to generate as many scenarios as possible in one to two hours. Usually, participants generate 30 to 40 scenarios.

Because only a small number of scenarios can be refined during a typical one-day workshop, the QAW participants must prioritize their choices by voting on the scenarios. The refining activity elicits details such as the expected operational consequences, the system assets involved, the end-users involved, the potential effects of the scenario on system operation, and the exceptional circumstances that may arise. See the sidebar for an example of a scenario refinement.
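The prioritization step is mechanically simple. As a rough illustration only (this is not part of the published QAW materials), the votes could be tallied as follows; the voting rules, scenario names, and ballots are invented for the example:

```python
# Illustrative sketch only: tallying stakeholder votes to select the few
# scenarios that can be refined during a one-day workshop. The voting rules,
# scenario names, and ballots below are assumptions made for this example.
from collections import Counter

def prioritize(ballots, top_n=4):
    """Return the top_n scenarios by total votes received."""
    tally = Counter()
    for ballot in ballots:   # each ballot lists the scenarios a stakeholder voted for
        tally.update(ballot)
    return [scenario for scenario, _ in tally.most_common(top_n)]

# Three stakeholders each vote for the scenarios they most want refined.
ballots = [
    ["relay satellite fails", "operator misconfigures the network"],
    ["relay satellite fails", "peak traffic during a mission-critical event"],
    ["peak traffic during a mission-critical event", "relay satellite fails"],
]
print(prioritize(ballots, top_n=2))
# ['relay satellite fails', 'peak traffic during a mission-critical event']
```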

Step 2: Test Case Development

Each refined scenario is transformed into a test case consisting of (1) a context section, (2) an issues and questions section, and (3) a utility tree. See the sidebar for an example test case.

The context section describes the mission, the assets involved, the geographical region, the operational setting, and the players. It also describes the operation over some time period. For example, a general scenario may be “system responds to failure of a communication relay device.”

The test case for the failure scenario should describe

  • the operation at the time of failure
  • what happens when the system reacts to the failure
  • the degraded operation that occurs while repair is underway
  • the process of restoring the system to normal operation

The issues and questions section defines various architectural concerns associated with the context and proposes questions that connect these issues to quality attributes. For example, the issue may be: “How is failure detected?” The questions may be: “What subsystem detects the failure? How long does it take to detect the failure? What happens during this interval?” In the failure context, it is relatively straightforward to discuss the following:

  • performance issues, such as time to detect failure
  • availability issues, such as degraded mode of services provided
  • interoperability issues, such as how an alternative service might be introduced
  • security issues, such as the impact on data integrity

Finally, the test case incorporates a utility tree. It graphically links quality attributes to specific attribute issues, and then to specific questions as illustrated in the sidebar example.
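To make the three-part structure concrete, here is a minimal sketch, in Python and purely for illustration, of how a test case might be represented; the class and field names are assumptions, not part of the QAW method itself:

```python
# Illustrative sketch only: one way to represent the three parts of a QAW test
# case. Class and field names are assumptions, not part of the QAW method.
from dataclasses import dataclass, field

@dataclass
class Question:
    label: str        # e.g., "1a"
    text: str         # e.g., "How long does it take to detect the failure?"
    attributes: list  # quality attributes involved, e.g., ["performance", "availability"]

@dataclass
class Issue:
    statement: str    # e.g., "How is failure detected?"
    questions: list = field(default_factory=list)

@dataclass
class TestCase:
    context: str      # mission, assets, region, operational setting, players
    issues: list = field(default_factory=list)

    def utility_tree(self):
        """Group questions by quality attribute, then by issue."""
        tree = {}
        for issue in self.issues:
            for question in issue.questions:
                for attribute in question.attributes:
                    tree.setdefault(attribute, {}).setdefault(issue.statement, []).append(question.text)
        return tree
```

Calling utility_tree() on a populated test case yields the attribute-to-issue-to-question grouping that the sidebar example later shows as a tree.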

Step 3: Test Case Analysis

The analysis should be done at the highest reasonable level of abstraction, but should include components that make sense under the test case circumstances. In the example mentioned above, the architecture team might respond in one of two ways:

The team could present a sequence diagram that includes an architectural component labeled “traffic manager” that diverts the message traffic, provides details on the level of degradation of the network traffic during the failure, and also includes a load-shedding component.

Alternatively, the team’s sequence diagram could show a “control center” that contains a “traffic manager” and a “load shedder” that will take the appropriate actions.

The first case clearly includes sufficient detail to evaluate the problem. The second case is much more of a “trust me” statement, one that is likely to cause problems later. As mentioned earlier, this is an iterative process: the team may refine the architecture, or it may continue analyzing scenarios.

Once the team members are satisfied with the results, they should document the answers, then proceed to Step 4.

Step 4: Results Presentation Workshop

This workshop presents the test case analyses. It is an opportunity for the architecture team members to demonstrate that (1) they completely understand the test cases, (2) their architecture is able to handle these cases correctly, and (3) they have the competence to continue analyzing important test cases as part of an architecture development effort.

Experience

The SEI has performed a number of quality attribute workshops. Among the lessons learned were:

  • The first QAW workshop (which generates, prioritizes, and refines scenarios) is a useful communications forum. Often stakeholders are unfamiliar with what other stakeholders are doing or are unaware of considerations raised by those responsible for maintenance, operations, or acquisition.
  • The approach depends on getting a number of important stakeholders to attend the workshop.
  • In most cases, the workshops uncovered undocumented requirements, which surfaced when workshop scenarios were checked against the system requirements.
  • The scenarios helped ensure that the projected deployment of assets and capabilities matched the scenarios and test cases.

In conclusion, the QAW can help architects explore early-stage architectural design decisions. SEI facilitators are continuing to hold QAWs with additional sponsors, in different application domains, and at different levels of detail. The approach looks promising; testing the architecture for flaws early should reduce rework in building the system and improve its ability to deliver the required quality attributes.

ATAM and QAW: How They Differ

For the past two years the SEI has been developing the Architecture Tradeoff Analysis Method (ATAM) for evaluating system architectures based on a set of quality attributes, such as modifiability, security, performance, and reliability. The ATAM process typically takes three days and involves 10 to 20 people, including evaluators, architects, and other system stakeholders. The effectiveness of ATAM depends on having a concrete, well-defined architecture to analyze. However, some ATAM benefits can be achieved even if an architecture is not fully defined. Under some circumstances, an organization might wish to identify potential architecture risks while developing a system's architecture.

For this reason, the SEI has developed the Quality Attribute Workshop (QAW) method, in which architects, developers, users, maintainers, and other system stakeholders, such as people involved in installation, deployment, logistics, planning, and acquisition, carry out several ATAM steps, but focus on system requirements and quality attributes, rather than on the architecture. The objective of the workshop is to identify sensitivities, tradeoffs, and risks and use these as early warnings to the architecture developers.

The QAW and the ATAM approach differ in substantial ways. In ATAM, the scenario generation and analysis happen in real-time, during stakeholder meetings, typically one or two days long. In QAW, the scenario generation takes place in a similar venue but the analysis is carried out off-line and is presented to the stakeholders weeks or even months later. In the ATAM case the developers already have an architecture and have made a number of architectural decisions, thus they are able to conduct the analysis of a scenario in a matter of minutes. In the QAW case, there might not be an architecture, the architecture team might still be pondering a number of decisions, or the team members might not be aware that they might have to worry about some risk or another. Thus, in QAW the scenarios, plus any additional refinements obtained from the stakeholders during the scenario-generation meeting, are converted into "test cases" with additional details and specific questions to be answered by the analysts. The process provides the architecture team with time to conduct the analysis, to make changes and try alternative approaches, and to document their decisions and responses to the test case questions without the time pressure of the ATAM process.

A QAW Test Case

In the following example, a general scenario is turned into a test case. Note that while several quality attributes could be considered, this example only illustrates questions concerning two attributes, performance and availability. For a more detailed example analysis of this test case, see http://www.sei.cmu.edu/publications/documents/00.reports/00tn010.html

In a recent activity conducted for a government agency, participants generated the scenario “Mars orbital communications-relay satellite fails.” The QAW participants refined the scenario by explaining the circumstances associated with it:

  • one of three aero-stationary satellites fails
  • reports failure to Earth control element
  • Mars surface elements and Mars satellites know that it failed
  • service assessment done at control center (two days)
  • traffic rerouting is to be performed
  • network reconfiguration dictated by flight director, perhaps postponed to limit possibility of further failure
  • multiple authorities in multiple organizations and control centers
  • well defined decision-making process leading to mission director (final authority)
  • multiple missions will be running simultaneously; coordination is complex

With the following additional details the scenario can be turned into a test case:

Context. Human and robotic missions are present on the Mars surface when the power amplifier fails on one of three stationary satellites. The primary communications payload is disabled for long-haul functions, but the proximity link to other relay satellites and customers in orbit and on the surface is still working. Secondary telemetry and tele-command for spacecraft health are still working direct-to-Earth at a low data rate. The remaining two satellites are fully functional. Communications with the crew have been interrupted. The crew is not in an emergency situation at the time of the failure, but reconnection is needed “promptly.” The crew on the surface is concentrated in one area, and the other missions in the Mars vicinity are in normal operations, working on non-emergencies, or performing mission-critical events. The communications network is well developed.

Stimulus. Detection of failure: Power amplifier failed, disabling the long-haul functions. The proximity link to other relay satellites and customers in orbit and on the surface is still working.

Quality Attribute Issues and Questions.

The test case lists a number of questions to be answered by the analysis.

  1. Issue: Mission safety requires consistent and frequent communications between the crew and Earth (P, A)
    • Question: How long does it take to detect the failure?
    • Question: How long does it take to reconfigure the system to minimize the time the crew is without communication?
  2. Issue: System operation will be degraded (P, A)
    • Question: Is there a way for the customer to simplify procedures in order to handle a larger number of missions with less trouble than they now have coordinating two missions?
    • Question: What redundancy is required?
    • Question: Is there a way to send information about the degraded satellite back to Earth for analysis?
  3. Issue: System recovery (P, A)
    • Question: Can the crew participate in the repair?
    • Question: Is there any expectation for a human interface between Mars and Earth (e.g., with the crew in the space station)?
    • Question: Can the customer participate in the notification (e.g., “Please send a message to the other satellite”)?

Utility Tree

The utility tree links each quality attribute to specific attribute issues, and each issue to the questions above:

  • Performance
    • …of communications…: (1b) How long to reconfigure?
    • …degraded operation…: (2a) Can decisions be simplified? (2c) How is information sent back?
  • Availability
    • …mission safety…: (1a) How long to detect the failure?
    • …redundancy…: (2b) What redundancy is required?
    • …recovery…: (3a) Can the crew help? (3b) Can the space station help? (3c) Can other assets help?
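Expressed as nested data, again purely as an illustration, the same tree reads as attribute, then issue, then question:

```python
# The sidebar's example utility tree expressed as nested data (illustration only).
utility_tree = {
    "Performance": {
        "…of communications…": ["(1b) How long to reconfigure?"],
        "…degraded operation…": [
            "(2a) Can decisions be simplified?",
            "(2c) How is information sent back?",
        ],
    },
    "Availability": {
        "…mission safety…": ["(1a) How long to detect the failure?"],
        "…redundancy…": ["(2b) What redundancy is required?"],
        "…recovery…": [
            "(3a) Can the crew help?",
            "(3b) Can the space station help?",
            "(3c) Can other assets help?",
        ],
    },
}

# Walk the tree in attribute -> issue -> question order.
for attribute, issues in utility_tree.items():
    print(attribute)
    for issue, questions in issues.items():
        print(f"  {issue}")
        for question in questions:
            print(f"    {question}")
```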

 

About the Author

Mario Barbacci is a senior member of the technical staff at the SEI. He was one of the founders of the SEI, where he has served in several technical and managerial positions, including project leader (Distributed Systems), program director (Real-Time Distributed Systems, Product Attribute Engineering), and associate director (Technology Exploration Department). Before coming to the SEI, he was a member of the faculty in the School of Computer Science at Carnegie Mellon. Barbacci is a fellow of the Institute of Electrical and Electronics Engineers (IEEE), a member of the Association for Computing Machinery (ACM), and a member of Sigma Xi. He was the founding chairman of the International Federation for Information Processing (IFIP) Working Group 10.2 (Computer Descriptions and Tools) and has served as vice president for technical activities of the IEEE Computer Society and chair of the Joint IEEE Computer Society/ACM Steering Committee for the Establishment of Software Engineering as a Profession. He was the 1996 president of the IEEE Computer Society and the 1998-1999 IEEE Division V Director.

Barbacci is the recipient of several IEEE Computer Society Outstanding Contribution Certificates, the ACM Recognition of Service Award, and the IFIP Silver Core Award. Barbacci received bachelor's and engineer's degrees in electrical engineering from the Universidad Nacional de Ingenieria, Lima, Peru, and a doctorate in computer science from Carnegie Mellon.

 
