NEWS AT SEI
This article was originally published in News at SEI on: March 1, 2000
In recent years, there have been dramatic changes in the character of security problems, their technical and business contexts, and the goals of system stakeholders. As a consequence, many of the assumptions underlying traditional security technologies are no longer valid. In particular, a new fundamental assumption is that any individual component of a system can be compromised by attacks, accidents, or failures. To ensure that mission-critical functions are sustained and essential services are delivered despite the presence of attacks, accidents, or failures, a survivability perspective on security practices is needed.
Survivability from a Business Perspective
Many businesses have contingency plans for how to handle business interruptions caused by natural disasters or accidents. Although the majority of cyber-attacks are relatively minor in nature, a cyber-attack on an organization’s critical networked information systems has the potential to cause severe and prolonged business disruption, whether the business has been targeted specifically or is a random victim of a broad-based attack. If a cyber-attack disrupts critical business functions and interrupts the essential services that customers depend on, the survival of the business itself is at risk. Moreover, a business disruption caused by a cyber-attack will likely be seen by customers as a sign of incompetence. Unless the cyber-attack is widespread and well publicized, no customer sympathy will be forthcoming.
Survivability is an emerging discipline [ISW 97], [ISW 98] that blends computer security with business risk management for the purpose of protecting highly distributed information services and assets. A fundamental assumption is that no system is totally immune to attacks, accidents, or failures. Therefore, the focus of this new discipline is not only to thwart computer intruders, but also to ensure that mission-critical functions are sustained and essential situation-dependent services are delivered, despite the presence of cyber-attacks. Improving survivability in the presence of cyber-attacks also improves the organization’s capacity to survive accidents and system failures that are not malicious in nature.
Traditional computer security is a highly specialized discipline that seeks to thwart intruders through technical means that are largely independent of the domain of the application or system being protected. Firewalls, cryptography, access control, authentication, and other mechanisms used in computer security are meant to protect an underlying application in much the same way regardless of the specific application being protected. In contrast, survivability has a very sharp mission focus. Ultimately it is the mission that must survive, not any particular component of the system or even the system itself. (The mission must go on even if an attack causes significant damage to—or even destroys—the system that supports the mission.) This focus on mission survivability rather than on system protection is the most radical paradigm shift that is occurring as the new discipline of information survivability emerges.
Survivability solutions are best understood as risk-management strategies that first depend on an intimate knowledge of the mission being protected. Risk-mitigation strategies first and foremost must be created in the context of a mission’s requirements (prioritized sets of normal and stress requirements), and they must be based on "what-if" analyses of survival scenarios. Only then can we look toward generic software engineering solutions based on computer security, other software quality-attribute analyses, or other strictly technical approaches to support the risk-mitigation strategies.
Consider the analogy of a village farmer with the mission of supplying food to a village. The farmer may have a fence around the crops to keep out deer, rabbits, and other intruders (traditional security). The farmer may have an irrigation system to be used in the event of insufficient rainfall (redundancy). He or she may plant a variety of crops so that even if environmental conditions such as pests adversely affect one crop, others will survive (diversity). All of this is well and good. But even if all the crops fail and no food is grown, the mission can still succeed if the farmer has an alternate strategy based on the mission of providing food—not necessarily growing food using the local ecosystem. If the crops fail, the farmer may turn to hunting or fishing to provide the life-sustaining mission fulfillment that fellow villagers depend on. Is hunting a security, reliability, or fault-tolerance strategy? No, it is outside the system for growing food. This is a risk-management strategy that can only be formulated with an intimate understanding of the mission that must survive. Detailed technical expertise on fence-building, or even agriculture, is helpful but inadequate compared to strategies based on an intimate knowledge of the mission requirements.
Survivability depends not only on the selective use of traditional computer-security solutions, but also on the development of effective risk-mitigation strategies based on scenario-driven "what-if" analyses and contingency planning. "Survival scenarios" positing a wide range of cyber-attacks, accidents, and failures aid in the analyses and contingency planning. However, to reduce the combinatorics inherent in creating representative sets of survival scenarios, these scenarios focus on adverse effects rather than causes. Effects are also of more immediate situational importance than causes because you will likely have to deal with (and survive!) an adverse effect long before determining whether the cause was an attack, an accident, or a failure.
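The reduction in combinatorics can be illustrated with a minimal Python sketch. The service, effect, and cause lists below are hypothetical placeholders, not taken from any real analysis. The point is only that dropping the cause dimension shrinks the scenario set by a factor equal to the number of causes considered.

```python
from itertools import product

# Hypothetical, illustrative categories; none of these lists come from the article.
CAUSES = ["attack", "accident", "failure"]
ESSENTIAL_SERVICES = ["paging", "credit card verification", "network feed", "billing"]
ADVERSE_EFFECTS = ["service unavailable", "data corrupted", "data disclosed"]


def cause_driven_scenarios():
    """One scenario per (cause, service, effect) combination."""
    return list(product(CAUSES, ESSENTIAL_SERVICES, ADVERSE_EFFECTS))


def effect_driven_scenarios():
    """One scenario per (service, effect); the cause is deliberately left open."""
    return list(product(ESSENTIAL_SERVICES, ADVERSE_EFFECTS))


if __name__ == "__main__":
    by_cause = cause_driven_scenarios()
    by_effect = effect_driven_scenarios()
    print(f"cause-driven scenarios:  {len(by_cause)}")
    print(f"effect-driven scenarios: {len(by_effect)}")
```

Each effect-driven scenario stands in for every cause that could produce the same effect, so a contingency plan written against it applies whether the cause turns out to be an attack, an accident, or a failure.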
Contingency (including disaster) planning requires risk-management decisions and economic tradeoffs that only executive management can make (preferably with guidance from technical experts). Survivability depends at least as much on the risk-management skills of an organization’s management as it does on the technical expertise of a cadre of computer security experts. (Here we are not referring to an abstract technical skill in the science of risk management, but rather to the ability to manage risk in the context of the specific business mission and goals.) Business risk management is arguably the primary function of executive management. The role of the experts in security, the application domain, and other technically relevant areas is to provide executive management with the information necessary to make informed risk-management decisions. Thus, the preparatory steps necessary for survivability must be taken by an organization as a whole, rather than by security experts alone.
Let’s consider the Galaxy-4 satellite that spun out of control on May 19, 1998, interrupting up to 90% of the pager service in the United States, along with some television network feeds and some credit card verification services. Even though a cyber-attack was not to blame (though "an international hacker attack" was on an early list of speculative causes), the example is quite illustrative. In fact, the cause (or at least a partial cause—crystals forming on tin-plated relay contacts and an unexplained failure of a backup system) was not determined until long after service was restored [Reuters 98].
Dealing with adverse events such as this one, without waiting for a definitive determination of the cause, is central to the survivability paradigm. Successful handling of such events depends far more on prudent risk management and contingency planning by executive management than on any specific technical approach by security or other experts. For instance, a "perfect" technical solution (i.e., having a diversely redundant, immediately available backup satellite) would be economically unfeasible. The practicality of many technical solutions can only be evaluated in the full business context. Executive management, through its contingency planning, would consider business solutions that might transcend purely technical solutions. One approach would be to have an agreement in place with another communications company to provide the needed capacity on, say, six hours’ notice (with the backup company dumping its own lower-priority customers) in exchange for an annual fee.
An alternate approach—using lawyers rather than technologists—would be to have a disclaimer in the contract agreement with customers telling them that the customer must bear the risk of service outages. This would put the customers on notice that they need to prepare to provide their own redundancy, whereas in the previous approach the service provider took care of redundancy through an agreement with an alternate provider. Because it raises customer awareness of some of the risks inherent in the delivery of service and possibly increases the perceived value of uninterrupted service, the "legal disclaimer" approach might even generate some customer interest in asking the original service provider to provide redundancy for an additional fee. The "legal disclaimer" approach is not one that technical experts would likely come up with, but it is quite effective in assuring the survivability of the business mission and goals. As this example illustrates, the risk-management viewpoint supports an "economics of survivability" that allows businesses to successfully prepare for and overcome the adverse effects of cyber-attacks, accidents, and failures with approaches that can transcend those offered by technical experts alone.
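One way to make such an "economics of survivability" comparison concrete is a rough expected-cost calculation. The Python sketch below is illustrative only: the strategy names echo the options discussed above, but every figure, the outage probability, and the simple cost model (fixed annual cost plus outage probability times residual loss) are assumptions introduced here, not data from the Galaxy-4 incident.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str
    annual_cost: float    # fixed yearly cost of keeping the strategy in place
    residual_loss: float  # loss the business still bears if the outage occurs


def expected_annual_cost(strategy: Strategy, outage_probability: float) -> float:
    """Fixed cost plus the probability-weighted loss that remains."""
    return strategy.annual_cost + outage_probability * strategy.residual_loss


def rank(strategies, outage_probability):
    """Return strategies ordered from lowest to highest expected annual cost."""
    return sorted(strategies, key=lambda s: expected_annual_cost(s, outage_probability))


# Hypothetical figures in arbitrary cost units, chosen only for illustration.
STRATEGIES = [
    Strategy("dedicated backup satellite", annual_cost=50.0, residual_loss=0.0),
    Strategy("standby capacity agreement", annual_cost=2.0, residual_loss=5.0),
    Strategy("legal disclaimer", annual_cost=0.1, residual_loss=20.0),
    Strategy("no preparation", annual_cost=0.0, residual_loss=100.0),
]

if __name__ == "__main__":
    for s in rank(STRATEGIES, outage_probability=0.05):
        print(f"{s.name}: {expected_annual_cost(s, 0.05):.2f}")
```

The ranking changes with the assumed probability and loss figures, which is why the estimates need to come from executive management and domain experts who understand the mission, rather than from the model itself.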
Contrast this new perspective with current practice. Upper management’s primary decision-making role, from a traditional security viewpoint, is predominantly to determine how much direct funding and other resources to grant to the organization’s security experts for the rather loosely defined purpose of "beefing up security" to some vaguely articulated industry-standard level of practice. In the minds of management, the perceived link between security funding and the business mission (and the business bottom line) is tenuous at best. "If I spend more money on computer security, my risk of intrusion will likely go down. But will this reduce any significant risks to my business mission? What risks will be reduced, and by how much?" With no clear benefit visible to management, the resulting security funding is typically inadequate to meet even the limited technical goals of the security experts. For the most part, what is sorely missing is an in-depth analysis of threats to the organization’s mission and a corresponding cost–benefit analysis for risk-mitigation strategies and contingency planning. The computer-security experts, isolated from management’s intimate understanding of the business mission, are in no position to perform the necessary threat analyses, except from the narrow perspective of their technical specialties.
As an example, consider the new government programs that are meant to assure that our nation’s critical infrastructures will continue to operate despite cyber-attacks, accidents, or failures. Government concern for critical infrastructure assurance [PCCIP 97] is helping to fuel the current interest in survivability, but this interest is not being driven by the businesses (such as those in energy, transportation, banking, and telecommunications) that would benefit from such protection. The government is asking industry to participate in critical-infrastructure assurance programs, with the motivation that these programs are in the best interests of the nation, the industries, and their customers. But none of these communities are willing to pay for the increased costs. Real investment in critical-infrastructure protection will occur only when executives understand that these changes are essential to their competitiveness and profitability. Unfortunately, many of the businesses involved see these programs as mandating technical solutions that would be at odds with their customers’ needs and their own profitability. Greater awareness is needed of the business risk-management aspects of survivability, so that the organizations that operate our nation’s critical infrastructures would be motivated by self-interest to assure their own survivability. Critical-infrastructure assurance can then be based on risk-management tradeoffs that depend on overall business missions and goals, not solely on technical fixes that are independent of those goals.
There has been a revolutionary technical shift in business applications from stand-alone, closed systems over which organizations exercised complete control, to highly distributed, open, COTS-based systems over which only very limited control and limited insight are possible. Not only are most Internet services outside of the control of the businesses that use them, but so are the functionality and software quality attributes of the COTS-based software used to build business applications. This technical shift has taken us so far that we can no longer solve security problems entirely in the technical domain.
From the traditional computer-security perspective, executive management has never been sufficiently engaged. The security experts simply present a bill or funding request to management for generic technical solutions, independent of threat analyses that are specific to the mission being protected.
Upper management must be concerned with threats to the business mission and must be intimately involved in formulating mission-specific, risk-mitigation strategies. Moreover, technical experts need to be aware of the business issues that lead to the technical issues that they face. Only then can they contribute effectively to the risk-management approaches that are needed to assure survivability of highly distributed mission-critical applications, operating in unbounded domains, in the face of cyber-attacks, accidents, and system failures.
For Additional Information
This column is based on the following paper, which contains additional information about this topic.
H. F. Lipson and D. A. Fisher, "Survivability--A New Technical and Business Perspective on Security," Proceedings of the 1999 New Security Paradigms Workshop, September 21-24, 1999, Association for Computing Machinery, 1999.
[ISW 97] Proceedings of the 1997 Information Survivability Workshop, San Diego, California, February 12-13, 1997, Software Engineering Institute and IEEE Computer Society, April 1997.
[ISW 98] Proceedings of the 1998 Information Survivability Workshop, Orlando, Florida, October 28-30, 1998, Software Engineering Institute and IEEE Computer Society, 1998.
[PCCIP 97] Presidential Commission on Critical Infrastructure Protection, Critical Foundations — Protecting America’s Infrastructures, The Report of the Presidential Commission on Critical Infrastructure Protection, October 1997.
[Reuters 98] Reuters, "Pager Glitch Cause Lost in Space," Wired News, Wired Digital Inc., August 11, 1998.
About the Authors
Howard F. Lipson has been a computer security researcher at the SEI’s CERT Coordination Center for nearly eight years. He has played a major role in extending security research at the SEI into the new realm of survivability. His research interests include the design and analysis of survivable systems, survivable systems simulation, and critical infrastructure protection. Lipson has been a chair of two IEEE-sponsored workshops on survivability. Earlier, he was a computer scientist at AT&T Bell Labs, where he did exploratory development work on programming environments, executive information systems, and integrated network management tools. He holds a PhD in computer science from Columbia University.
David A. Fisher is currently leading a research effort in new approaches for survivability and simulation in information-based infrastructures at the SEI’s CERT Coordination Center. From 1973 to 1975, Fisher served as program manager in the Advanced Technology Program (ATP) at the National Institute of Standards and Technology (NIST), where he developed and managed a major initiative in component-based software and began an initiative in learning technology. Fisher has more than 60 publications in the areas of information survivability, algorithms, component-based software, programming languages, compiler construction, and entrepreneurial development in the software industry. He earned a PhD in computer science at Carnegie Mellon University, an MSE from Moore School of Electrical Engineering at the University of Pennsylvania, and a BS in mathematics from Carnegie Institute of Technology.