Simplex Architecture


Status

complete

Purpose and Origin

Real-time applications that play a mission-critical role are prevalent throughout the DoD and industry. The complexity of these systems makes them expensive to design, maintain, and support. Their mission-critical nature requires assurance of operational availability. These systems are often safety-critical, requiring a high degree of reliability. The long life cycles of these systems usually result in multiple capability upgrades as well as platform migrations. As the use of COTS products increases, upgrade cycles will become shorter.

Simplex architecture is a paradigm and an engineering framework that permits the quick, easy, and reliable insertion of new capabilities and technologies into mission-critical real-time systems [Sha 96]. Simplex is the synthesis of selected best practices in several technology areas that support the safe, online upgrade of hardware and software, in spite of residual errors in the new components. Through the use of Simplex, it becomes possible to shift resources from static design and extensive testing to reliable incremental evolution.

Technical Detail

Software is pervasive within the critical systems that form the infrastructure of modern society, both military and civilian. These systems are often large and complex and require periodic and extensive upgrading. The important technical problems include the following:

  • Integration of new and revised components. The need for periodic and extensive upgrading and technology refreshment of systems challenges developers to integrate new or changed components into systems without compromising the strict reliability and availability requirements of the applications. There are significant strategic and tactical advantages afforded by the ability to adapt quickly to changing situations. These potential advantages challenge developers to find ways of modifying, upgrading, or adding system components more quickly while reducing the possibility of error.
  • Vendor-driven upgrade. To cut costs and gain leverage from technical advances in the commercial sector, the DoD has encouraged more frequent use of COTS components in its software. For similar reasons, industry is often following suit. COTS components have a short life cycle (roughly one year). DoD platforms change at a much slower rate and typically have longer life cycles (often 25-30 years or more). This makes the DoD platform susceptible to a problem that occurs when the vendor releases a new version of the COTS component. The upgrade can either be ignored or incorporated into the system. Ignoring it will eventually result in a system that is burdened with unsupported and obsolete components. Incorporating it forces the DoD platform to change on a schedule determined by the vendor, rather than the system developer, maintainer, or customer. New releases usually add features and fix existing bugs, but in the process they also often introduce new bugs. So upgrading is risky; a way to manage the risk is needed.
  • Upgrade paradox. The upgrade paradox results from the use of replication or functional redundancy and majority voting. A minority upgrade will have no effect because it will be voted out of the system by the majority. A majority upgrade with residual errors can cause the system to fail.
Collectively, these technical problems present a formidable challenge to the developers and maintainers of systems with long life cycles.
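
The upgrade paradox can be illustrated with a minimal sketch of three replicas behind a majority voter. The functions `old_version`, `improved_version`, and `buggy_version` are hypothetical stand-ins invented for this illustration; they are not part of Simplex.

```python
from collections import Counter

def majority_vote(outputs):
    """Return the value produced by a strict majority of replicas, or None."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None

def old_version(x):
    return 2 * x          # trusted, long-used behavior (hypothetical)

def improved_version(x):
    return 2 * x + 1      # hypothetical upgrade that produces a better output

def buggy_version(x):
    return -1             # hypothetical upgrade with a residual error

x = 10
# Minority upgrade: one of three replicas runs the improved code and is outvoted.
minority = majority_vote([old_version(x), old_version(x), improved_version(x)])
# Majority upgrade with a residual error: the faulty output wins the vote.
majority = majority_vote([buggy_version(x), buggy_version(x), old_version(x)])
print(minority, majority)
```

In the first case the voter returns the old output, so the upgrade has no effect; in the second the erroneous output is passed on, so the system fails.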

Simplex is a framework for system integration and evolution. It integrates a number of technologies, including:

  • Analytic Redundancy. These technologies are used for integrated availability and reliability management. They employ sophisticated monitoring and switching logic which includes a simple leadership protocol. Analytic redundancy allows high-performance, but possibly less-reliable, components to be used in systems demanding a high degree of reliability. This is accomplished without sacrificing the performance and reliability levels provided by existing highly reliable components.
  • Replaceable Units. These technologies (dynamic binding) allow the replacement of software modules at runtime without having to shut down and restart the system.
  • Publish/Subscribe. These are flexible real-time group communication technologies that allow components to dynamically publish and subscribe to needed information [Rajkumar 95].
  • Rate Monotonic Scheduling. These technologies for real-time computing (see Rate Monotonic Analysis) allow components to be replaced or modified in real time, transparently to the applications, while still meeting deadlines. These technologies are integrated into the real-time operating system.
The above technologies are shown in the context of the overall structure of a Simplex-based application in Figure 33.

Figure 33: Simplex Technologies and Architecture
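
As a rough illustration of how publish/subscribe communication and replaceable units can work together, the sketch below rebinds a subscriber while the system keeps running. The `Bus` class, the topic name, and the two controller functions are assumptions made for this example; it is a single-process toy and does not reproduce the real-time publisher/subscriber model of [Rajkumar 95].

```python
from collections import defaultdict

class Bus:
    """Toy in-process publish/subscribe bus (hypothetical, not the Simplex API)."""

    def __init__(self):
        self._subscribers = defaultdict(dict)   # topic -> {unit name: callback}

    def subscribe(self, topic, name, callback):
        # Subscribing again under an existing name rebinds that unit in place.
        self._subscribers[topic][name] = callback

    def publish(self, topic, message):
        # Deliver to a snapshot so that units may be replaced during delivery.
        for callback in list(self._subscribers[topic].values()):
            callback(message)

bus = Bus()
commands = []

def controller_v1(sample):
    commands.append(("v1", 0.5 * sample))

def controller_v2(sample):
    commands.append(("v2", 0.75 * sample))

bus.subscribe("sensor", "controller", controller_v1)
bus.publish("sensor", 10.0)
bus.subscribe("sensor", "controller", controller_v2)  # replace the unit at runtime
bus.publish("sensor", 10.0)
print(commands)
```

The publisher never learns that the controller changed, which is the property that allows a unit to be swapped without a shutdown and restart.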

Figure 34 is a highly simplified view of the data flow in a system using Simplex. Notice that multiple versions of a component are employed: a Highly Reliable Component (HRC) and a High Performance Component (HPC). The HRC might be legacy software designed to control the device. It has known performance characteristics and presumably, due to long use, is relatively bug free. If we suppose that the HPC is a new version of the software with improved performance characteristics, but possibly also containing bugs since it has not yet been used extensively, the following scenario takes place.

Figure 34: Simplex: Simplified Data Flow

The device under control is sampled at a regular interval. The data is processed by both HRC and HPC. Neither module controls the device directly; instead, a simple leadership protocol is used. Under this protocol, both modules send their results to the Monitoring and Switching Logic (MSL), which also uses inputs obtained from the device under control to decide which output to pass back to the device. As long as HPC is behaving properly, it is the leader and its output will be transmitted to the device. Should MSL decide that HPC is not behaving correctly, it makes the HRC the leader and uses its output instead. Thus the device will perform no worse than it did before the upgrade to HPC occurred. This solves the upgrade paradox even in the presence of multiple alternatives because at any instant only the output of one of the alternatives is used. Not shown, for reasons of complexity, is the module that would actually remove a failed HPC from the system and allow it to be replaced with a corrected version for another try.
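
The leadership protocol described above can be sketched as follows. The control laws, the one-step prediction `sample + command`, and the safety limit are all assumptions chosen for illustration; the source does not specify how the MSL decides that the HPC is misbehaving.

```python
def hrc(sample):
    """Highly Reliable Component: conservative, trusted control law (hypothetical)."""
    return -0.5 * sample

def hpc(sample):
    """High Performance Component: stronger response, with an injected bug."""
    if sample > 4.0:                  # residual error, for illustration only
        return 100.0
    return -0.9 * sample

class MonitoringAndSwitchingLogic:
    def __init__(self, limit):
        self.limit = limit            # assumed bound on the predicted device state
        self.leader = "HPC"

    def select(self, sample, hrc_out, hpc_out):
        # Assumed check: use the device input to predict the next state
        # and require that the HPC command keeps it inside the bound.
        predicted = sample + hpc_out
        if self.leader == "HPC" and abs(predicted) > self.limit:
            self.leader = "HRC"       # HPC stays demoted until it is replaced
        return hpc_out if self.leader == "HPC" else hrc_out

msl = MonitoringAndSwitchingLogic(limit=5.0)
trace = []
for s in [1.0, 2.0, 5.0, 2.0]:
    out = msl.select(s, hrc(s), hpc(s))
    trace.append((msl.leader, out))
print(trace)
```

In this sketch the HPC leads for the first two samples; on the third its command fails the check, so the HRC output is used from then on. Only one output reaches the device at any instant, which is why no vote is needed.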

Usage Considerations

Simplex is most suitable for systems that have high availability and reliability requirements. It seems especially suitable for systems such as control systems (real-time or process) whose behavior can be modeled and monitored.

Because Simplex is relatively immature, pilot studies will be needed to determine its suitability for any intended application. This would involve developing a rapid prototype, using Simplex, of a simplified instance of the intended application.

Maturity

The safe, online upgrade of both software and hardware, including COTS components, using Simplex has been successfully demonstrated in the laboratory. Simplex is being transitioned into practice via several pilot studies:

  • Silicon Wafer Manufacturing. The objective was to demonstrate the use of Simplex as the basis for the control architecture in manufacturing process-control software. This was a joint effort between the Software Engineering Institute and the Department of Electrical and Computer Engineering at Carnegie Mellon, guided by engineers from SEMATECH.
  • NSSN (new attack submarine program). This study involved a US Navy program whose goal is the development, demonstration, and transition of a COTS-based fault-tolerant submarine control system that can be upgraded inexpensively and dependably.
  • INSERT (INcremental Software Evolution for Real-Time Systems). This project was funded by the Air Force/DARPA EDCS (evolutionary design of complex software) program, whose goal is to evaluate the possible use of Simplex in the context of onboard avionics systems. Work is proceeding with Lockheed-Martin Tactical Aircraft Systems to investigate the application of this technology to the automated maneuvering capability of the F-16 fighter.

Costs and Limitations

Simplex is designed to support the evolution of mission-critical systems that have high availability or reliability requirements. Its suitability for management information system (MIS) applications that do not have such requirements has yet to be determined. Its usefulness in C4I systems is currently being investigated.

Although Simplex has been designed to reduce the life-cycle cost of systems, data on its impact on system life-cycle cost is not available at this time. Much of Simplex is built upon COTS components such as a POSIX-compliant real-time operating system running on modern hardware. This tends to reduce costs relative to custom designs.

When using Simplex, engineering costs are increased by the need to analyze and create the analytically redundant modules. Additionally, there is some overhead involved in the operation of the monitoring and switching logic. Finally, the need to run multiple copies of an application (i.e., the HRC and HPC simultaneously) requires additional resources: at the very least, additional memory and CPU cycles. These factors tend to push costs upward, an effect that is compensated for by the increased reliability and flexibility that Simplex provides.
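
One way to gauge the extra CPU demand of running the HRC and HPC side by side is the classic rate monotonic utilization bound, under which n independent periodic tasks meet their deadlines if total utilization does not exceed n(2^(1/n) - 1). The sketch below applies that test to a hypothetical task set; the execution times and periods are invented for illustration, and the test is sufficient but not necessary, so a set that fails it may still be schedulable under exact analysis.

```python
def rm_bound(n):
    """Liu and Layland utilization bound for n periodic tasks."""
    return n * (2 ** (1.0 / n) - 1)

def utilization(tasks):
    """tasks: list of (worst-case execution time, period) in the same unit."""
    return sum(c / t for c, t in tasks)

def passes_rm_test(tasks):
    # Sufficient (not necessary) test: total utilization within the bound.
    return utilization(tasks) <= rm_bound(len(tasks))

# Hypothetical task sets in milliseconds: (execution time, period).
baseline = [(2, 10), (5, 50)]        # HRC plus device I/O
with_hpc = baseline + [(4, 10)]      # add an HPC running at the HRC's rate
overloaded = baseline + [(6, 10)]    # a heavier HPC

for name, tasks in [("baseline", baseline),
                    ("with_hpc", with_hpc),
                    ("overloaded", overloaded)]:
    print(name, round(utilization(tasks), 2), passes_rm_test(tasks))
```

A fuller analysis would also count the monitoring and switching logic as a task of its own.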

A perhaps more important consideration is the savings that Simplex provides by reducing the required testing and downtime when installing an upgraded component. The expectation is that the use of Simplex will provide significant savings in total life-cycle cost.

Complementary Technologies

Software and hardware reliability modeling and analysis allow users to estimate the impact of Simplex on system reliability. System life-cycle cost estimation techniques will allow users to estimate the cost impact.
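
As a hedged illustration of the kind of estimate such modeling can produce, the toy model below treats the HRC as a fallback with imperfect failure detection. It assumes independent failures, an MSL that must itself work, and a detection coverage parameter; the function name and all input values are illustrative assumptions, not measured data or a model taken from the Simplex literature.

```python
def simplex_reliability(r_hpc, r_hrc, r_msl, coverage):
    """Probability of acceptable output over one interval (toy model).

    r_hpc, r_hrc, r_msl: assumed reliabilities of the HPC, HRC, and MSL.
    coverage: assumed probability that the MSL detects an HPC failure.
    """
    # The HRC takes over only when the HPC fails and the failure is detected.
    fallback = (1 - r_hpc) * coverage * r_hrc
    return r_msl * (r_hpc + fallback)

# Illustrative inputs only.
hpc_alone = 0.95
combined = simplex_reliability(r_hpc=0.95, r_hrc=0.999, r_msl=0.9999, coverage=0.99)
print(hpc_alone, round(combined, 5))
```

With zero coverage the model reduces to the HPC in series with the MSL, which shows how strongly the estimated benefit depends on the quality of the monitoring logic.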

Index Categories

This technology is classified under the following categories.

Name of Technology

Simplex Architecture

Application category

Reapply Software Life Cycle (AP.1.9.3)
Reengineering (AP.1.9.5)
Software Architecture (AP.2.1)
Restart/Recovery (AP.2.10)

Quality measures category

Availability/Robustness (QM.2.1.1)
Reliability (QM.2.1.2)
Safety (QM.2.1.3)
Real-time Responsiveness/Latency (QM.2.2.2)
Maintainability (QM.3.1)

Computing reviews category

Fault-tolerance (D.4.5)
Real-time and embedded systems (D.4.7)
Network communication (D.4.4)

References and Information Sources

[Altman 97] Altman, Neal. The Simplex Architecture [online]. Available WWW <URL: http://www.sei.cmu.edu/simplex/simplex_architecture.html> (May 6, 1997).
[Sha 96] Sha, L.; Rajkumar, R.; & Gagliardi, M. "Evolving Dependable Real Time Systems," 335-346. Proceedings of the 1996 IEEE Aerospace Applications Conference. Aspen, CO, February 3-10, 1996. New York, NY: IEEE Computer Society Press, 1996.
[Rajkumar 95] Rajkumar, R.; Gagliardi, M.; & Sha, L. "The Real-Time Publisher/Subscriber Inter-Process Communication Model for Distributed Real-Time Systems: Design and Implementation," 66-75. The First IEEE Real-Time Technology and Applications Symposium. Chicago, IL, May 15-17, 1995. Los Alamitos, CA: IEEE Computer Society Press, 1995.

Current Author/Maintainer

Charles B. Weinstock, SEI
Lui R. Sha, SEI

External Reviewers

John Lehoczky, Professor, Statistics Department, CMU

Modifications

29 Oct 97 changes include:

· Updated list of pilot studies.

· Provided additional detail on constituent technologies.

· Added application architecture diagram.

· Improved data flow diagram and enhanced the explanation.

· Added information on anticipated costs (where these are generally understood).

10 Jan 97 (original)



The Software Engineering Institute (SEI) is a federally funded research and development center sponsored by the U.S. Department of Defense and operated by Carnegie Mellon University.

Copyright 2007 by Carnegie Mellon University
URL: http://www.sei.cmu.edu/str/descriptions/simplex_body.html
Last Modified: 11 January 2007