Software Engineering Institute Carnegie Mellon

A Software Architecture for Dependable and Evolvable Industrial Computing Systems

Lui Sha
Ragunathan Rajkumar
Michael Gagliardi

Technical Report
CMU/SEI-95-TR-005

The downtime of a large industrial operation is often prohibitively expensive, and a failure of a mission-critical system could have disastrous consequences. Lacking an effective approach to mitigate the risks of system upgrades or of introducing third-party open system components, many industrial and defense systems are forced to keep outdated computing hardware and software.

A paradigm shift is needed, from a focus on enabling technologies for completely new installations to a focus on mitigating the risk and cost of bringing new technology into functioning systems. Innovative technology is needed to support the task of technology insertion. Quickly and reliably turning unparalleled American innovations into industrial competitiveness and defense technological superiority is of strategic importance.

The Simplex architecture has been developed to support safe and reliable online upgrade of hardware and software components in spite of errors in the new modules. This paper gives a brief overview of the underlying technologies.

Industrial and defense computing systems often have stringent safety, reliability and timing constraints. Failure in such systems can potentially have catastrophic consequences, and system downtimes can be expensive.

In this paper, we give a brief overview of the technology foundation of the Simplex Architecture. The architecture can be used to maintain the safety, reliability and real-time constraints of industrial and defense computing systems, despite inevitable glitches when new technologies are introduced and integrated with existing equipment. The architecture is based on open system components, and supports the safe evolution of the application software architecture itself online. It will also support the safe online addition and removal of computing hardware and system software.

Two prototypes were built and are available for demonstration. The single-computer prototype uses a personal computer that controls an inverted pendulum. The controller software can be modified on the fly: members of the audience are invited to modify the control software online and may insert arbitrary bugs at the application level. The demonstration shows that the control performance can be improved but not degraded. The second prototype, a triplicated fault-tolerant group, implements model-based voting. It permits the safe online modification of not only the application software but also the hardware and system software. An improved minority can gain control and improve the control performance. However, no hardware or software error at that computer can degrade the control, including malicious attempts with root privileges at that computer. The triplicated fault-tolerant group can also be reconfigured online into a duplex system or a uniprocessor system, and vice versa, as needed.
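The behavior claimed for the single-computer prototype, that an upgraded controller may improve control but a faulty one cannot degrade it, can be illustrated with a minimal Python sketch. This is not the Simplex implementation. It assumes a trusted fallback controller, an untrusted upgraded controller, and a check that accepts a command only if a toy model predicts the plant stays within a safe range. All names (`SimplexSwitch`, `is_recoverable`, the gains, the limits) and the simplified double-integrator model are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical plant state: (angle in radians, angular rate in rad/s).
State = Tuple[float, float]
Controller = Callable[[State], float]


def safe_controller(state: State) -> float:
    """Trusted fallback controller (illustrative gains, not from the report)."""
    angle, rate = state
    return -2.0 * angle - 1.0 * rate


def is_recoverable(state: State, u: float, dt: float = 0.01,
                   angle_limit: float = 0.2, u_max: float = 10.0) -> bool:
    """Accept a command only if a toy one-step model stays inside the limits.

    Assumption: a double integrator stands in for the real pendulum dynamics.
    """
    angle, rate = state
    if not (math.isfinite(u) and abs(u) <= u_max):
        return False  # rejects NaN, infinities, and oversized commands
    next_rate = rate + dt * u
    next_angle = angle + dt * next_rate
    return abs(next_angle) <= angle_limit


@dataclass
class SimplexSwitch:
    """Use the upgraded controller while it behaves; otherwise fall back."""
    safe: Controller
    experimental: Controller
    check: Callable[[State, float], bool] = is_recoverable
    trusted: bool = True  # set to False once the upgrade misbehaves

    def command(self, state: State) -> float:
        if self.trusted:
            try:
                u = self.experimental(state)
                if self.check(state, u):
                    return u
            except Exception:
                pass  # a crash in the upgrade is treated like a bad command
            self.trusted = False  # stop using the faulty upgrade
        return self.safe(state)


def improved_controller(state: State) -> float:
    """Example of a well-behaved upgrade (hypothetical gains)."""
    angle, rate = state
    return -3.0 * angle - 1.5 * rate


def buggy_controller(state: State) -> float:
    """Example of an injected application-level bug."""
    return 1e6
```

In this sketch, `SimplexSwitch(safe_controller, buggy_controller).command((0.05, 0.0))` returns the fallback controller's command and marks the upgrade as untrusted, while the same call with `improved_controller` passes the upgraded command through. The sketch never returns to a rejected upgrade, which is a simplifying design choice and not something the report specifies.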

Currently, the SEI is actively working with industry partners and government agencies to mature this promising new technology.