Why do system-level failures still occur even when fault tolerance techniques are deployed?
From a development perspective, the tight integration of a large number of components creates many potential failure modes caused by interactions that unit testing cannot discover. This project focuses on identifying system-wide design rules that must be satisfied to limit the propagation of seemingly minor faults throughout the system.
Our objectives in this project are to
Our approach has three parts. We build architectural models using the Architecture Analysis and Design Language (AADL) to identify system fault behaviors that component-level fault containment techniques do not address. We then develop a formalized analysis framework for system fault containment and stability management, and validate system architectures against that framework.
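To make the idea of fault propagation across an architecture more concrete, the following is a minimal Python sketch, not AADL and not the project's actual framework. It treats an architecture as a directed graph of connections and computes which components an error can reach. The component names, the `CONNECTIONS` map, and the `affected_components` function are all hypothetical, and the assumption that a containing component fully masks incoming errors is a simplification made only for illustration.

```python
from collections import deque

# Hypothetical architecture: each component maps to the components it feeds.
CONNECTIONS = {
    "sensor": ["filter"],
    "filter": ["controller"],
    "controller": ["actuator", "logger"],
    "actuator": [],
    "logger": [],
}


def affected_components(connections, source, containing=frozenset()):
    """Return the set of components that an error from `source` can reach.

    Components in `containing` are assumed (for illustration) to mask
    incoming errors: the error reaches them but is not passed on.
    """
    reached = {source}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        # A containing component stops propagation, unless it is the origin.
        if current in containing and current != source:
            continue
        for successor in connections.get(current, []):
            if successor not in reached:
                reached.add(successor)
                queue.append(successor)
    return reached


# Without containment, a sensor error can reach every downstream component.
unprotected = affected_components(CONNECTIONS, "sensor")

# With containment at the filter, the error stops there.
protected = affected_components(CONNECTIONS, "sensor", containing={"filter"})
```

Comparing `unprotected` with `protected` shows how a single containment point changes the reach of a seemingly minor fault. A real analysis would also need to distinguish error types and propagation paths that are not explicit connections, such as shared resources.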
Our model-based analytic framework for this investigation includes
Read a report (pdf, 688 kb) or presentation (pdf, 883 kb) on fault propagation and error modeling.