Fault Containment

Why do system-level failures still occur even when fault tolerance techniques are deployed?

From a development perspective, the tight integration of a large number of components creates many potential failure modes caused by interactions that cannot be discovered by unit testing. In this project, our focus is on identifying system-wide design rules that must be satisfied in order to limit propagation of seemingly minor faults throughout the system.

Our objectives in this project are to

  • develop a system fault containment and stability management framework
  • identify categories of potentially unmanaged faults and their root causes
  • develop an analytical approach for tracing fault propagation that can lead to system failures
  • develop effective system-level fault containment strategies
  • specify and validate architecture patterns conducive to robustness and stability in systems

Our approach has three parts. We build architectural models in the Architecture Analysis and Design Language (AADL) to identify system fault behaviors that component-level fault containment techniques do not address. We develop a formalized analysis framework for system fault containment and stability management. Finally, we validate system architectures in the context of this framework.

Our model-based analytic framework for this investigation includes

  1. root cause analysis of system-level faults
  2. analytic exploration of unmanaged faults
  3. fault-impact analysis and system-level fault containment strategies
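As a rough illustration of the third step, fault-impact analysis can be pictured as a reachability question over the connections in an architecture model. The sketch below is not the project's AADL-based tooling. The function name `impacted_components`, the component names, and the rule that a containment component absorbs any incoming fault are all assumptions made for the example.

```python
from collections import deque


def impacted_components(connections, source, containment):
    """Return the set of components a fault originating at `source` can reach.

    connections: dict mapping a component to the components it feeds
                 (hypothetical stand-in for connections in an architecture model).
    containment: components assumed to absorb any incoming fault. They are
                 neither counted as impacted nor allowed to forward the fault.
                 A fault that starts inside `source` is always counted.
    """
    impacted = {source}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for target in connections.get(current, ()):
            # Skip components already visited and those that contain the fault.
            if target in impacted or target in containment:
                continue
            impacted.add(target)
            queue.append(target)
    return impacted


# Hypothetical architecture: each component feeds the ones listed for it.
connections = {
    "sensor": ["filter"],
    "filter": ["controller", "logger"],
    "controller": ["actuator"],
}

# Without containment, a sensor fault can reach every component.
print(sorted(impacted_components(connections, "sensor", set())))

# With containment at the controller, the actuator is shielded.
print(sorted(impacted_components(connections, "sensor", {"controller"})))
```

Comparing the two results shows how a single containment point changes the set of components exposed to a seemingly minor fault, which is the kind of system-wide design question the framework is meant to answer.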

Read a report (pdf, 688 kb) or presentation (pdf, 883 kb) on fault propagation and error modeling.
