2021 Year in Review

Getting the Jump on System Failures in AI-Powered Data Processing Pipelines

Up-to-date intelligence is essential to mission success, and data is essential to accurate and actionable intelligence. Data processing pipelines developed by the Department of Defense (DoD) employ artificial intelligence (AI) and other software capabilities to allow analysts to focus on more important analytical tasks.

Analysts and military personnel supporting critical missions must be able to understand the state of their data processing pipelines and take action when problems occur.

Grace Lewis
Principal Researcher & Tactical and AI-Enabled Systems (TAS) Initiative Lead, SEI Software Solutions Division

These pipelines are complex and can suffer multiple problems. The ability of AI components to make inferences from data may degrade over time, software components might crash, hardware components might be compromised, or pipelines with overtaxed resources may suffer poor throughput. Such issues could impede analysts’ ability to support assigned missions or, worse, give them inaccurate information for crucial decision making.

“Analysts and military personnel supporting critical missions must be able to understand the state of their data processing pipelines and take action when problems occur,” said Grace Lewis, an SEI principal researcher. “System failures can be easy to detect. Detecting unreliable results of AI components is difficult because the system keeps producing results, but they’re inaccurate.”

To address this challenge, Lewis and her SEI colleagues are at work on AI End-to-End (AIE), a system for the development, deployment, and monitoring of data processing pipelines that may contain AI components. The goal of AIE, built on earlier SEI work on the Cornerstone resilient situational awareness system for the Office of the Under Secretary of Defense for Research and Engineering (OUSD(R&E)), is to monitor running pipelines and automatically reconstitute them when system or component failures are detected.

Monitoring becomes more complicated when data processing pipelines, especially those with AI components, are distributed across multiple specialized nodes. “Field sensor data may be processed on a specialized edge device, with results pushed to the cloud for further processing and storage,” said Lewis. “All these components might have been developed by different organizations. It’s difficult to monitor different types of system elements, such as platforms, networks, software components, and, especially, AI components.”

AIE data processing pipelines comprise services or containers with monitoring endpoints that expose component-specific metrics, including special metrics for AI services. During operation, AIE continuously polls these metrics and compares them against thresholds for component failure. If a component fails, AIE replaces the pipeline with an equivalent that can continue to meet mission needs.
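The poll, compare, and replace cycle described above can be illustrated with a minimal Python sketch. This is not AIE's implementation: the class names (`Threshold`, `Component`, `Pipeline`), the `monitor_once` function, the metric names, and the threshold values are all hypothetical, and each monitoring endpoint is stood in for by a plain function that returns a dictionary of metrics.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metrics = Dict[str, float]


@dataclass
class Threshold:
    """Failure threshold for one metric (hypothetical structure)."""
    metric: str
    limit: float
    higher_is_worse: bool = True

    def violated(self, metrics: Metrics) -> bool:
        value = metrics.get(self.metric)
        if value is None:
            return True  # assumption: a missing metric counts as a failure
        return value > self.limit if self.higher_is_worse else value < self.limit


@dataclass
class Component:
    """A service or container with a monitoring endpoint."""
    name: str
    poll: Callable[[], Metrics]  # stand-in for a call to the real endpoint
    thresholds: List[Threshold]

    def failed(self) -> bool:
        metrics = self.poll()
        return any(t.violated(metrics) for t in self.thresholds)


@dataclass
class Pipeline:
    name: str
    components: List[Component]

    def failed_components(self) -> List[str]:
        return [c.name for c in self.components if c.failed()]


def monitor_once(active: Pipeline, alternates: List[Pipeline]) -> Pipeline:
    """One monitoring cycle: keep the active pipeline if it is healthy,
    otherwise switch to the first healthy equivalent."""
    if not active.failed_components():
        return active
    for candidate in alternates:
        if not candidate.failed_components():
            return candidate
    return active  # no healthy equivalent; a real system would raise an alert


# Illustrative data only: an AI service whose mean confidence has degraded.
confidence_floor = Threshold("mean_confidence", 0.60, higher_is_worse=False)
error_ceiling = Threshold("error_rate", 0.05)

detector_v1 = Component(
    "detector-v1",
    lambda: {"error_rate": 0.02, "mean_confidence": 0.41},
    [error_ceiling, confidence_floor],
)
detector_v2 = Component(
    "detector-v2",
    lambda: {"error_rate": 0.01, "mean_confidence": 0.85},
    [error_ceiling, confidence_floor],
)

primary = Pipeline("primary", [detector_v1])
backup = Pipeline("backup", [detector_v2])

print(monitor_once(primary, [backup]).name)  # prints "backup"
```

In this sketch the AI-specific metric (`mean_confidence`) is checked the same way as a conventional one (`error_rate`), which reflects the point that an AI component can keep producing results while its metrics show it has become unreliable. A continuous monitor would call `monitor_once` on a timer and redeploy whenever the returned pipeline differs from the active one.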

Another challenge is continued pipeline operation on infrastructures that involve embedded, sometimes legacy, components, either commercial or developed by the DoD. These infrastructures demand new ways of measurement, modeling, and distributed management that are automated and adapt to a dynamic environment from the application to the physical layer.

The SEI’s work on AIE has been integrated into a multi-organization demonstration of automatic reconstitution of an AI-enabled data processing pipeline. The next step is to support the deployment and integration of services on edge platforms. Lewis’s team intends for AIE to automatically handle failures caused by the challenges of operating in tactical and edge environments, such as limited computing resources and network connectivity. This kind of technology supports robust and secure AI, one of the three pillars of the emerging discipline of AI engineering being led by the SEI.

AIE will enable large-scale automation of capabilities distributed between the cloud and embedded software infrastructures. The work will also allow more resilient, cost-effective, and timely deployment of heterogeneous cloud infrastructure and provide a rich environment for fundamental research in system representation and analysis.

“Collaborative development across commercial, government, and DoD partners is critical for the software development and operations approach that allows assessment of the design, deployment, and maintenance phases of large-scale integrated systems,” said Robert Bonneau, OUSD(R&E) director of software embedded systems and data analytics. “This approach enables rapid system reconfigurability as well as cost reduction.”