Predicting Changing Conditions in Production Machine Learning Systems
Created May 2022
The inference quality of deployed machine learning (ML) models degrades over time due to differences between training and production data, typically referred to as drift. While large organizations periodically retrain ML systems to counter drift, not all organizations have the data and resources required to do so. The SEI developed a process for drift behavior analysis at model development time that determines the set of metrics and thresholds to monitor for runtime drift detection. Understanding how models will react to drift before they are deployed, combined with a mechanism for detecting drift in production, is important for the continued operation of ML systems.
The Need to Find and Fix Inference Degradation
After ML systems are deployed, their models need to be retrained to account for "drift," or differences between training and production data. Over time, these differences lead to inference degradation (negative changes in the quality of ML inferences), which eventually reduces the trustworthiness of systems.
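To make "differences between training and production data" concrete, here is a minimal sketch of one commonly used drift measure for a single numeric feature, the Population Stability Index (PSI). PSI is an illustrative choice, not necessarily one of the metrics the SEI validated; the function name `psi`, the ten quantile bins, and the `eps` floor for empty bins are assumptions made for this example.

```python
import bisect
import math
import random


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a reference (training) sample and a
    production sample of one numeric feature. Bin edges are taken from the
    reference quantiles, so each bin holds roughly 1/bins of the reference."""
    ref = sorted(expected)
    # Interior cut points at the reference sample's quantiles.
    edges = [ref[int(len(ref) * i / bins)] for i in range(1, bins)]

    def proportions(sample):
        counts = [0] * bins
        for v in sample:
            # Number of edges <= v gives the bin index (0 .. bins - 1).
            counts[bisect.bisect_right(edges, v)] += 1
        # The eps floor keeps empty bins from causing log(0) or division by zero.
        return [max(c / len(sample), eps) for c in counts]

    p_ref, p_act = proportions(expected), proportions(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(p_ref, p_act))


rng = random.Random(42)
train = [rng.gauss(0.0, 1.0) for _ in range(5000)]
prod_same = [rng.gauss(0.0, 1.0) for _ in range(5000)]      # same distribution
prod_shifted = [rng.gauss(0.5, 1.0) for _ in range(5000)]   # mean shifted by 0.5
print(psi(train, prod_same))     # close to zero
print(psi(train, prod_shifted))  # noticeably larger
```

A metric like this only signals that the data changed; whether that change actually degrades inference quality, and at what metric value, is what drift behavior analysis has to establish for each model.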
For example, an ML system designed to protect a network against intrusion attempts might, when first deployed, use a model trained to recognize certain intrusion methods. However, if attackers begin using new methods that differ from those the model was trained to recognize, the system would fail to detect the intrusions. To keep up with the changing environment, model developers must retrain the model to recognize the new methods.
Ideally, model developers and system administrators would be able to identify model inference degradation quickly and accurately so that they can take timely and appropriate action (e.g., retraining, cautioning users, or taking a capability offline) and keep systems performing optimally. But first they need to know whether the models need to be retrained and, if so, when.
A Mechanism to Predict and Prevent Drift
The SEI developed a mechanism for analyzing drift behavior and monitoring production ML systems for inference degradation. This mechanism involves
- introducing realistic drift into datasets
- leveraging a sample set of empirically validated metrics to predict when data drift will degrade a model’s inference quality
- conducting experiments using an extensible tool set to support contextual drift behavior analysis as part of model development
- determining metrics and thresholds that need to be monitored in production that would indicate drift
- providing reusable modules and libraries that can be embedded into model monitoring infrastructures to support realistic drift detection in production ML systems.
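The development-time steps above (induce drift, measure a metric, choose a threshold, then monitor that threshold at runtime) can be illustrated with a minimal toy sketch. Everything in it is hypothetical and not taken from the SEI tool set: a one-feature dataset, a fixed-rule stand-in for a trained model, drift induced as a constant offset on the feature, a standardized mean shift as the drift metric, and a 90% accuracy floor.

```python
import random
import statistics


def make_dataset(n, rng):
    """Hypothetical toy data: one feature x in [0, 1), label 1 when x > 0.5."""
    xs = [rng.random() for _ in range(n)]
    ys = [1 if x > 0.5 else 0 for x in xs]
    return xs, ys


def model_predict(x):
    """Stand-in for a trained model: a fixed decision threshold at 0.5."""
    return 1 if x > 0.5 else 0


def induce_shift(xs, delta):
    """Drift induction: add a constant offset to the feature while labels stay
    unchanged (for example, a sensor calibration error)."""
    return [x + delta for x in xs]


def accuracy(xs, ys):
    return sum(model_predict(x) == y for x, y in zip(xs, ys)) / len(ys)


def mean_shift(train_xs, prod_xs):
    """Drift metric: change in the feature mean, in training standard deviations."""
    diff = statistics.fmean(prod_xs) - statistics.fmean(train_xs)
    return abs(diff) / statistics.stdev(train_xs)


def calibrate_threshold(train_xs, test_xs, test_ys, deltas, min_accuracy):
    """Apply increasing induced drift; return the metric value at the first drift
    level whose accuracy falls below min_accuracy (None if none does), plus the
    (delta, metric, accuracy) table for review."""
    table, threshold = [], None
    for delta in deltas:
        drifted = induce_shift(test_xs, delta)
        metric, acc = mean_shift(train_xs, drifted), accuracy(drifted, test_ys)
        table.append((delta, metric, acc))
        if threshold is None and acc < min_accuracy:
            threshold = metric
    return threshold, table


def is_drifted(train_xs, prod_xs, threshold):
    """Runtime check: flag production data whose metric reaches the threshold.
    Assumes calibration found a threshold (i.e., it is not None)."""
    return mean_shift(train_xs, prod_xs) >= threshold


rng = random.Random(0)
train_xs, _ = make_dataset(2000, rng)
test_xs, test_ys = make_dataset(2000, rng)
threshold, table = calibrate_threshold(
    train_xs, test_xs, test_ys,
    deltas=[0.0, 0.05, 0.10, 0.15, 0.20, 0.25],
    min_accuracy=0.9,
)
for delta, metric, acc in table:
    print(f"offset={delta:.2f}  metric={metric:.3f}  accuracy={acc:.3f}")
print("monitoring threshold:", threshold)

prod_xs, _ = make_dataset(2000, rng)
print(is_drifted(train_xs, prod_xs, threshold))                     # undrifted sample
print(is_drifted(train_xs, induce_shift(prod_xs, 0.3), threshold))  # offset sample
```

The design choice in this sketch is that the monitoring threshold is tied to an observed loss in inference quality rather than to a generic metric cutoff. In a real system, the drift types, metrics, and acceptable accuracy would come from the model's own operational context.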
By implementing this mechanism, model developers and system administrators can detect drift and inference degradation before they compromise model accuracy and system integrity. In Department of Defense and other government systems, recognizing and addressing inference degradation can avoid costly reengineering, system decommissioning, and misinformed decisions.
The SEI continues its work on the analysis and detection of inference degradation as an aspect of model production readiness. Our future work will focus on
- developing additional drift induction functions and drift detection metrics for developers to use during model development and evaluation
- developing and integrating a drift analysis component into the tool set that codifies the analysis that a model developer would conduct using the provided information (currently requires manual analysis)
- generating a code library for drift detection at runtime that can be integrated into monitoring infrastructures
We are seeking collaborators to share their experiences using metrics, models, and other approaches to drift analysis and detection in their ML systems. If you want to work with us, reach out to us today.
Augur: A Step Towards Realistic Drift Detection in Production ML Systems
April 11, 2022 White Paper
Sebastián Echeverría, Lena Pons, Jeff Chrabaszcz (Govini)
The toolset and experiments reported in this paper provide an initial demonstration of (1) drift behavior analysis, (2) metrics and thresholds, and (3) libraries for drift detection.