Software Engineering for Machine Learning: Characterizing and Detecting Mismatch and Predicting Inference Degradation in ML Systems
Jan 26, 2021 · Webcast
Learn about the perspectives involved in the development and operation of ML systems.
A key challenge in deploying machine learning (ML) systems to production environments is that their development and operation involve three perspectives, each with its own workflow and people: the data scientist builds the model; the software engineer integrates the model into a larger system; and operations staff deploy, operate, and monitor the system. Because these perspectives work separately and often speak different languages, mismatches can arise between the assumptions each perspective makes about the elements of the ML-enabled system and the guarantees those elements actually provide.
We conducted a study with practitioners to identify mismatches and their consequences. In parallel, we conducted a multivocal literature study to identify best practices for software engineering of ML systems that could address the identified mismatches. The result is a set of machine-readable descriptors that codify attributes of system elements and therefore make all assumptions explicit. The descriptors can be used manually by system stakeholders for information, awareness, and evaluation activities, and by automated mismatch detectors at design time and runtime in cases where attributes lend themselves to automation.
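To make the idea of automated mismatch detection concrete, here is a minimal sketch comparing a trained-model descriptor against a production-environment descriptor. The attribute names and values are hypothetical illustrations, not the actual descriptor schema produced by the study:

```python
# Hypothetical descriptor for the trained model, as the data scientist
# might declare it (field names are illustrative only).
trained_model = {
    "input_schema": ["age", "income", "zip_code"],
    "framework": "scikit-learn",
    "framework_version": "1.3",
    "expected_latency_ms": 50,
}

# Hypothetical descriptor for the production environment, as operations
# staff might declare it.
production_env = {
    "input_schema": ["age", "income"],   # zip_code not provided upstream
    "framework": "scikit-learn",
    "framework_version": "1.1",          # older runtime than training
    "latency_budget_ms": 30,
}

def detect_mismatches(model, env):
    """Compare descriptor attributes and return a list of mismatch messages."""
    issues = []
    missing = set(model["input_schema"]) - set(env["input_schema"])
    if missing:
        issues.append(f"inputs not provided by environment: {sorted(missing)}")
    if model["framework_version"] != env["framework_version"]:
        issues.append("framework version differs between training and deployment")
    if model["expected_latency_ms"] > env["latency_budget_ms"]:
        issues.append("expected inference latency exceeds operational budget")
    return issues

for issue in detect_mismatches(trained_model, production_env):
    print("MISMATCH:", issue)
```

Because the descriptors are machine-readable, a check like this can run at design time (before integration) or at runtime (as a deployment gate), which is what makes the explicit assumptions actionable rather than just documented.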
The study also showed that many mismatches stemmed from a lack of understanding of how to monitor deployed ML models for problems with the quality of their inferences. In this talk, we also introduce a new project that will develop novel metrics to predict when a model’s inference quality will degrade below a threshold. The expected benefits of the metrics are that they will be able to determine (1) when a model really needs to be retrained, so as to avoid spending resources on unnecessary retraining, and (2) when a model needs to be retrained before its scheduled retraining time, so as to minimize the time during which the model produces suboptimal results. The metrics will be validated in the context of models using convolutional neural networks (CNNs), which are state of the art and ubiquitous in computer vision and relevant to Department of Defense (DoD) systems such as surveillance, autonomous vehicles, landmine removal, manufacturing quality control, facial recognition, captured enemy materiel (CEM) analysis, and disaster response.
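As a rough illustration of threshold-based retraining triggers, the sketch below flags retraining when a model's average prediction confidence on recent traffic drops well below its deployment-time baseline. This confidence-drop heuristic is an assumption chosen for demonstration, not one of the project's proposed metrics:

```python
# Hypothetical degradation monitor: trigger retraining when average
# top-class confidence drops more than a threshold below the baseline
# measured at deployment time. This heuristic is illustrative only.
import numpy as np

def mean_confidence(softmax_outputs):
    """Average top-class probability over a batch of model outputs."""
    return float(np.mean(np.max(softmax_outputs, axis=1)))

def needs_retraining(baseline_conf, current_conf, drop_threshold=0.10):
    """True when confidence has dropped more than drop_threshold below baseline."""
    return (baseline_conf - current_conf) > drop_threshold

# Confidence measured on held-out data when the model was deployed
baseline = mean_confidence(np.array([[0.05, 0.90, 0.05],
                                     [0.10, 0.10, 0.80]]))
# Confidence on recent production traffic after the data has drifted
current = mean_confidence(np.array([[0.40, 0.35, 0.25],
                                    [0.30, 0.45, 0.25]]))
print(needs_retraining(baseline, current))
```

The point of such a metric is exactly the trade-off described above: retrain only when the signal crosses the threshold (avoiding wasted retraining), and retrain immediately when it does (rather than waiting for a fixed schedule while the model underperforms).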
What attendees will learn:
- Perspectives involved in the development and operation of ML systems
- Types of mismatch that occur in the development of ML systems
- Future work in software engineering for ML systems