Leveraging Adversarial Machine Learning Techniques to Perform Query-Access Fairness Evaluations

With close to three million employees, the U.S. Department of Defense (DoD) is the largest employer in the world. Because of the scale of its required personnel management decisions, the DoD would greatly benefit from automation and artificial intelligence (AI). However, while many commercial solutions use AI to automate hiring and management decisions, recent studies have uncovered severe biases against protected groups built into these systems, likely removing qualified candidates from consideration.

To protect their intellectual property, the vendors of these services and tools are unlikely to expose their models to users for inspection. Consequently, DoD hiring teams lack the means to ensure these solutions comply with legal requirements for fairness in hiring practices (i.e., Title VII of the Civil Rights Act of 1964), and the DoD is unable to take advantage of these existing commercial solutions. This project aims to bridge that gap by developing methods to quantify bias in AI systems without requiring access to the underlying model (that is, with API-only access).

Our goal is to improve the state of the art in the measurement of the fairness of machine learning models.

Anusha Sinha

Machine Learning Research Scientist

Our goal is to improve the state of the art in the measurement of the fairness of machine learning (ML) models. As part of this project, we will

extend our current ability to detect bias in the context of vulnerability to adversarial attacks so that it also detects bias in the context of unfairness
select existing CMU SEI implementations of adversarial attacks for each resilience-fairness formalism and incorporate them into an open
validate our implemented fairness evaluation methods and interpret bounds on unfair behavior using publicly available datasets and models
develop general principles for fairness evaluations in settings where only query API access to models being evaluated is available

We aim to create the first method to measure fairness on the disparate impact criterion without leaking sensitive information about model architectures or training data to a test and evaluation team. Our project will use symmetries between definitions of fairness and the adversarial threat models previously studied at the CMU SEI to reframe adversarial attacks as methods to quantify potential unfairness in machine learning models. These quantification techniques will give the DoD a practical means to automate parts of the recruitment and hiring process while still ensuring fairness for protected groups in compliance with the law.
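To make the disparate impact criterion concrete, the following is a minimal sketch of how a selection-rate ratio could be computed when the only access to a model is a prediction query. It is not the project's adversarial-attack-based method, which this summary does not detail. The names `selection_rate`, `disparate_impact_ratio`, and `toy_model`, the applicant records, and the score threshold are all hypothetical, and the sketch assumes the evaluator holds a sample of applicants labeled with group membership.

```python
from typing import Callable, Sequence


def selection_rate(predict: Callable[[dict], int], applicants: Sequence[dict]) -> float:
    """Fraction of applicants the black-box model selects (returns 1 for)."""
    if not applicants:
        raise ValueError("need at least one applicant")
    return sum(predict(a) for a in applicants) / len(applicants)


def disparate_impact_ratio(
    predict: Callable[[dict], int],
    applicants: Sequence[dict],
    group_key: str,
    protected: str,
    reference: str,
) -> float:
    """Selection rate of the protected group divided by that of the reference group.

    Only calls to `predict` are used; no model internals are inspected.
    """
    protected_rows = [a for a in applicants if a[group_key] == protected]
    reference_rows = [a for a in applicants if a[group_key] == reference]
    reference_rate = selection_rate(predict, reference_rows)
    if reference_rate == 0:
        raise ValueError("reference group has a zero selection rate")
    return selection_rate(predict, protected_rows) / reference_rate


# Hypothetical stand-in for a vendor's query API (assumption, not a real service).
def toy_model(applicant: dict) -> int:
    return 1 if applicant["score"] >= 70 else 0


# Hypothetical applicant sample with group labels held by the evaluator.
applicants = [
    {"group": "A", "score": 80},
    {"group": "A", "score": 75},
    {"group": "A", "score": 60},
    {"group": "A", "score": 90},
    {"group": "B", "score": 72},
    {"group": "B", "score": 65},
    {"group": "B", "score": 50},
    {"group": "B", "score": 68},
]

ratio = disparate_impact_ratio(
    toy_model, applicants, "group", protected="B", reference="A"
)
print(f"disparate impact ratio: {ratio:.2f}")
```

In U.S. employment practice, a ratio below 0.8 (the "four-fifths" rule of thumb) is commonly treated as a signal of possible adverse impact. A single ratio on one sample is only a point estimate, so an evaluation of this kind would also need to account for sampling uncertainty.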


In Context: This FY2023-24 Project

is a collaborative effort with researchers from Carnegie Mellon University
builds on previous CMU SEI research on adversarial threat models
aligns with the CMU SEI technical objective to be trustworthy in construction and implementation and resilient in the face of operational uncertainties, including known and yet-unseen adversary capabilities
aligns with the OUSD(R&E) critical technology priority of developing trusted AI and autonomy

Research Review 2023