Applying Causal Learning to Evaluate Large Language Models (LLMs)
SEI Report
Publisher
Software Engineering Institute
DOI (Digital Object Identifier)
10.1184/R1/30251989
Abstract
As the SEI’s body of causal work has evolved into an end-to-end causal discovery and inference method and tool suitable for detecting bias in ML and AI models, SEI researchers are beginning to investigate whether the first step of the method, causal discovery, can also be applied to LLMs. The SEI’s approach to exploring this question comprises three steps: (1) obtain a dataset of story/summary pairs to use as ground truth, (2) design prompt styles (e.g., purpose, tone) with which to prompt a Summarizer LLM to summarize a story from one of those pairs, and (3) design a set of summarization-quality features employed by an Evaluator LLM to score the quality of summaries generated by the Summarizer LLM. In this way, SEI researchers created a dataset of higher-level features for input to causal discovery. The resulting causal graph demonstrates that a causal relationship between the focus of a prompt style and summary quality is often discoverable when the two overlap. This overall approach may benefit software engineering and LLM research by providing a more formal methodology for assessing the nuanced cause-and-effect relationships unique to a given LLM, reducing confounding.
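The three-step pipeline described in the abstract can be sketched in outline. This is a minimal illustration, not the SEI's actual tooling: the function names (`summarizer_llm`, `evaluator_llm`), the prompt styles, and the quality features are all hypothetical stand-ins, and the two LLM calls are stubbed so the sketch runs without an API key.

```python
import itertools
import random

# Hypothetical stand-ins for the Summarizer and Evaluator LLMs;
# real LLM API calls would replace these stubs.
def summarizer_llm(story, prompt_style):
    """Step 2: summarize a story under a given prompt style."""
    return f"[{prompt_style}] summary of: {story[:20]}"

def evaluator_llm(summary, feature):
    """Step 3: score one summarization-quality feature on a 1-5 scale.
    Deterministic random scores stand in for a real evaluator."""
    random.seed(hash((summary, feature)) % (2**32))
    return round(random.uniform(1.0, 5.0), 1)

# Illustrative prompt styles and quality features (not the SEI's actual sets).
PROMPT_STYLES = ["neutral", "persuasive", "technical"]
QUALITY_FEATURES = ["coverage", "conciseness", "faithfulness"]

def build_causal_dataset(stories):
    """Produce one row per (story, prompt style): the higher-level
    feature table that serves as input to causal discovery."""
    rows = []
    for story, style in itertools.product(stories, PROMPT_STYLES):
        summary = summarizer_llm(story, style)
        row = {"prompt_style": style}
        for feature in QUALITY_FEATURES:
            row[feature] = evaluator_llm(summary, feature)
        rows.append(row)
    return rows

rows = build_causal_dataset(["Once upon a time ...", "In a distant land ..."])
```

Each row pairs a treatment-like variable (the prompt style) with observed quality features, which is the tabular shape a causal discovery algorithm expects.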
Cite This SEI Report
Konrad, M., Mellinger, A., Gates, L., Shepard, D., & Testa, N. (2026, March 2). Applying Causal Learning to Evaluate Large Language Models (LLMs). Retrieved March 7, 2026, from https://doi.org/10.1184/R1/30251989.
@techreport{konrad_2026,
author={Konrad, Michael and Mellinger, Andrew and Gates, Linda Parker and Shepard, David and Testa, Nicholas},
title={Applying Causal Learning to Evaluate Large Language Models (LLMs)},
month={Mar},
year={2026},
howpublished={Carnegie Mellon University, Software Engineering Institute's Digital Library},
url={https://doi.org/10.1184/R1/30251989},
note={Accessed: 2026-Mar-7}
}
Konrad, Michael, Andrew Mellinger, Linda Parker Gates, David Shepard, and Nicholas Testa. "Applying Causal Learning to Evaluate Large Language Models (LLMs)." Carnegie Mellon University, Software Engineering Institute's Digital Library. Software Engineering Institute, March 2, 2026. https://doi.org/10.1184/R1/30251989.
M. Konrad, A. Mellinger, L. Gates, D. Shepard, and N. Testa, "Applying Causal Learning to Evaluate Large Language Models (LLMs)," Carnegie Mellon University, Software Engineering Institute's Digital Library. Software Engineering Institute, 2-Mar-2026 [Online]. Available: https://doi.org/10.1184/R1/30251989. [Accessed: 7-Mar-2026].
Konrad, Michael, Andrew Mellinger, Linda Parker Gates, David Shepard, and Nicholas Testa. "Applying Causal Learning to Evaluate Large Language Models (LLMs)." Carnegie Mellon University, Software Engineering Institute's Digital Library, Software Engineering Institute, 2 Mar. 2026. https://doi.org/10.1184/R1/30251989. Accessed 7 Mar. 2026.
Konrad, Michael; Mellinger, Andrew; Gates, Linda Parker; Shepard, David; & Testa, Nicholas. Applying Causal Learning to Evaluate Large Language Models (LLMs). Software Engineering Institute. 2026. https://doi.org/10.1184/R1/30251989