Organizations rely on valid data to make informed decisions. When data integrity is compromised, the soundness of the decisions based on that data is threatened as well. Detecting data anomalies and defects is an important step in understanding and improving data quality.
The study described in this report investigated statistical anomaly detection techniques for identifying potential errors associated with the accuracy of quantitative earned value management (EVM) data values reported by government contractors to the Department of Defense.
This research demonstrated the effectiveness of various statistical techniques for discovering quantitative data anomalies. The following tests were found to be effective when used for EVM variables that represent cumulative values: Grubbs' test, Rosner test, box plot, autoregressive integrated moving average (ARIMA), and the control chart for individuals. For variables related to contract values, the moving range control chart, moving range technique, ARIMA, and Tukey box plot were equally effective for identifying anomalies in the data.
One or more of these techniques could be used to evaluate data at the point of entry, preventing data errors from being embedded and then propagated in downstream analyses. The report also proposes recommendations for future work in this area.
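As a rough illustration of two of the techniques named in the abstract, the sketch below applies Tukey box plot fences and the control chart for individuals (with limits derived from the average moving range) to a short series. It is a minimal sketch, not the report's procedure: the function names, the sample values, and the choice of k = 1.5 for the fences are assumptions made for this example. The constant 2.66 is the standard individuals-chart factor (3 divided by d2 = 1.128).

```python
from statistics import mean, quantiles


def tukey_outliers(values, k=1.5):
    """Return the values lying outside the Tukey fences.

    Fences are Q1 - k*IQR and Q3 + k*IQR; k = 1.5 is the conventional
    box plot setting (an assumption here, not taken from the report).
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def individuals_chart_signals(values):
    """Return indices of points outside the individuals (X) chart limits.

    Limits are the series mean +/- 2.66 times the average moving range,
    where each moving range is the absolute difference between
    consecutive observations.
    """
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = mean(moving_ranges)
    center = mean(values)
    upper = center + 2.66 * mr_bar
    lower = center - 2.66 * mr_bar
    return [i for i, v in enumerate(values) if v > upper or v < lower]


# Hypothetical monthly values with one suspicious entry (250).
reported = [100, 102, 98, 101, 99, 103, 97, 100, 250, 101]

print(tukey_outliers(reported))             # [250]
print(individuals_chart_signals(reported))  # [8]
```

Both checks flag the same entry in this made-up series. A check of this kind could run when a value is entered, so that a flagged value is reviewed before it reaches downstream analyses; a flag marks a value as worth reviewing, not as a confirmed error.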
This report is related to the following area(s) of work:
Measurement and Analysis
Technical Report
CMU/SEI-2011-TR-027
December 2011
SEI:
Kasunic, Mark; McCurley, James; Goldenson, Dennis; & Zubrow, David. An Investigation of Techniques for Detecting Data Anomalies in Earned Value Management Data (CMU/SEI-2011-TR-027). Software Engineering Institute, Carnegie Mellon University, 2011. http://www.sei.cmu.edu/library/abstracts/reports/11tr027.cfm
IEEE:
M. Kasunic, J. McCurley, D. Goldenson, and D. Zubrow, "An Investigation of Techniques for Detecting Data Anomalies in Earned Value Management Data," Software Engineering Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania, Technical Report CMU/SEI-2011-TR-027, 2011. http://www.sei.cmu.edu/library/abstracts/reports/11tr027.cfm
APA:
Kasunic, M., McCurley, J., Goldenson, D., & Zubrow, D. (2011). An Investigation of Techniques for Detecting Data Anomalies in Earned Value Management Data (CMU/SEI-2011-TR-027). Retrieved May 23, 2013, from the Software Engineering Institute, Carnegie Mellon University website: http://www.sei.cmu.edu/library/abstracts/reports/11tr027.cfm
CHI:
Kasunic, Mark, James McCurley, Dennis Goldenson, and David Zubrow. An Investigation of Techniques for Detecting Data Anomalies in Earned Value Management Data (CMU/SEI-2011-TR-027). Pittsburgh, PA: Software Engineering Institute, Carnegie Mellon University, 2011. http://www.sei.cmu.edu/library/abstracts/reports/11tr027.cfm
MLA:
Kasunic, M., McCurley, J., Goldenson, D., & Zubrow, D. 2011. An Investigation of Techniques for Detecting Data Anomalies in Earned Value Management Data (Technical Report CMU/SEI-2011-TR-027). Pittsburgh: Software Engineering Institute, Carnegie Mellon University. http://www.sei.cmu.edu/library/abstracts/reports/11tr027.cfm