Efficacy of Human Teaming with Generative AI for Software Maintenance

Updating a software system’s frameworks and libraries keeps software functional and secure. But the labor-intensive, error-prone process often causes software maintenance to lag behind the operational tempo of defense programs, as the SEI found in 2023. Generative artificial intelligence (AI) tools can speed up code generation, and an increasing number of developers use AI tools daily. Yet the true productivity of this human-AI teaming remains an open question. An SEI study investigated whether early-stage developers overrely on AI tools, especially when conducting complex code generation tasks such as library replacement.

In the study, led by Ipek Ozkaya, technical director of the SEI’s AI-Native Software Engineering directorate, software engineering graduate students upgraded two applications with changes introduced by a library update. They performed the upgrade manually first and then used a large language model (LLM) tool to generate an automated refactoring solution. The study investigated the effectiveness of the manual and automated approaches, whether students transferred foundational software engineering problem-solving skills to their use of the LLM, and whether students gave appropriate direction and oversight to the LLM.

The results showed that when students used the LLM, they often struggled to produce correct, reproducible solutions. They also overtrusted it. Instead of applying their own software engineering skills or steering the LLM toward its strengths in exploring problems and solutions, students got stuck using the LLM to fix errors.

At a time when 84 percent of developers use or plan to use AI tools, according to a Stack Overflow survey, the SEI study adds to the empirical evidence recommending a more disciplined approach to incorporating AI into software development. “While the tools are great, the developers may not be using them as intended, so you may not be able to see the benefits,” said Ozkaya. She believes that as more developers of different experience levels use generative AI, especially in government, how they are upskilled must evolve.

“We are entering an era where new software engineers will routinely use generative AI—sometimes deliberately, sometimes implicitly,” Ozkaya said. “For industry or the Pentagon to effectively scale automation in maintaining complex, long-lived systems, engineers must be equipped with skills for rigorous oversight of AI tools and mastery of generative AI workflows, in addition to foundational software engineering expertise.”