2022 Research Review / DAY 3
Advancing Algorithms for File Deduplication Across Containers
Container software virtually packages and isolates applications for deployment. It can operate over multiple network resources so applications can run in isolated user spaces (containers) in any cloud (or non-cloud) environment. The Department of Defense (DoD) wants to use containers to support its vision of a cloud-to-edge continuum in which capabilities packaged as containers are pushed from the cloud to edge devices to support localized data processing. However, devices deployed at the tactical edge are resource limited and commonly operate over disconnected, intermittently connected, low-bandwidth (DIL) networks or in hostile environments where bad actors are highly likely to try to tamper with them.
To address these limitations, we developed an automated container image minimization technology. This technology combines and improves on two minimization approaches: pruning (removing unnecessary files from single images) and deduplication (combining shared files across images into common layers). We focused on advancing the state of the art in deduplication across container images.
To create this new technology, we developed an algorithm for file deduplication across a collection of container images that can reduce container image storage usage and update bandwidth by up to 5–15% for multi-container deployments and by up to 10–30% for pruned container deployments. In our tests with real multi-container image systems, our algorithm deduplicates 100% of shared files and processes 10 images with 225,000 files in approximately 81 minutes.
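The general idea of deduplicating files across images can be illustrated with a minimal sketch. This is not the project's algorithm, whose details are not described here. It assumes each image is available as a simple mapping from file path to file content, and the function names (`file_key`, `plan_shared_layers`, `stored_bytes`, `original_bytes`) are hypothetical. Files are identified by path and SHA-256 digest, and every file is assigned to one layer keyed by the exact set of images that contain it, so a file shared by several images is stored once.

```python
import hashlib
from collections import defaultdict


def file_key(path: str, content: bytes) -> tuple[str, str]:
    """Identify a file by its path and the SHA-256 digest of its content."""
    return (path, hashlib.sha256(content).hexdigest())


def plan_shared_layers(images: dict[str, dict[str, bytes]]) -> dict:
    """Group files by the exact set of images that contain them.

    `images` maps image name -> {file path -> file content} (an assumed,
    simplified stand-in for unpacked container image contents).
    Returns {frozenset of image names -> {path: content}}; each entry is
    one candidate layer shared by exactly those images.
    """
    owners = defaultdict(set)  # file key -> names of images containing it
    contents = {}              # file key -> content, kept once per key
    for image, files in images.items():
        for path, content in files.items():
            key = file_key(path, content)
            owners[key].add(image)
            contents[key] = content

    layers = defaultdict(dict)
    for key, image_set in owners.items():
        path = key[0]
        layers[frozenset(image_set)][path] = contents[key]
    return dict(layers)


def stored_bytes(layers: dict) -> int:
    """Total bytes stored once every shared file is kept a single time."""
    return sum(len(c) for files in layers.values() for c in files.values())


def original_bytes(images: dict[str, dict[str, bytes]]) -> int:
    """Total bytes when every image keeps its own copy of every file."""
    return sum(len(c) for files in images.values() for c in files.values())


if __name__ == "__main__":
    # Two toy images that share one library file.
    images = {
        "web": {"/lib/libc.so": b"C" * 100, "/app/web": b"W" * 40},
        "db": {"/lib/libc.so": b"C" * 100, "/app/db": b"D" * 60},
    }
    layers = plan_shared_layers(images)
    print(original_bytes(images), "->", stored_bytes(layers))  # 300 -> 200
```

In this toy case the shared library lands in a layer owned by both images, while each application binary stays in a layer of its own. A practical tool would also have to preserve file metadata, layer ordering, and OCI manifest structure, none of which this sketch attempts.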
This project focused on technology that supports the Open Container Initiative (OCI) standard because the DoD aims to avoid vendor lock-in and leverage OCI-compliant containers. Additionally, this project has the potential to accelerate the SEI’s impact by open sourcing minimization algorithms to gain wider interest and adoption from industry and the DoD community.
In Context
This FY2022 project
- aligns with the SEI technical objective to be trustworthy in construction and implementation and resilient in the face of operational uncertainties, including known and yet unseen adversary capabilities
- aligns with the SEI technical objective to be affordable such that the cost of acquisition and operations, despite increased capability, is reduced and predictable and provides a cost advantage over our adversaries
- aligns with the DoD software objective to enhance resilience
Principal Investigator
Kevin Pitstick
Senior Software Engineer
SEI Collaborators
Sebastián Echeverría
Senior Software Engineer
Brandon Born
Associate Software Engineer
Brent Clausner
DevOps Engineer
Carl Gruhn
Assistant Software Engineer
Gary Zhang
Software Developer Intern
Lihan Zhan
Assistant Software Engineer
Joseph Bell
Associate Software Engineer