Originally published by the High Confidence Software and Systems Conference, Annapolis, MD, May 6-8, 2024
Authors:
Adam Seitz, Jonathan Dorn, Zak Fry
Abstract:
Containerization is an increasingly popular method for enhancing portability and reliability in commercial and governmental deployments. Containerized environments are often built from baseline images (e.g., common Linux distributions) with the required dependencies installed for the target software packages (e.g., Python tooling, database infrastructure, web frontends). Depending on the capabilities required, the resulting container images often contain bloat: software, libraries, and other files that do not support the stated mission of a given deployment. Bloat can result from the need to provide widely applicable baseline containers for varied domains, coupled with the high effort of manually optimizing a container for a given deployment. This bloat creates unnecessary attack surface (e.g., for living-off-the-land attacks), produces spurious static-analysis vulnerability results that must be triaged, and adds overhead when transmitting, maintaining, and using these images.
To address this problem, we introduce DYKONDO (DYnamic KONtainer Debloater/Optimizer) and describe our experiences developing and using this tool to transform deployment-specific commercial and governmental containers. DYKONDO takes as input a container and a deployment-specific runtime scenario (e.g., a test suite) and uses dynamic analysis to track exactly the files used during intended deployment executions, removing all other files to reduce bloat. The tooling is designed to run in continuous integration (CI) environments as a post-build step: it rewrites a standard image built by Docker, Podman, or Buildah, automatically filtering out files that were not accessed in the trace data.
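The filtering idea described above (keep the files observed in the trace, drop the rest) can be illustrated with a minimal Python sketch. This is not DYKONDO's implementation. It assumes the image has already been unpacked into a directory and that tracing has produced a set of absolute in-image paths. The names `unused_files`, `debloat`, and `demo`, and the sample paths, are hypothetical. The sketch assumes POSIX-style paths and ignores symlinks, empty directories, and image layers, all of which a real tool would need to handle.

```python
import os
import tempfile


def unused_files(rootfs, accessed):
    """Return sorted image-relative paths of files under rootfs that are
    absent from the traced set `accessed` (absolute in-image paths)."""
    # Normalize traced paths to the same relative form os.walk yields.
    keep = {os.path.normpath(p).lstrip(os.sep) for p in accessed}
    unused = []
    for dirpath, _dirnames, filenames in os.walk(rootfs):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), rootfs)
            if rel not in keep:
                unused.append(rel)
    return sorted(unused)


def debloat(rootfs, accessed):
    """Delete every file the trace never touched; return what was removed."""
    removed = unused_files(rootfs, accessed)
    for rel in removed:
        os.remove(os.path.join(rootfs, rel))
    return removed


def demo():
    """Build a tiny fake root filesystem and debloat it (hypothetical data)."""
    with tempfile.TemporaryDirectory() as rootfs:
        for rel in ("usr/bin/python3", "usr/bin/apt",
                    "var/cache/apt/pkgcache.bin"):
            path = os.path.join(rootfs, rel)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "w") as handle:
                handle.write("x")
        # Pretend the test suite only ever opened the Python interpreter.
        removed = debloat(rootfs, {"/usr/bin/python3"})
        remaining = unused_files(rootfs, set())
        return removed, remaining


if __name__ == "__main__":
    print(demo())
```

In this toy run the package manager binary and its cache are removed because the assumed trace never touched them, while the interpreter is kept. The quality of the result depends entirely on how well the runtime scenario covers the intended deployment.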
We have evaluated DYKONDO on a diverse set of popular open-source images. For instance, DYKONDO reduces the size of Grafana OnCall’s Python-based backend image by 87% when using its supplied test suite. For the official images of the popular database PostgreSQL, DYKONDO reduces image sizes by up to 44% using the project’s regression test suite; even the pre-optimized Alpine-based version of the image can be reduced in size by 16%. DYKONDO’s runtime is typically dominated by tracing the supplied test suite; in practice, we observed the debloating steps themselves completing in about a minute.
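For readers reproducing such measurements, a size reduction is conventionally the fraction of the original image size that was removed. The short Python sketch below assumes that definition; the abstract does not state absolute image sizes, so the byte counts here are made up purely for illustration, and `size_reduction_percent` is a hypothetical helper name.

```python
def size_reduction_percent(before_bytes, after_bytes):
    """Percentage of the original size removed by debloating."""
    if before_bytes <= 0:
        raise ValueError("original size must be positive")
    return 100.0 * (before_bytes - after_bytes) / before_bytes


# Hypothetical sizes, not measurements from the paper: an image that
# shrinks from 1000 units to 130 units has been reduced by 87%.
example = size_reduction_percent(1000, 130)
```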
We found that DYKONDO correctly removes only spurious files: most are utilities included in baseline images for general applicability but not used in individual intended deployments. For example, the set of debloated files often contains package-manager files, such as caches, package indexes, and configuration data of considerable size, that are not strictly required in resource-constrained environments and can be tedious to clean up manually (and thus are often left behind in practice).
Application code comprises the largest remaining portion of our images. As future work, we plan to integrate intra-file debloating strategies for this application code.
DISTRIBUTION STATEMENT A. Approved for public release: distribution unlimited. Approved, DCN# 0543-1381-24.
This material is based upon work supported by the Office of Naval Research under Contract No. N00014-21-C-1032. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Office of Naval Research.