Optimizing a Dockerfile to reduce layers and size

tags: Docker
created: Thu 07 July 2016; status: published;

While converting the primary application I work on to Docker, I went through a series of steps to reduce the number of layers and the final image size. Here are the process and the results:

1. Get the Dockerfile working
2. Use Docker tools to understand what was built (layers, size, image tree)
3. Progressively refine the Dockerfiles

1. Get the Dockerfile working

Do not optimize during this step! If you are converting an existing deployment script to Docker, keep the mapping simple. Our deployment is a series of Fabric commands that can take a new CentOS 6 server all the way to a running application. Several hundred commands check for packages and install anything that is missing.

The Dockerfile started from a centos6 image and had 100+ RUN commands, each doing an individual wget, yum install, or pip install. The result was a roughly 1.4 GB image with 75 layers. But it worked, which is the only place to start.

2. Use Docker tools to understand what was built (layers, size, image tree)

After building an image from a Dockerfile, you can inspect the final image with "docker images":

[cloud-user@carl-kibler-2 base]$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
healthelife/base    latest              d95c828759dc        5 minutes ago       1.384 GB
centos              centos6             6a77ab6655b9        3 weeks ago         194.6 MB

You can also look at the layer history of a specific image with "docker history <image>":

[cloud-user@carl-kibler-2 base]$ docker history healthelife/base
IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
d95c828759dc        6 minutes ago       /bin/sh -c mkdir /var/log/gunicorn              0 B
e8a384406f2a        6 minutes ago       /bin/sh -c mkdir /var/log/supervisor            0 B
66fe0e0f64aa        7 minutes ago       /bin/sh -c source /usr/bin/virtualenvwrapper.   742.4 kB
5a8378bcfbe5        8 minutes ago       /bin/sh -c source /usr/bin/virtualenvwrapper.   7.253 kB
b4122716c771        8 minutes ago       /bin/sh -c source /usr/bin/virtualenvwrapper.   10.99 MB
                    ... enormous number removed ...
961d89d03917        25 minutes ago      /bin/sh -c #(nop) ENV PROJECT_HOME=/opt/djang   0 B
c0871bc539c8        26 minutes ago      /bin/sh -c python2.7 get-pip.py                 10.45 MB
d047a96cbf56        26 minutes ago      /bin/sh -c wget https://bootstrap.pypa.io/get   1.525 MB
429f4b9090cb        26 minutes ago      /bin/sh -c npm install -g --cache /tmp/nodeca   51.87 MB
                    ... SO MANY

You can count them by piping the output to wc:

[cloud-user@carl-kibler-2 base]$ docker history healthelife/base | wc -l
75

There were 75 layers!
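One caveat with this count: the listing starts with a column header row, so wc -l reports one line more than there are layers. Here is a minimal sketch of a header-free count. The function names count_layers and count_layer_rows are made up for illustration; the only Docker feature assumed is the -q (quiet) flag of docker history, which prints one image ID per line with no header.

```shell
# Hypothetical helpers (not from the original Dockerfile work).

# Count the layers of an image exactly.
# "docker history -q" prints only image IDs, one per line, with no header row.
count_layers() {
    docker history -q "$1" | wc -l | tr -d ' '
}

# Alternative for a listing that was already captured with its header:
# read it on stdin, skip the first line, then count the rest.
count_layer_rows() {
    tail -n +2 | wc -l | tr -d ' '
}

# Usage (requires a running Docker daemon):
#   count_layers healthelife/base
#   docker history healthelife/base | count_layer_rows
```

Either form should give the same number; the second is handy when you only have saved output to work with.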

The neat part about the full listing is that you can see how large each individual layer is. While refining, this shows which layers use the most space and are the best candidates to shrink or remove.
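With dozens of layers it can be easier to let the shell rank them. The sketch below assumes a Docker release whose docker history supports the --format and --human=false options (these are newer than the version that produced the output above), and the helper names largest_layers and sort_by_size are hypothetical.

```shell
# Hypothetical helpers for ranking layers by size.

# Sort "bytes command..." rows read from stdin, largest first,
# and keep the first N rows (default 10).
sort_by_size() {
    sort -rn | head -n "${1:-10}"
}

# Show the N largest layers of an image.
# Assumes "docker history" accepts --format and --human=false, so that
# {{.Size}} is printed in plain bytes and sort can order it numerically.
largest_layers() {
    docker history --human=false --no-trunc \
        --format '{{.Size}} {{.CreatedBy}}' "$1" | sort_by_size "${2:-10}"
}

# Usage (requires a running Docker daemon):
#   largest_layers healthelife/base 5
```

Printing the size in bytes avoids having to compare mixed units such as kB and MB by eye.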

3. Progressively refine the Dockerfiles

I began condensing each group of related commands into single RUN directives without trying to get too clever. For example, 3 commands to yum install some packages became 1 very long command:

RUN yum -y install wget gcc.x86_64 make zlib-devel sqlite-devel gcc-c++ \
    libxml2-devel libxslt-devel git dtach sqlite-tcl autoconf libffi-devel \
    libGL-devel bind-utils sudo which tar swig rsyslog

Every RUN you remove eliminates a layer. You could get down to 1 layer, but that defeats the purpose of layers anyway. Remember, they are cached. When you make a change in the Dockerfile the cached layers let you skip straight to the change point and only redo the "new" section. Don't reduce RUN commands to the point where every change requires a massive rebuild - that would be deoptimizing your Dockerfile.

In my case, I collapsed each group of related dependency installs into a single RUN command. For example, this:

# Cerner RPMs for python 2.7
RUN wget http://internalrepo/svn/rpm_rhel6/python27-2.7.3-1.cerner.x86_64.rpm
RUN wget http://internalrepo/svn/rpm_rhel6/python27-devel-2.7.3-1.cerner.x86_64.rpm
RUN wget http://internalrepo/svn/rpm_rhel6/python27-debuginfo-2.7.3-1.cerner.x86_64.rpm
RUN yum -y localinstall --nogpgcheck python27-2.7.3-1.cerner.x86_64.rpm python27-devel-2.7.3-1.cerner.x86_64.rpm python27-debuginfo-2.7.3-1.cerner.x86_64.rpm
RUN rm python27-2.7.3-1.cerner.x86_64.rpm python27-devel-2.7.3-1.cerner.x86_64.rpm python27-debuginfo-2.7.3-1.cerner.x86_64.rpm

became:

# Cerner RPMs for python 2.7
RUN wget http://internalrepo/svn/rpm_rhel6/python27-2.7.3-1.cerner.x86_64.rpm && \
    wget http://internalrepo/svn/rpm_rhel6/python27-devel-2.7.3-1.cerner.x86_64.rpm && \
    wget http://internalrepo/svn/rpm_rhel6/python27-debuginfo-2.7.3-1.cerner.x86_64.rpm && \
    yum -y localinstall --nogpgcheck \
        python27-2.7.3-1.cerner.x86_64.rpm \
        python27-devel-2.7.3-1.cerner.x86_64.rpm \
        python27-debuginfo-2.7.3-1.cerner.x86_64.rpm && \
    rm python27-2.7.3-1.cerner.x86_64.rpm \
        python27-devel-2.7.3-1.cerner.x86_64.rpm \
        python27-debuginfo-2.7.3-1.cerner.x86_64.rpm

The pattern is:

RUN download things && \
    install things && \
    delete downloaded things

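When all three steps happen in a single RUN command, the resulting layer records one clear, specific change and contains only the installed files, not the downloads. Remember that when one layer adds a file and a later layer removes it, the image does not get any smaller: the earlier layer still holds the file, and layer sizes accumulate.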
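You can see the accumulation effect for yourself with a throwaway experiment. The following is only a sketch under stated assumptions: the function names, the demo/split and demo/joined tags, and the /scratch.bin file are all invented for illustration, and the base image is the centos:centos6 tag shown in the "docker images" output earlier.

```shell
# Hypothetical demo (not part of the original Dockerfile work): write two
# Dockerfiles that create and delete the same 50 MB scratch file, once across
# two RUN directives and once inside a single RUN.
write_demo_dockerfiles() {
    mkdir -p "$1"

    # Split version: the first layer keeps the file even though
    # the second layer deletes it.
    cat > "$1/Dockerfile.split" <<'EOF'
FROM centos:centos6
RUN dd if=/dev/zero of=/scratch.bin bs=1M count=50
RUN rm /scratch.bin
EOF

    # Joined version: the file is gone before the layer is committed.
    cat > "$1/Dockerfile.joined" <<'EOF'
FROM centos:centos6
RUN dd if=/dev/zero of=/scratch.bin bs=1M count=50 && \
    rm /scratch.bin
EOF
}

# Build both variants and print their layer listings for comparison.
# Requires a running Docker daemon.
compare_demo_images() {
    write_demo_dockerfiles "$1"
    docker build -t demo/split  -f "$1/Dockerfile.split"  "$1"
    docker build -t demo/joined -f "$1/Dockerfile.joined" "$1"
    docker history demo/split
    docker history demo/joined
}

# Usage:
#   compare_demo_images /tmp/layer-demo
```

If the reasoning above holds, the split image should show a layer of about 50 MB for the dd step followed by a near-empty layer for the rm, while the joined image should show a single near-empty layer.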

By grouping related RUN commands, I brought the number of layers down to 23. Coalescing them further would have started to turn the Dockerfile into a mess, so I left it there.