created: Thu 07 July 2016; status: published;
While converting the primary application I work on to Docker, I went through a series of steps to reduce the number of layers and the final image size. Here are the process and the results:
- Get the Dockerfile working
- Use Docker tools to understand what was built (layers, size, image tree)
- Progressively refine the Dockerfiles
1. Get the Dockerfile working
Do not optimize during this step! If you are converting an existing deployment script to Docker, keep the mapping simple. Our deployment is a series of Fabric commands that can take a new CentOS 6 server all the way to a running application. There are several hundred commands checking for packages and installing anything missing.
The Dockerfile started from a centos6 image and had 100+ RUN commands doing individual wget, yum install, and pip install commands. The result was a 1.2 GB image with 75 layers. But it worked, which is the only place to start.
2. Use Docker tools to understand what was built (layers, size, image tree)
After building an image from the Dockerfile, you can inspect the final image with "docker images":
    [cloud-user@carl-kibler-2 base]$ docker images
    REPOSITORY          TAG        IMAGE ID        CREATED          SIZE
    healthelife/base    latest     d95c828759dc    5 minutes ago    1.384 GB
    centos              centos6    6a77ab6655b9    3 weeks ago      194.6 MB
And look at the layer history of a specific image with "docker history" followed by the image name:
    [cloud-user@carl-kibler-2 base]$ docker history healthelife/base
    IMAGE          CREATED          CREATED BY                                      SIZE        COMMENT
    d95c828759dc   6 minutes ago    /bin/sh -c mkdir /var/log/gunicorn              0 B
    e8a384406f2a   6 minutes ago    /bin/sh -c mkdir /var/log/supervisor            0 B
    66fe0e0f64aa   7 minutes ago    /bin/sh -c source /usr/bin/virtualenvwrapper.   742.4 kB
    5a8378bcfbe5   8 minutes ago    /bin/sh -c source /usr/bin/virtualenvwrapper.   7.253 kB
    b4122716c771   8 minutes ago    /bin/sh -c source /usr/bin/virtualenvwrapper.   10.99 MB
    ... enormous number removed ...
    961d89d03917   25 minutes ago   /bin/sh -c #(nop) ENV PROJECT_HOME=/opt/djang   0 B
    c0871bc539c8   26 minutes ago   /bin/sh -c python2.7 get-pip.py                 10.45 MB
    d047a96cbf56   26 minutes ago   /bin/sh -c wget https://bootstrap.pypa.io/get   1.525 MB
    429f4b9090cb   26 minutes ago   /bin/sh -c npm install -g --cache /tmp/nodeca   51.87 MB
    ... SO MANY
You can count them by piping the output to wc:
    [cloud-user@carl-kibler-2 base]$ docker history healthelife/base | wc -l
    75
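There were 75 layers! (Strictly, wc -l also counts the header row of the docker history output, so the exact figure is one lower.)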
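If you want the count without the header row, a small helper can skip it. This is only a sketch: `count_layers` is a hypothetical name of my own, and it assumes the first line of the input is the `IMAGE CREATED ...` header that `docker history` prints by default.

```shell
# count_layers: read `docker history` output on stdin and print the number
# of rows after the first line, which is assumed to be the header row.
# Blank lines are ignored.
count_layers() {
    awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }'
}
```

It would be used as `docker history healthelife/base | count_layers`. Another option is `docker history -q healthelife/base | wc -l`, since quiet mode prints only the image IDs.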
The neat part about the full listing is that you can see how large each individual layer is. This helps while refining: it shows which layers use the most space and could potentially be slimmed down or removed.
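To find the heaviest layers quickly, the rows can be sorted by size. The helper below is a rough sketch rather than anything from my actual workflow: `largest_layers` is a hypothetical name, and it assumes each input line starts with the layer size in bytes followed by the command that created it.

```shell
# largest_layers [N]: read "SIZE_IN_BYTES COMMAND..." rows on stdin and
# print the N largest rows, biggest first (N defaults to 5).
largest_layers() {
    sort -k1,1nr | head -n "${1:-5}"
}
```

On a Docker version that supports the `--format` flag for `docker history` (newer than the one used in this post), input in that shape should be available from `docker history --human=false --format '{{.Size}} {{.CreatedBy}}' healthelife/base | largest_layers 5`.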
3. Progressively refine the Dockerfiles
I began condensing each group of related commands into single RUN directives without trying to get too clever. For example, 3 commands to yum install some packages became 1 very long command:
    RUN yum -y install wget gcc.x86_64 make zlib-devel sqlite-devel gcc-c++ \
        libxml2-devel libxslt-devel git dtach sqlite-tcl autoconf libffi-devel \
        libGL-devel bind-utils sudo which tar swig rsyslog
Every RUN you remove eliminates a layer. You could get down to 1 layer, but that defeats the purpose of layers anyway. Remember, they are cached. When you make a change in the Dockerfile the cached layers let you skip straight to the change point and only redo the "new" section. Don't reduce RUN commands to the point where every change requires a massive rebuild - that would be deoptimizing your Dockerfile.
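A quick way to track progress while condensing is to count the RUN instructions in the Dockerfile before and after each pass. This is a minimal sketch with a hypothetical helper name, `count_run`. It only counts lines that begin with RUN, so it says nothing about layers created by other instructions.

```shell
# count_run FILE: print how many lines in FILE begin with the RUN instruction.
# Continuation lines after a trailing backslash do not begin with RUN,
# so a multi-line RUN command is counted once.
count_run() {
    grep -c -E '^[[:space:]]*RUN[[:space:]]' "$1"
}
```

Running `count_run Dockerfile` after each round of grouping gives a rough sense of how many layers the RUN commands will contribute.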
In my case, I reduced each group of dependency installs to a single RUN command. For example, the Python 2.7 RPM install went from five RUN commands to one:
Before:

    # Cerner RPMs for Python 2.7
    RUN wget http://internalrepo/svn/rpm_rhel6/python27-2.7.3-1.cerner.x86_64.rpm
    RUN wget http://internalrepo/svn/rpm_rhel6/python27-devel-2.7.3-1.cerner.x86_64.rpm
    RUN wget http://internalrepo/svn/rpm_rhel6/python27-debuginfo-2.7.3-1.cerner.x86_64.rpm
    RUN yum -y localinstall --nogpgcheck python27-2.7.3-1.cerner.x86_64.rpm python27-devel-2.7.3-1.cerner.x86_64.rpm python27-debuginfo-2.7.3-1.cerner.x86_64.rpm
    RUN rm python27-2.7.3-1.cerner.x86_64.rpm python27-devel-2.7.3-1.cerner.x86_64.rpm python27-debuginfo-2.7.3-1.cerner.x86_64.rpm

After:

    # Cerner RPMs for Python 2.7
    RUN wget http://internalrepo/svn/rpm_rhel6/python27-2.7.3-1.cerner.x86_64.rpm && \
        wget http://internalrepo/svn/rpm_rhel6/python27-devel-2.7.3-1.cerner.x86_64.rpm && \
        wget http://internalrepo/svn/rpm_rhel6/python27-debuginfo-2.7.3-1.cerner.x86_64.rpm && \
        yum -y localinstall --nogpgcheck python27-2.7.3-1.cerner.x86_64.rpm python27-devel-2.7.3-1.cerner.x86_64.rpm python27-debuginfo-2.7.3-1.cerner.x86_64.rpm && \
        rm python27-2.7.3-1.cerner.x86_64.rpm python27-devel-2.7.3-1.cerner.x86_64.rpm python27-debuginfo-2.7.3-1.cerner.x86_64.rpm
The pattern is:
    RUN download things && \
        install things && \
        delete downloaded things
When all of this happens in a single RUN command, the resulting layer represents one clear, specific change to the image, and its size is minimized because the downloaded files are never committed to a layer. Remember that when one layer adds a file and a separate layer removes it, the net size is not reduced: the layers accumulate.
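The arithmetic behind that can be sketched with made-up numbers. Everything here is hypothetical: the sizes (a 50 MB download that installs 120 MB of files) are chosen only for illustration, and `sum_layers` is a helper name of my own.

```shell
# sum_layers: read one layer size (in MB) per line on stdin and print the total.
sum_layers() {
    awk '{ total += $1 } END { print total + 0 }'
}

# Hypothetical sizes in MB, chosen only to illustrate the arithmetic.
# Three separate RUN commands: download (+50), install (+120), rm (+0).
# The rm layer hides the download but cannot shrink the earlier layer.
separate=$(printf '50\n120\n0\n' | sum_layers)

# One RUN command: the download is deleted before the layer is committed,
# so only the installed files remain in it.
combined=$(printf '120\n' | sum_layers)

echo "separate RUN commands: ${separate} MB"
echo "single RUN command:    ${combined} MB"
```

With these numbers the separate commands add up to 170 MB while the single command costs 120 MB, which is the whole argument for deleting downloads in the same RUN that fetched them.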
By grouping related RUN commands, I brought the number of layers down to 23. Coalescing them further would start to make a mess, so I left it there.