How to Not Be the Engineer Running 3.5GB Docker Images

Let’s cut to the chase: you’re adopting a microservice architecture and you’re planning to use Docker. There’s a reason it is so en vogue – it solves lots and lots of problems and has zero negative effect on our projects, right? Right?

As with every tool, technology, or paradigm that is thrust upon us as we scrappily try to maintain our sanity while jumping from shiny to shiny, we need to learn the gotchas.

To do this, I like to start with a simple question: How might this new shiny bite me on the ass, and what can I do to avoid having teeth marks in my rear?

I want to tackle a problem that I have seen time and time again during my consultations with teams / organisations adopting Docker.

Behemoth Docker Images

$ docker images
REPOSITORY              TAG      IMAGE ID       CREATED              SIZE
awesome-micro-service   latest   61562a134d38   About a minute ago   3.5 GB

Woah! Look at the size of that image. awesome-micro-service is 3.5GB! So much for micro.

I’m sure you’ve seen this … I’m sure you’ve contributed to this. It’s OK. Everybody falls the first time.

What on earth is a Docker image anyways?

In order to understand why our images are big, we need to understand what images are in the first place.

A Docker image is the output of a docker build. The build process runs each of the instructions within a Dockerfile. Each instruction executed creates a layer. Layers encapsulate the file system changes that the instruction has caused. A Docker image is a collection of layers.

Let’s look closer so we can describe a Docker image in more detail.

Example:

Assume we’re going to bring Docker into our PHP workflow. In order to run our PHP application, we need a Debian-based system with PHP installed.

We’ll need to describe the environment required to run our application within a Docker container.

# Dockerfile
FROM debian:jessie

RUN echo "Building ..."
RUN DEBIAN_FRONTEND=noninteractive apt-get update
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y php5-cli

Super simple. Super declarative. Super awesome. Though completely useless until we build it. The build process takes a Dockerfile and context and produces a Docker image.

The context is the directory that is sent to the Docker daemon along with the Dockerfile. It supplies any files the build needs, such as the sources of ADD or COPY instructions.
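To make the context a little more concrete, here is a minimal sketch. It assumes a hypothetical project with a bulky .git directory (the layout and file names are made up for illustration) and uses a .dockerignore file, which tells Docker which paths to leave out of the context.

```shell
#!/bin/sh
# Hypothetical project directory standing in for the build context.
context=$(mktemp -d)
mkdir -p "$context/.git" "$context/src"
echo '<?php echo "hello";' > "$context/src/index.php"
# A 2 MiB dummy file standing in for repository history.
dd if=/dev/zero of="$context/.git/pack" bs=1024 count=2048 2>/dev/null

# Everything under the context is sent to the daemon unless .dockerignore excludes it.
printf '%s\n' '.git' '*.log' > "$context/.dockerignore"

# Rough size check in KiB: the whole context vs. the part we just excluded.
total_kib=$(du -sk "$context" | awk '{ print $1 }')
git_kib=$(du -sk "$context/.git" | awk '{ print $1 }')
echo "context: ${total_kib} KiB, of which .git: ${git_kib} KiB"
```

Running du against your own project root before a build is a cheap way to spot a context that is much larger than you expect.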

# docker build -t NAME:TAG -f DOCKERFILE CONTEXT
# If the Dockerfile is within the root of our context, we can omit the -f

$ docker build -t my-docker-php:latest -f Dockerfile .
...
$ docker images
REPOSITORY      TAG      IMAGE ID       CREATED              SIZE
my-docker-php   latest   61562a134d38   About a minute ago   163.5 MB

So what’s actually going on? What’s inside my Docker image?

It’s a file system. Nothing fancy. When you run an apt-get install vim, all you’re telling the computer to do is put some files on your hard drive. The Docker image encapsulates that and keeps track of all new / modified / deleted files.

These file system changes are tracked in layers. Each layer is the encapsulation of the file system changes for one instruction in your Dockerfile.

Docker provides a command, docker history, to visualise the layers of an image. As you’ll see in the output below:

  1. We have no control over the size of our base image, other than changing base image.
    This is the “<missing>” layer at the bottom of the list.
  2. Some keywords cost us nothing. Examples include CMD, USER, WORKDIR, etc.

$ docker history my-docker-php
IMAGE          CREATED           CREATED BY                                        SIZE      COMMENT
b4e7e4004eeb   4 seconds ago     /bin/sh -c #(nop) CMD ["vim"]                     0 B
d2a8ad35f9f4   4 seconds ago     /bin/sh -c echo                                   0 B
6fc559885751   36 minutes ago    /bin/sh -c DEBIAN_FRONTEND=noninteractive apt     38.37 MB
f50f9524513f   8 weeks ago       /bin/sh -c #(nop) CMD ["/bin/bash"]               0 B
<missing>      8 weeks ago       /bin/sh -c #(nop) ADD file:b5391cb13172fb513d     125.1 MB

Note: If your command makes no changes to the file system (like our RUN echo “Building …”), a layer is still created. It just has a zero-byte size.

So in order to keep our images micro, we need to keep the output of our layers to a minimum.
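A quick sanity check on the "image is a collection of layers" idea: the sizes in the docker history output above should add up to the size that docker images reported. The sketch below just does that arithmetic, with the numbers copied by hand from the output.

```shell
#!/bin/sh
# Layer sizes in MB, copied from the docker history output above (top to bottom).
total=$(printf '%s\n' 0 0 38.37 0 125.1 \
  | LC_ALL=C awk '{ sum += $1 } END { printf "%.2f", sum }')
echo "layers add up to ${total} MB"
```

That comes to 163.47 MB, which matches the 163.5 MB that docker images showed once rounded. Every megabyte in the image is accounted for by some layer, so shrinking the image means shrinking layers.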

Gotchas

1. File Ownership & Permissions

Never, and I mean it, never change the ownership or permissions of a file inside a Dockerfile unless you absolutely NEED to. When you do NEED to, try to modify as few files as possible.

Although comparisons can be made, Docker isn’t like Git. A layer doesn’t record what changed inside a file, only which files changed, and it stores each of those files in full. Changing a file’s owner or mode therefore copies the entire file into a new layer, while the original copy stays in the layer below. This can potentially cause your image to double in size if you’re modifying particularly large files, or worse, every file!
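You can get a feel for this without Docker at all. Image layers are shipped as tar archives, and a tar archive can only store a file whole; there is no entry type for "only the permissions changed". The sketch below fakes two layers with plain tar. It is a simulation under that assumption, not what the Docker daemon literally runs, and the directory and file names are made up.

```shell
#!/bin/sh
# Simulate two image layers with plain tar archives (no Docker needed).
work=$(mktemp -d)
mkdir "$work/rootfs"

# "Layer 1": add a 1 MiB file.
dd if=/dev/zero of="$work/rootfs/large_file" bs=1024 count=1024 2>/dev/null
tar -cf "$work/layer1.tar" -C "$work/rootfs" large_file

# "Layer 2": only the mode changes, yet the archive has to carry the whole file again.
chmod 755 "$work/rootfs/large_file"
tar -cf "$work/layer2.tar" -C "$work/rootfs" large_file

layer1_bytes=$(( $(wc -c < "$work/layer1.tar") ))
layer2_bytes=$(( $(wc -c < "$work/layer2.tar") ))
echo "layer 1: ${layer1_bytes} bytes"
echo "layer 2: ${layer2_bytes} bytes"

rm -rf "$work"
```

Both archives come out at a little over 1 MiB each, so the chmod "layer" costs as much as the layer that added the file in the first place.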

Example:

# Dockerfile
FROM debian:jessie

ADD large_file /var/www/large_file
RUN chown www-data /var/www/large_file
RUN chmod 756 /var/www/large_file

$ docker build -t gotcha-1 .
...

$ docker images gotcha-1
REPOSITORY   TAG      IMAGE ID       CREATED              SIZE
gotcha-1     latest   49b4a4ea228a   About a minute ago   3.346 GB

$ docker history gotcha-1
IMAGE          CREATED          CREATED BY                                      SIZE        COMMENT
49b4a4ea228a   36 seconds ago   /bin/sh -c chmod 756 /var/www/large_file        1.074 GB
09d77316932b   2 minutes ago    /bin/sh -c chown www-data /var/www/large_file   1.074 GB
7adb7c72c3ef   2 minutes ago    /bin/sh -c #(nop) ADD file:a86f6dedfb4ba54972   1.074 GB
f50f9524513f   8 weeks ago      /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B
<missing>      8 weeks ago      /bin/sh -c #(nop) ADD file:b5391cb13172fb513d   125.1 MB

Tip: If you’re having problems with permissions inside your container, fix them at run time in your entrypoint script, or change the user ID to match what you need. Do not modify the files in the Dockerfile.

Example

Changing the user-id of www-data to match yours. Tweak as necessary:

RUN usermod -u 1000 www-data

Or run your container with an entrypoint script:

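$ cat my-script
#!/bin/bash
# Fix ownership when the container starts, so no image layer is affected
chown -R www-data /var/www/
# Run Apache in the foreground so the container stays alive
exec apache2ctl -D FOREGROUND

$ docker run --entrypoint=/bin/my-script my-docker-php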
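If you really do need the ownership baked into the image, Docker 17.09 and later accept a --chown flag on ADD and COPY, which sets the owner while the file is being copied instead of in a follow-up RUN. Here is a minimal sketch that writes such a Dockerfile to a scratch directory. The directory and the stand-in large_file are hypothetical, and I’m assuming the www-data user already exists in the base image.

```shell
#!/bin/sh
# Scratch build directory with a small stand-in for the large file.
build_dir=$(mktemp -d)
dd if=/dev/zero of="$build_dir/large_file" bs=1024 count=1024 2>/dev/null

# Quoted heredoc delimiter: the Dockerfile is written exactly as shown.
cat > "$build_dir/Dockerfile" <<'EOF'
FROM debian:jessie

# Ownership is applied while the file is copied, so no separate chown layer is created.
COPY --chown=www-data:www-data large_file /var/www/large_file
EOF

cat "$build_dir/Dockerfile"
echo "Build with: docker build -t gotcha-1 $build_dir"
```

With this variant, docker history should show the file once, in the COPY layer, rather than once per chown or chmod. Newer builders also offer a --chmod flag on COPY, but check that your Docker version supports it before relying on it.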

2. Clean up after untidy commands

Sometimes commands leave a trail of garbage behind them and couldn’t care less about the size of your images. We accept this on our desktops in the name of “cache” and “performance”. Inside our images, it’s just pure filth.

Example:

# Dockerfile
FROM debian:jessie

RUN DEBIAN_FRONTEND=noninteractive apt-get update
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y vim

$ docker build -t debian .
...
$ docker history debian
IMAGE           CREATED             CREATED BY                                      SIZE       COMMENT
ae5a25410c0d    10 seconds ago      /bin/sh -c DEBIAN_FRONTEND=noninteractive apt   28.68 MB
aaf5660234d3    21 minutes ago      /bin/sh -c DEBIAN_FRONTEND=noninteractive apt   9.694 MB
f50f9524513f    8 weeks ago         /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B
<missing>       8 weeks ago         /bin/sh -c #(nop) ADD file:b5391cb13172fb513d   125.1 MB

As you can see from the output above, our apt-get update costs us about 10 MB and our apt-get install costs us about 30 MB. Obviously these are trivial examples, but in larger builds this space will accumulate!

First, let’s examine what each command is doing to our image. To do this, start an interactive container and open a bash shell inside it:

$ docker run -ti --rm --name live debian:jessie bash

You’ll be live inside the innards of a Debian container and at a bash prompt. Next, let’s get a second terminal window open and inspect the container:

$ docker diff live
$

No output. That’s good, because we’ve not done anything yet. docker diff allows us to see what’s changed inside our container. So let’s run our first command:

Note: “$ ” is my local prompt and “root@4552beab7001:/#” is inside the container.

root@4552beab7001:/# apt-get update
$ docker diff live
C /var
C /var/lib
C /var/lib/apt
C /var/lib/apt/lists
A /var/lib/apt/lists/httpredir.debian.org_debian_dists_jessie_main_binary-amd64_Packages.gz
A /var/lib/apt/lists/security.debian.org_dists_jessie_updates_main_binary-amd64_Packages.gz
A /var/lib/apt/lists/httpredir.debian.org_debian_dists_jessie-updates_main_binary-amd64_Packages.gz
A /var/lib/apt/lists/httpredir.debian.org_debian_dists_jessie_Release
A /var/lib/apt/lists/httpredir.debian.org_debian_dists_jessie_Release.gpg
A /var/lib/apt/lists/httpredir.debian.org_debian_dists_jessie-updates_InRelease
A /var/lib/apt/lists/lock
A /var/lib/apt/lists/security.debian.org_dists_jessie_updates_InRelease

Oooh, we’ve just discovered where our 10 MB is going. Let’s fix it by tweaking our Dockerfile to delete our apt cache after installing vim. Your initial thought may be to tweak as:

# Dockerfile
FROM debian:jessie

RUN DEBIAN_FRONTEND=noninteractive apt-get update
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y vim
RUN rm -rf /var/lib/apt

Unfortunately, this will only add another layer and not affect the previous layers. So although we’re deleting files, the previous layers still contain them. The common trick is to chain our commands at the shell level. This way, the files don’t exist when the RUN is finished and they never exist in our history.

# Dockerfile
FROM debian:jessie

RUN DEBIAN_FRONTEND=noninteractive apt-get update \
&& DEBIAN_FRONTEND=noninteractive apt-get install -y vim \
&& rm -rf /var/lib/apt

$ docker history debian
IMAGE          CREATED          CREATED BY                                       SIZE      COMMENT
be6afc32bd37   5 seconds ago    /bin/sh -c DEBIAN_FRONTEND=noninteractive apt    28.68 MB
f50f9524513f   8 weeks ago      /bin/sh -c #(nop) CMD ["/bin/bash"]              0 B
<missing>      8 weeks ago      /bin/sh -c #(nop) ADD file:b5391cb13172fb513d    125.1 MB

Much better 🙂 You can repeat that process for every RUN inside your Dockerfile and really cut the fat out of your image. Just be careful not to delete anything that you need!
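On that last point, a slightly more cautious variant is to remove only the package lists that apt-get update downloaded (the files docker diff showed under /var/lib/apt/lists) rather than the whole of /var/lib/apt. The sketch below writes such a Dockerfile to a scratch directory. The --no-install-recommends flag is my own addition, on the assumption that you don’t want the packages apt merely recommends.

```shell
#!/bin/sh
# Scratch directory for the Dockerfile variant.
build_dir=$(mktemp -d)

# Quoted heredoc delimiter: backslashes and the * are written out literally.
cat > "$build_dir/Dockerfile" <<'EOF'
FROM debian:jessie

# One RUN, so the package lists never end up in any layer.
RUN apt-get update \
 && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends vim \
 && rm -rf /var/lib/apt/lists/*
EOF

cat "$build_dir/Dockerfile"
echo "Build with: docker build -t debian-vim $build_dir"
```

The image name debian-vim is only a placeholder. Because only the lists are removed, a later apt-get update in a derived image simply downloads them again.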

Tips

Tip #1.

Create and maintain your own base images, preferably on Alpine! Alpine Linux is tiny (Under 5MB!) and has a really strong package manager. If you can, use it and keep your base images lean.

Why is creating / maintaining your own base image ideal? Most “official” images are quite bloated and try to be as general as possible. You know what you need. It’s like compiling your own kernel, only not as dangerous 😀

Tip #2.

ONBUILD. Use it. When crafting base images, ONBUILD gives you a great way to reuse this image for both development and production. ONBUILD tells Docker that when the image is used as a base, we should perform some extra instructions, such as the following, which puts our code into the container for a production build.

ONBUILD ADD . /var/www

As this only runs when being used as a base, our docker-compose.yml, used for development, can instead mount a volume into the container, for getting our code changes into the container without a rebuild 🙂

services:
  application:
    image: my-base
    volumes:
      - .:/var/www

Tip #3.

Be careful using community images. They disappear. Often. Fork and maintain your own if it’s mission critical. You’re also putting your trust in the maintainer to protect your attack surface, but that’s a security issue and another post for next time.

  • Severin Pappadeux

    Ought to use `apt-get install -y nano &&
    apt-get autoremove; apt-get clean; apt-get autoclean` instead of `apt-get install -y nano && rm -rf /var/lib/apt`

    • Hi @severinpappadeux:disqus, thanks for the comment. `autoremove` doesn’t offer us any value in a Docker file, but `clean` is nice shorthand for cleaning up after apt commands, definitely. I was trying to keep my point more generic and show that we can simply `rm` away any waste, though perhaps a non-apt command would have highlighted this more succinctly 🙂

  • I really loved this post. Reminded of me of a few mistakes I have made …
    “to protect your attach surface” that might meant to be “attack surface”.

  • cleverlzc

    nice~

  • Ryan Smith

    Please use multistage builds into FROM Scratch containers. Here are the official docs:
    https://docs.docker.com/v17.09/engine/userguide/eng-image/multistage-build/

  • Igor Kostin

    Great article! Helped us to keep our footprint at minimum size