Let’s cut to the chase: you’re adopting a microservice architecture and you’re planning to use Docker. There’s a reason it is so en vogue – it solves lots and lots of problems and has zero negative effect on our projects, right? Right?
As with every tool, technology, or paradigm that is thrust upon us as we scrappily try to maintain our sanity while jumping from shiny to shiny, we need to learn the gotchas.
To do this, I like to start with a simple question: How might this new shiny bite me on the ass, and what can I do to avoid having teeth marks in my rear?
I want to tackle a problem that I have seen time and time again during my consultations with teams / organisations adopting Docker.
Behemoth Docker Images
    $ docker images
    REPOSITORY              TAG      IMAGE ID       CREATED              SIZE
    awesome-micro-service   latest   61562a134d38   About a minute ago   3.5 GB
Woah! Look at the size of that image.
awesome-micro-service is 3.5GB! So much for micro.
I’m sure you’ve seen this … I’m sure you’ve contributed to this. It’s OK. Everybody falls the first time.
What on earth is a Docker image anyways?
In order to understand why our images are big, we need to understand what images are in the first place.
A Docker image is the output of a docker build. The build process runs each of the instructions within a Dockerfile. Each instruction executed creates a layer. Layers encapsulate the file system changes that the instruction has caused. A Docker image is a collection of layers.
Let’s look closer so we can describe a Docker image in more detail.
Assume we’re going to bring Docker into our PHP workflow. In order to run our PHP application, we need a Debian-based system with PHP installed.
We’ll need to describe the environment required to run our application within a Docker container.
    # Dockerfile
    FROM debian:jessie
    RUN echo "Building ..."
    RUN DEBIAN_FRONTEND=noninteractive apt-get update
    # -y answers the install prompt, which would otherwise abort a non-interactive build
    RUN DEBIAN_FRONTEND=noninteractive apt-get install -y php5-cli
Super simple. Super declarative. Super awesome. Though completely useless until we build it. The build process takes a Dockerfile and a context and produces an image. The context is the directory that is sent to the Docker daemon to satisfy any file requirements, such as COPY commands.
    # docker build -t <tag> -f <Dockerfile> <context>
    # If the Dockerfile is within the root of our context, we can omit the -f
    $ docker build -t my-docker-php:latest -f Dockerfile .
    ...
    $ docker images
    REPOSITORY      TAG      IMAGE ID       CREATED              SIZE
    my-docker-php   latest   61562a134d38   About a minute ago   163.5 MB
So what’s actually going on? What’s inside my Docker image?
It’s a file system. Nothing fancy. When you run an apt-get install vim, all you’re telling the computer to do is put some files on your hard drive. The Docker image encapsulates that and keeps track of all new / modified / deleted files.
These file system changes are tracked in layers. Each layer is the encapsulation of the file system changes for one instruction in your Dockerfile.
Docker provides a command to visualise our Docker images: docker history. As you’ll see in the output below:
- We have no control over the size of our base image, other than changing base image. This is the “<missing>” layer at the bottom of the list.
- Some keywords cost us nothing. Examples include CMD, USER, WORKDIR, etc.
    $ docker history my-docker-php
    IMAGE          CREATED          CREATED BY                                      SIZE       COMMENT
    b4e7e4004eeb   4 seconds ago    /bin/sh -c #(nop) CMD ["vim"]                   0 B
    d2a8ad35f9f4   4 seconds ago    /bin/sh -c echo                                 0 B
    6fc559885751   36 minutes ago   /bin/sh -c DEBIAN_FRONTEND=noninteractive apt   38.37 MB
    f50f9524513f   8 weeks ago      /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B
    <missing>      8 weeks ago      /bin/sh -c #(nop) ADD file:b5391cb13172fb513d   125.1 MB
Note: If your command makes no changes to the file-system (Like our RUN echo “Building …”), a layer is still created. It just has a zero-byte size.
So, in order to keep our images micro, we need to keep the file system changes in each layer to a minimum.
1. File Ownership & Permissions
Never, and I mean it, never change the ownership or permissions of a file inside a Dockerfile unless you absolutely NEED to. When you do NEED to, try to modify as few files as possible.
Although comparisons can be made, Docker isn’t like Git. A layer doesn’t record what changed inside a file, only which files were touched. Changing the ownership or permissions of a file therefore makes Docker store a complete new copy of that file in the new layer. This can double the size of your image if you’re modifying particularly large files, or worse, every file!
    # Dockerfile
    FROM debian:jessie
    ADD large_file /var/www/large_file
    RUN chown www-data /var/www/large_file
    RUN chmod 756 /var/www/large_file
    $ docker build -t gotcha-1 .
    ...
    $ docker images gotcha-1
    REPOSITORY   TAG      IMAGE ID       CREATED              SIZE
    gotcha-1     latest   49b4a4ea228a   About a minute ago   3.346 GB
    $ docker history gotcha-1
    IMAGE          CREATED          CREATED BY                                      SIZE       COMMENT
    49b4a4ea228a   36 seconds ago   /bin/sh -c chmod 756 /var/www/large_file        1.074 GB
    09d77316932b   2 minutes ago    /bin/sh -c chown www-data /var/www/large_file   1.074 GB
    7adb7c72c3ef   2 minutes ago    /bin/sh -c #(nop) ADD file:a86f6dedfb4ba54972   1.074 GB
    f50f9524513f   8 weeks ago      /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B
    <missing>      8 weeks ago      /bin/sh -c #(nop) ADD file:b5391cb13172fb513d   125.1 MB
Tip: If you’re having problems with permissions inside your container, modify them using your entrypoint script, or modify the user id to reflect what you need. Do not modify the files.
Changing the user ID of www-data to match yours (tweak as necessary):

    RUN usermod -u 1000 www-data
Or run your container with an entrypoint script:
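    $ cat my-script
    #!/bin/bash
    # Runs when the container starts, so the change lands in the container, not in an image layer
    chown -R www-data /var/www/
    apache2
    # Options such as --entrypoint must come before the image name
    $ docker run --entrypoint=/bin/my-script my-docker-php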
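If a file really does need a different owner baked into the image, one option worth knowing about is setting the owner as the file is copied, so no second copy of it is ever written. The sketch below reworks the gotcha-1 Dockerfile on that basis. It assumes Docker 17.09 or newer, where COPY and ADD accept --chown, and it relies on www-data already existing in the Debian base image. Treat it as an illustration rather than a drop-in fix.

```dockerfile
# Dockerfile (sketch; assumes Docker 17.09+ for the --chown flag)
FROM debian:jessie

# Ownership is applied while the file is copied in, so the image holds
# one copy of large_file instead of one for ADD plus one for chown.
COPY --chown=www-data:www-data large_file /var/www/large_file
```

Permissions can be handled in a similar way: newer builds that use BuildKit also accept a --chmod flag on COPY, though that depends on your Docker version, so check before relying on it.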
2. Clean up after untidy commands
Sometimes commands leave a trail of garbage behind them and couldn’t care less about the size of your images. We accept this on our desktops and preach “cache” and “performance”. Inside our images, it’s just pure filth.
    # Dockerfile
    FROM debian:jessie
    RUN DEBIAN_FRONTEND=noninteractive apt-get update
    RUN DEBIAN_FRONTEND=noninteractive apt-get install -y vim
    $ docker build -t debian .
    ...
    $ docker history debian
    IMAGE          CREATED          CREATED BY                                      SIZE       COMMENT
    ae5a25410c0d   10 seconds ago   /bin/sh -c DEBIAN_FRONTEND=noninteractive apt   28.68 MB
    aaf5660234d3   21 minutes ago   /bin/sh -c DEBIAN_FRONTEND=noninteractive apt   9.694 MB
    f50f9524513f   8 weeks ago      /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B
    <missing>      8 weeks ago      /bin/sh -c #(nop) ADD file:b5391cb13172fb513d   125.1 MB
As you can see from the output above, our apt-get update costs us about 10MB and our apt-get install costs us about 30MB. Obviously these are trivial examples, but in larger builds this space will accumulate!
First, let’s examine what each command is doing to our image. To do this, start an interactive Docker container and open a bash shell in it:
$ docker run -ti --rm --name live debian:jessie bash
You’ll be live inside the innards of a Debian container and at a bash prompt. Next, let’s get a second terminal window open and inspect the container:
    $ docker diff live
    $
No output. That’s good, because we’ve not done anything yet.
docker diff allows us to see what’s changed inside our container. So let’s run our first command:
Note: “$ ” is my local prompt and “root@4552beab7001:/#” is inside the container.
root@4552beab7001:/# apt-get update
    $ docker diff live
    C /var
    C /var/lib
    C /var/lib/apt
    C /var/lib/apt/lists
    A /var/lib/apt/lists/httpredir.debian.org_debian_dists_jessie_main_binary-amd64_Packages.gz
    A /var/lib/apt/lists/security.debian.org_dists_jessie_updates_main_binary-amd64_Packages.gz
    A /var/lib/apt/lists/httpredir.debian.org_debian_dists_jessie-updates_main_binary-amd64_Packages.gz
    A /var/lib/apt/lists/httpredir.debian.org_debian_dists_jessie_Release
    A /var/lib/apt/lists/httpredir.debian.org_debian_dists_jessie_Release.gpg
    A /var/lib/apt/lists/httpredir.debian.org_debian_dists_jessie-updates_InRelease
    A /var/lib/apt/lists/lock
    A /var/lib/apt/lists/security.debian.org_dists_jessie_updates_InRelease
Oooh, we’ve just discovered where our 10MB is going. Let’s fix it by tweaking our Dockerfile to delete our apt cache after installing vim. Your initial thought may be to tweak as:
    # Dockerfile
    FROM debian:jessie
    RUN DEBIAN_FRONTEND=noninteractive apt-get update
    RUN DEBIAN_FRONTEND=noninteractive apt-get install -y vim
    RUN rm -rf /var/lib/apt
Unfortunately, this will only add another layer and not affect the previous layers. Although we’re deleting the files, the earlier layer still contains them, so the image doesn’t shrink. The common trick is to chain our commands at the shell level. This way, the files don’t exist when the RUN is finished and they never exist in our history.
    # Dockerfile
    FROM debian:jessie
    RUN DEBIAN_FRONTEND=noninteractive apt-get update \
     && apt-get install -y vim \
     && rm -rf /var/lib/apt
    $ docker history debian
    IMAGE          CREATED         CREATED BY                                      SIZE       COMMENT
    be6afc32bd37   5 seconds ago   /bin/sh -c DEBIAN_FRONTEND=noninteractive apt   28.68 MB
    f50f9524513f   8 weeks ago     /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B
    <missing>      8 weeks ago     /bin/sh -c #(nop) ADD file:b5391cb13172fb513d   125.1 MB
Much better 🙂 You can repeat that process for every RUN inside your Dockerfile and really cut the fat out of your image. Just be careful not to delete anything that you need!
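The same chaining idea shows up in a slightly different form in many Dockerfiles, so here is a hedged sketch of that variant. It is not the build measured above, and I haven’t listed sizes for it. It deletes only the downloaded package lists (the files docker diff flagged) rather than the whole /var/lib/apt directory, and it adds apt-get’s --no-install-recommends flag so that optional packages are skipped.

```dockerfile
# Dockerfile (sketch of a common variant, not the build measured above)
FROM debian:jessie

# One RUN means one layer: the package lists are downloaded, used and
# removed before the layer is committed, so they never reach the image.
RUN apt-get update \
 && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends vim \
 && rm -rf /var/lib/apt/lists/*
```

Removing just the lists leaves the rest of apt’s state alone, which is the safer default if a later layer needs to run apt-get update again.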
Create and maintain your own base images, preferably on Alpine! Alpine Linux is tiny (Under 5MB!) and has a really strong package manager. If you can, use it and keep your base images lean.
Why is creating / maintaining your own base image ideal? Most “official” images are quite bloated and try to be as general as possible. You know what you need. It’s like compiling your own kernel, only not as dangerous 😀
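As a rough idea of what an Alpine-based starting point might look like, here is a minimal sketch. The alpine:3.19 tag and the choice of vim as the package are assumptions for illustration only, so pin whichever release and packages you have actually vetted.

```dockerfile
# Dockerfile (sketch; the alpine:3.19 tag is an assumption, pin your own)
FROM alpine:3.19

# --no-cache makes apk fetch the package index on the fly and not store
# it in the layer, so there is no separate clean-up step to chain on.
RUN apk add --no-cache vim
```

Package names on Alpine don’t always match their Debian equivalents, so expect to look a few of them up when porting an existing Dockerfile.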
ONBUILD. Use it. When crafting base images, ONBUILD gives you a great way to reuse the same image for both development and production. ONBUILD tells Docker that when the image is used as a base, it should perform some extra instructions, such as the following, which puts our code into the image for a production build.
ONBUILD ADD . /var/www
As this only runs when being used as a base, our docker-compose.yml, used for development, can instead mount a volume into the container, for getting our code changes into the container without a rebuild 🙂
    services:
      application:
        image: my-base
        volumes:
          - .:/var/www
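For the production side, the Dockerfile that sits next to your code can then be almost empty. This is a sketch that assumes the base image was built and tagged as my-base, as in the compose file, and that it contains the ONBUILD ADD . /var/www instruction.

```dockerfile
# Dockerfile for the production image (sketch; assumes a base tagged my-base)
FROM my-base

# Nothing else is needed to get the code in: the ONBUILD ADD . /var/www
# recorded in my-base fires right after FROM, using this build's context.
```

Development runs my-base directly, so the trigger never fires there and the mounted volume supplies the code instead.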
Be careful using community images. They disappear. Often. Fork and maintain your own if it’s mission critical. You’re also putting your trust in the maintainer to protect your attack surface, but that’s a security issue and another post for next time.