Understanding Docker without losing your shit

Marco Chiappetta
Published in Hipo
Jul 18, 2019

Docker has been around for a while now. Companies use it in production, and people write plenty of articles and record video tutorials about it, yet most developers don’t “trust” it to handle their development environment, let alone to run their code in production.

And that’s usually because understanding Docker is hard. This article tries to be a primer that will save you a few hours of screaming and paranoia without leaving out important details.

Why Docker?

Docker allows you to:

  • Wipe differences across environments. That is, your production, staging and development environments will be (should be) identical(ish). The same applies to the development environments of different developers. In other words: no more “works on my machine”.
  • Ensure that the exact same environment can be deployed wherever Docker is installed, no matter the hardware or operating system. Good old “build once, run everywhere”.
  • Build a reproducible environment with a very small footprint (usually just your source code and a special file called Dockerfile) that you can share with others. No more copying 20+ GB virtual machines to onboard new developers.
  • Declare your project’s dependencies as a handy YAML file that can be used to set up everything you need at once (e.g. your app server, database, queues, workers etc.).
  • Create a scalable, reliable cluster that can run your entire project across a fleet of heterogeneous physical machines with a few lines of bash (Docker Swarm).

If you are not impressed yet, keep reading.

Why not Docker?

“Vagrant is better than Docker”
– someone who has no idea what Docker is

If you have used virtual machines to set up your development environment you probably used Vagrant to handle them. Vagrant is a front-end to some famous virtualization back-ends such as VMware or VirtualBox.

Comparing Vagrant with Docker is like comparing apples and beer cans. Irrelevant.

Docker is definitely not the ideal option if your intent is to use containers the same way you would use virtual machines. In that case, Docker will just make your life harder.

This is not to say that Docker cannot do that. It can, it’s just not its primary purpose (nor its secondary for that matter).

For example: when using a virtual machine the flow is quite familiar. You turn on the machine, install some packages, configure something at the OS level, maybe write a few files here and there and, once you are done, suspend or shut down the VM, only to restart from the exact same point when you have to work again. All your modifications are saved within the VM’s filesystem/memory.

With Docker the user is supposed to stop and start different containers every time, therefore losing the changes local to an individual container. The reason to use different containers every time is to ensure that the provisioning process is easily reproducible and that containers can be deployed and work anywhere, at any time, without making assumptions about the underlying state. Docker is built to deploy applications, not machines. kill doesn't remember the PID of the last process you sent a signal to, does it?

Another huge difference between using a VM and Docker for development is that with Docker you seldom install all services (e.g. database, web server, key-value store) in the same container but rather split those into separate containers that communicate using a virtual network created by Docker itself. This is exactly what happens in the real world: usually all the services needed to run a project (unless it’s a very small one) run on separate nodes.

The Docker Engine

Before diving into more details it’s important to understand that with “Docker” we could be talking about: the company, the project, the tool. Throughout this article we will refer to either the project or the tool, letting the context drive your intuition.

Docker uses a client/server architecture. The server uses the Docker engine to cache images, run and manage containers, handle logs and much more.

Images, containers…What are you talking about?

Right right. The perfect analogy (for programmers at least) to understand how Docker works is the following:

  • A Dockerfile is the source code.
  • An image is the binary resulting from compiling and building your Dockerfile.
  • A container is the running instance of an image: a process, to all intents and purposes.
  • Images are cached and containers are run by a Docker host (a machine running the Docker engine).
  • Images are stored in a registry; think of it as the equivalent of the package repositories used by apt, yum, pip or npm. Docker offers a public registry that anyone can use to store images, as long as these are kept public (at the time of writing, one private image is allowed free of charge).

To install Docker on your operating system simply head to https://docker.com and pick the right version.

Let’s get started!

Creating a Dockerfile

Dockerfiles use a simple syntax to express the steps that should be taken to build a specific image. A dead simple Dockerfile is the following:

FROM ubuntu
RUN echo "My first Docker image"

Breaking it down

  • FROM ubuntu tells Docker to use the latest ubuntu image as a base. The image will be retrieved from the public registry.
  • RUN echo "My first Docker image" tells Docker to run the command echo inside the container.

Building an image

You can now build an image from this Dockerfile with:

docker build .

Breaking it down:

  • build tells Docker to build an image.
  • . tells Docker to look in the current directory for the Dockerfile and to use the current directory as a "context" so that we can reference files and directories from there.
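To get a feel for what the context is, here is a small sketch that builds the same Dockerfile from a scratch directory. It names the Dockerfile explicitly with -f and uses a .dockerignore file to keep a file out of the context. The scratch directory and the file notes.log are made up for illustration, and the build is attempted only if a Docker daemon is reachable.

```shell
# Work in a scratch directory so nothing in your project is touched.
workdir="$(mktemp -d)"
cd "$workdir"

cat > Dockerfile <<'EOF'
FROM ubuntu
RUN echo "My first Docker image"
EOF

# Files matched by .dockerignore are left out of the build context.
# "notes.log" is a made-up file name used only for illustration.
printf '*.log\n' > .dockerignore
echo "not needed in the image" > notes.log

# Only attempt the build when a Docker daemon is reachable.
if docker info >/dev/null 2>&1; then
  # -f names the Dockerfile explicitly; the trailing "." is the context.
  docker build -f Dockerfile .
fi
```

Keeping the context small matters because the whole directory is handed over to Docker before the first step runs.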

You will see an output similar to this:

Sending build context to Docker daemon 2.048 kB
Step 1 : FROM ubuntu
latest: Pulling from library/ubuntu
b3e1c725a85f: Pull complete
4daad8bdde31: Pull complete
63fe8c0068a8: Pull complete
4a70713c436f: Pull complete
bd842a2105a8: Pull complete
Digest: sha256:7a64bc9c8843b0a8c8b8a7e4715b7615e4e1b0d8ca3c7e7a76ec8250899c397a
Status: Downloaded newer image for ubuntu:latest
---> 104bec311bcd
Step 2 : RUN echo "My first Docker image"
---> Running in f85bd2e0f554
My first Docker image
---> 1d4302baa251
Removing intermediate container f85bd2e0f554
Successfully built 1d4302baa251

Breaking it down

  • The Docker client will send the directory’s content to the Docker daemon, which will use it as the build context.
  • In Step 1 Docker will retrieve the latest ubuntu image, with all of its intermediate images (hold that thought!), from the public registry.
  • In Step 2 Docker will run the echo command inside a container and route standard output and standard error to our machine so that we can see the result on our terminal.

Now look carefully. Have you noticed something?

My first Docker image
---> 1d4302baa251 <--------- THIS
Removing intermediate container f85bd2e0f554
Successfully built 1d4302baa251 <--------- THIS

Every command, in the Dockerfile, that could potentially alter the state of the image (such as RUN, since Docker cannot know if our command has side effects) produces an "intermediate image". That is, every time such a step is encountered an image will be created that holds the state produced by all the previous commands.

For example (assuming the file test.txt exists):

FROM ubuntu
WORKDIR /tmp
COPY test.txt .
RUN cat test.txt

Would produce 4 intermediate images.

Breaking it down

  • FROM takes the existing ubuntu image as the first intermediate image (nothing is copied; the image ID is simply reused).
  • WORKDIR changes the working directory to the given one.
  • COPY copies the given file into the working directory.
  • RUN runs an arbitrary command.

Therefore the output of a build will be:

Sending build context to Docker daemon 3.072 kB
Step 1 : FROM ubuntu
---> 104bec311bcd
Step 2 : WORKDIR /tmp
---> Using cache
---> 8b7569f87645
Step 3 : COPY test.txt .
---> c515890976fb
Removing intermediate container 7d07b7f6f0fb
Step 4 : RUN cat test.txt
---> Running in 9ec4a66f5a05
I'm the content of test.txt
---> 27922b2708f1
Removing intermediate container 9ec4a66f5a05
Successfully built 27922b2708f1

Where every line starting with an arrow and ending with a hash (that nice little hex string) represents an intermediate image (e.g. ---> c515890976fb).
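If you want to look at those intermediate images after the fact, docker history lists the layers of an image, one row per step. A minimal sketch, assuming a scratch directory and made-up content for test.txt; -q makes docker build print only the final image ID so that it can be captured in a variable. Depending on your Docker version the build log may look different from the one above, and some rows of the history may show <missing> in place of an ID.

```shell
workdir="$(mktemp -d)"
cd "$workdir"

# Made-up file content, used only for this illustration.
echo "I'm the content of test.txt" > test.txt

cat > Dockerfile <<'EOF'
FROM ubuntu
WORKDIR /tmp
COPY test.txt .
RUN cat test.txt
EOF

# Only talk to Docker when a daemon is reachable.
if docker info >/dev/null 2>&1; then
  # -q suppresses the build log and prints the final image ID.
  image_id="$(docker build -q .)"
  # One row per layer, newest first, with the instruction that created it.
  docker history "$image_id"
fi
```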

Running a container

You will notice something else while we are building our image: Removing intermediate container 9ec4a66f5a05. That's right, to execute RUN commands Docker needs to actually instantiate a container with the intermediate image up to that line of the Dockerfile and run the actual command. It will then "commit" the state of the container as a new intermediate image and continue the building process.

As you can see we’ve already run a container. Now we’ll learn how to do that arbitrarily (and no, you don’t need to build an image every time you need to run a container) and how to create an image from a container’s state instead of a Dockerfile.

Take a note of the hash of the image we have just built (27922b2708f1 in my case) and run a container based on that image with:

docker run 27922b2708f1

You’ll notice that the output is blank. The container is, in fact, doing absolutely nothing. Why is that? Why isn’t our RUN command being... well... run?

The correct Dockerfile instruction to run a command when a container runs is CMD, not RUN which is, instead, executed only at build time. Let's change the Dockerfile as follows:

FROM ubuntu
WORKDIR /tmp
COPY test.txt .
CMD cat test.txt

And build the image once more. You’ll now notice that the output does not contain the content of test.txt anymore. That's because CMD does not run the command, but simply sets the container up to run that command every time it starts. The hash of the final image will also change, since the last step has changed.
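CMD can be written in two forms. The one above is the shell form, which Docker runs through /bin/sh -c; the exec form is a JSON array and runs the binary directly. Here is a small sketch of the exec-form equivalent, written to a scratch directory (the directory and the file content are assumptions made for this example):

```shell
workdir="$(mktemp -d)"
cd "$workdir"
echo "I'm the content of test.txt" > test.txt

cat > Dockerfile <<'EOF'
FROM ubuntu
WORKDIR /tmp
COPY test.txt .
# Exec form: no shell is involved, cat itself is the container's main process.
CMD ["cat", "test.txt"]
EOF

# Only talk to Docker when a daemon is reachable.
if docker info >/dev/null 2>&1; then
  # -q prints just the ID of the image that was built.
  image_id="$(docker build -q .)"
  # cat runs only now, when the container starts, not during the build.
  # --rm removes the container again once it exits.
  docker run --rm "$image_id"
fi
```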

To make things easier we can tag our image so that we don't have to remember an ugly hash but assign a nice mnemonic we choose:

docker build -t test .

This will create an image named test with tag latest. The actual "tag" is the pair <NAME>:<VERSION> which, in this case, will be test:latest. A quick look at the cached images will validate this:

docker images

Output:

REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
test                latest              f2aba921b459        2 minutes ago       129 MB

If we now run a container from the aforementioned image with:

docker run test

The output will be:

I'm the content of test.txt

(or whatever you put in your test.txt)

Exactly what we were expecting.

How do I interact with a container though?

Fortunately Docker allows us to do that easily:

docker run -it test bash

Breaking it down

  • run tells Docker to run a container off the given image.
  • -it tells Docker to run the container in interactive mode (as opposed to -d, detached mode) and to give us a virtual terminal to interact with it: -i keeps the standard input wired to the container (without which a terminal would be useless) and -t allocates the terminal.
  • test is the image we've just built.
  • bash is the command to run once the container starts.

The command we pass is what keeps the container alive. Within the container it will be the process with PID 1 which will act as parent of all subsequent processes. This means that if such process is terminated the container will stop.

You will also notice that the content of test.txt is no longer being printed. That happens because if a different command is passed while running a container, the one from the CMD instruction is ignored. We are basically overriding the container's default command. To have a command that is not replaced this way you will have to use the ENTRYPOINT instruction in your Dockerfile: whatever is passed on the command line is then appended to the entrypoint as arguments instead of replacing it (it can still be changed explicitly with the --entrypoint flag of docker run).
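A rough sketch of how ENTRYPOINT and CMD combine, assuming a scratch directory, two made-up text files and a hypothetical image name entry-demo. With both instructions in exec form, ENTRYPOINT fixes the executable and CMD only supplies its default arguments:

```shell
workdir="$(mktemp -d)"
cd "$workdir"
echo "I'm the content of test.txt" > test.txt
echo "I'm the content of other.txt" > other.txt

cat > Dockerfile <<'EOF'
FROM ubuntu
WORKDIR /tmp
COPY test.txt other.txt ./
# The executable that runs whenever the container starts...
ENTRYPOINT ["cat"]
# ...and its default argument, replaced by whatever follows the image name.
CMD ["test.txt"]
EOF

# Only talk to Docker when a daemon is reachable.
if docker info >/dev/null 2>&1; then
  docker build -t entry-demo .
  docker run --rm entry-demo            # runs: cat test.txt
  docker run --rm entry-demo other.txt  # runs: cat other.txt
  # The entrypoint itself can still be swapped explicitly.
  docker run --rm --entrypoint ls entry-demo
fi
```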

Once inside the container our prompt will look like the following:

root@f1e1064e0958:/tmp#

Two things are important:

  • The working directory is /tmp as set with the WORKDIR instruction.
  • The default user is root. That's because containers are supposed to be treated as stateless processes, not fully-functional virtual machines, therefore there is usually no need to set up users with different privileges. If a container is compromised it's enough to shut it down and spawn another one. Some images might use different users but that is definitely optional.
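For images that should not run as root, the Dockerfile's USER instruction switches the user for the build steps that follow it and for the container's main process. A minimal sketch, assuming a hypothetical user called app and a hypothetical image name user-demo:

```shell
workdir="$(mktemp -d)"
cd "$workdir"

cat > Dockerfile <<'EOF'
FROM ubuntu
# Create an unprivileged user ("app" is a made-up name).
RUN useradd --create-home app
# Everything after this line, including the container's command, runs as app.
USER app
WORKDIR /home/app
CMD ["whoami"]
EOF

# Only talk to Docker when a daemon is reachable.
if docker info >/dev/null 2>&1; then
  docker build -t user-demo .
  # Prints the name of the user the container's process runs as.
  docker run --rm user-demo
fi
```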

We will immediately see that the ubuntu image is actually a stripped-down version of the Ubuntu server distribution. Many binaries will be missing:

root@f1e1064e0958:/tmp# ping
bash: ping: command not found

We can use the preinstalled package manager to install ping:

apt update
apt install iputils-ping

To get a feel for Docker’s statelessness let’s exit the container (exit or Ctrl+D) and start a new one:

root@f1e1064e0958:/tmp# exit
➜ docker-article docker run -it test bash
root@ef95ac8b41ff:/tmp#

You will see the hostname has changed from f1e1064e0958 (the ID of the old container) to ef95ac8b41ff (your IDs will be different). And, to our dismay, ping is also gone.

Creating an image from a container

What if we want to “save” our changes to the container in a new image? Let’s install the iputils-ping package again and exit the container once more.

The docker ps command will show currently running containers. Since all of our containers are stopped the list will be empty. To see all cached containers (including those that were stopped/terminated) simply use:

docker ps -a

My output looks like this:

CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS                         PORTS               NAMES
ef95ac8b41ff        test                "bash"                   4 minutes ago       Exited (127) 1 seconds ago                         hopeful_keller

Docker kindly assigns a random mnemonic to each new container (hopeful_keller in my case) so that we don't have to remember the ID. We now want to use the state of this cached (stopped) container as a base for a new image we'll call test2:

docker commit hopeful_keller test2

The output will be the SHA256 hash of the newly created image:

sha256:8b976507170c40526d1e0361631e2e959f150c8cee51fe1b9090f2a56b7e9b35

Our new image is now in the Docker engine’s cache and we can see it by listing all images:

docker images

Output:

REPOSITORY          TAG                 IMAGE ID            CREATED              SIZE
test2               latest              8b976507170c        About a minute ago   129 MB
test                latest              f2aba921b459        About an hour ago    129 MB

That’s neat! We have just created a completely new image that we can now reuse to run a container with ping preinstalled. All in a matter of seconds. Let's run a container off test2 and try pinging something to verify:

docker run -it test2 bash
root@25d285662a5a:/tmp# ping hipolabs.com
PING hipolabs.com (54.83.6.199) 56(84) bytes of data.
64 bytes from ec2-54-83-6-199.compute-1.amazonaws.com (54.83.6.199): icmp_seq=1 ttl=37 time=0.223 ms

As you can see Docker can “kind of” simulate the experience you would have with a virtual machine as long as you remember to commit your changes to an image every time. Alternatively (not suggested) you can keep starting the same container over and over again. From docker ps -a you can get the ID/name of the stopped container and use:

docker start 25d285662a5a
docker exec -it 25d285662a5a bash

To restart the same container with the same state. Docker also provides handy pause and unpause commands to achieve the same result you would have suspending a virtual machine. This goes a bit beyond the scope of this tutorial (and we still have a lot to cover).

The clear disadvantages of creating images this way instead of using a proper Dockerfile are:

  • The image is multiple orders of magnitude bigger than a simple text file. Harder to share.
  • There is no way to reliably document how you built your image. Equivalent to not sharing the source code.
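For comparison, the same result can be described in a Dockerfile, which documents every step. A minimal sketch, assuming a scratch directory, made-up content for test.txt and a hypothetical image name test3; apt-get is used with -y because nobody is around to answer prompts at build time:

```shell
workdir="$(mktemp -d)"
cd "$workdir"
echo "I'm the content of test.txt" > test.txt

cat > Dockerfile <<'EOF'
FROM ubuntu
# Update the package index and install ping in a single step.
RUN apt-get update && apt-get install -y iputils-ping
WORKDIR /tmp
COPY test.txt .
CMD cat test.txt
EOF

# Only talk to Docker when a daemon is reachable.
if docker info >/dev/null 2>&1; then
  docker build -t test3 .
  # -c 1 sends a single echo request instead of pinging forever.
  docker run --rm test3 ping -c 1 127.0.0.1
fi
```

Anyone with this text file (and test.txt) can rebuild the image, which is exactly what a committed container cannot offer.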

Summary

In this article we saw:

  • How to create a Dockerfile.
  • How to build an image from a Dockerfile.
  • How to start and interact with a container based on that image.
  • How to create a new image from a container’s state.

In the next chapter we will explore two awesome tools: Docker Compose and Docker Swarm. For questions and comments do not hesitate to get in touch!

Thanks to Fergal Walsh and Semih Basmacı for reading drafts of this article.
