Docker

Docker? I Hardly Know ’Er
Software Engineering
Data Engineering
Author

Gurpreet Johl

Published

June 5, 2025

1. Docker Overview

1.1. What is Docker?

Docker is a tool for creating and managing containers.

A container is a standardised unit of software - a package of code and the dependencies required to run the code. The same container always yields the exact same application and execution behaviour.

Support for containers is built into all modern operating systems. Docker simplifies the creation and management of containers. Containers are a general idea, Docker is just the de facto standard tool for dealing with containers.

1.2. Why Use Containers?

Why would we want independent, standardised, standalone applications packages?

  • We may have different dev vs prod environments. We want to build and test in the same way.
  • Multiple developers keeping their development environments in sync with each other.
  • Clashing tools and versions between projects.

1.3. Virtual Machines vs Docker Containers

With a virtual machine (VM), we have:

  • Our operating system
  • A VM with virtual OS
  • Libraries and dependencies
  • Our own app

This does allow separated environments with environment-specific configurations which can be shared and reproduced. However, it involves a lot of duplication, can be slow with long boot times, and reproducing on another server is possible but can be tricky.

There is a big overhead with VMs, because we recreate everything including the (virtual) OS every time we want a new VM. This wastes a lot of storage and tends to be slow.

A container is a lightweight solution to the same problem. The Docker Engine interfaces with the OS, and the container sits on top of the Docker Engine. This means each container only needs to contain its own code + data; it doesn’t need to reproduce the OS or any additional utilities.

  • OS
  • OS built-in container support
  • Docker engine
  • Container - App, libraries

The benefits of a container compared to a VM:

  • Low impact on OS, minimal storage usage
  • Sharing and rebuilding is easy
  • Encapsulate apps and environments rather than whole machines

1.4. Docker Setup

The specific steps depend on the OS, so check the docs.

For Linux, install Docker Engine. For Mac and Windows, install Docker Desktop (or Docker Toolbox if requirements not met).

Docker playground is helpful to try things out in a browser with no installation required.

Docker Hub (my second favourite *hub.com on the internet*) is a centralised repository that lets us share containers.

Docker compose and Kubernetes help manage multi-container apps.

* Number one is github, you dirty dog.

1.5. Overview of a Container

The Dockerfile file (no extension) contains the list of commands to create our image.

To build the container run this in a terminal in the same directory as the Dockerfile:

docker build .

To run the container:

docker run <image_id>

Optionally, if we want to expose port 3000 inside the container to the outside world, run:

docker run -p 3000:3000 <image_id>

Some other useful basic commands are listing containers (the -a flag includes stopped containers as well as running ones):

docker ps -a

And stopping a container:

docker stop <container_name>

2. Images and Containers

Docker image and container cheat sheet.

2.1. Image vs Container

A container is a running “unit of software”. An image is the blueprint for the container.

So we can define an image then create containers running in multiple places. The image contains the code and environment, the container is the running app.

2.2. Using Prebuilt Images

We can either use a prebuilt image or one that we’ve created ourselves.

Docker Hub is useful for finding and sharing prebuilt images.

To run a prebuilt image from docker hub, node in this case, run this in a terminal:

docker run node

This will automatically pull the prebuilt node image if it hasn’t already been pulled, then run it.

By default, the container is isolated, but we can run it in interactive mode if we want to use the REPL.

docker run -it node

We can use exec to interact with the shell terminal of an already running container:

docker exec -it <container_name> sh

2.3. Building Our Own Image

We can build our own images; typically these will layer on top of other prebuilt images. The Dockerfile file configures the image.

We typically start with FROM to build on some base image, either on your local system or on DockerHub.

FROM node

We then want to tell Docker which files on the local machine should go in the image. Often we want to COPY the contents of the current directory to the container file system, for example in a directory called app.

COPY . /app

The WORKDIR instruction tells Docker that any subsequent instructions (such as RUN, CMD or COPY with relative paths) should execute from this directory. So instead of the above copy with an absolute filepath, we could do:

WORKDIR /app
COPY . .

We then want to RUN a command inside the image, for example, to install dependencies.

RUN npm install

The command instruction, CMD, specifies code to execute when the container is started. Contrast this with RUN, which executes when the image is created. An example of the distinction is starting a server, which we do in the container rather than during the image build. The syntax is a bit odd: instead of CMD node server.js we need to pass the command as an array of strings.

CMD ["node", "server.js"]

If you don’t specify a CMD, the CMD of the base image will be executed. It should be the last instruction in a Dockerfile.

The web server listens on a particular port, say port 80. The container is isolated, so we need to EXPOSE the port.

EXPOSE 80

It is best practice to explicitly declare any used ports, but EXPOSE is optional and does not actually make the port available. For that, we need to publish the port with the -p flag when creating the container with docker run. See the next section.
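Putting these instructions together, the full Dockerfile for the node server example might look like this:

FROM node

WORKDIR /app

# Copy everything in the build context into /app inside the image
COPY . .

# Install dependencies when the image is built
RUN npm install

# Document the port the server listens on
EXPOSE 80

# Start the server when a container is created from this image
CMD ["node", "server.js"]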

2.4. Running a Container Based On Our Image

We have a complete Dockerfile, next we need to build the image. In a terminal in the same directory as the Dockerfile, run:

docker build .

This outputs an image ID. We can then run a container from this image; this blocks the terminal.

docker run <image_id>

We need the “publish” -p flag to make the port in the container accessible. Say we want to access the application through a local port 3000, and we are accessing the internal port 80 inside the container.

docker run -p 3000:80 <image_id>

We can then see our app if we visit localhost:3000 in a browser.

2.5. Images Are Read-Only

If we are changing the code inside the image, we need to build the image again and then run a new container from the new image.

The contents of an image are set at the build step and cannot be altered afterwards.

2.6. Understanding Image Layers

Images are layer-based. Every instruction creates a layer, and when an image is rebuilt, Docker reuses the cached version of any layer that has not changed. It only recreates the layers where there was a change (and all subsequent layers).

This means we can optimise our Dockerfile by copying code which changes frequently near the end, and putting relatively static steps, like installing dependencies, earlier. Then when we have a code change, we don’t need to reinstall everything.
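For example, in a node project we can copy package.json and install dependencies before copying the rest of the source code, so a code change doesn’t invalidate the npm install layer. A sketch of the idea:

FROM node
WORKDIR /app
# Dependencies change rarely; copy only the manifest and install first so these layers stay cached
COPY package.json .
RUN npm install
# Source code changes often; copy it last so only this layer (and later ones) are rebuilt
COPY . .
CMD ["node", "server.js"]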

2.7. Managing Images and Containers

The following commands are useful when managing images and containers.

Images:

  • Tag: -t
  • List: docker images
  • Analyse: docker image inspect <image_id>
  • Remove: docker rmi <image_id>; docker image prune

Containers:

  • Name: --name
  • List: docker ps
  • Remove: docker rm

2.8. Stopping and Restarting Containers

Start a new container with docker run.

If nothing in our image has changed, we may just want to restart a previously run container:

docker start <container_name>

2.9. Attached and Detached Containers

An attached container is running and blocks the terminal. For example, when we use docker run. Attached simply means we are “listening” to the output of the container, which can be helpful if we want to look at the logs.

A detached container is running but runs in the background without blocking the terminal. For example, when we run docker start.

We can configure whether to run in attached or detached mode. The -d flag runs in detached mode:

docker run -d <image_id>

The -a flag runs in attached mode:

docker start -a <container_name>

We can attach ourselves to a detached container using:

docker attach <container_name>

To see the log history, we can use

docker logs <container_name>

We can run this in “follow” mode with the -f flag to print the logs and keep listening (i.e. attach to the container)

docker logs -f <container_name>

2.10. Interactive Mode

If we have an application that requires user input, or we just want to see what’s going on inside the container, we can run in interactive mode.

We do this with the interactive flag -i. This keeps STDIN open even if not attached. This is analogous to how the attach flag -a keeps STDOUT open.

We usually combine this with the TTY flag -t. This allocates a pseudo-TTY, which exposes a terminal we can use.

docker run -it <image_id>

2.11. Deleting Containers

We can list all containers with:

docker ps

We can remove containers with:

docker rm <container_name>

We can pass multiple containers to remove separated by a space.

We can’t remove a running container, so we need to stop it first:

docker stop <container_name>

We can remove ALL stopped containers with

docker container prune

2.12. Deleting Images

Analogously to containers, we can list images with:

docker images 

We remove images with:

docker rmi <image_id>

We can pass multiple images to remove separated by a space.

We can’t remove images which are being used by containers, even if that container is stopped. The container needs to be removed first before the image can be removed.

We can remove all unused images with:

docker image prune

This will remove all untagged images. To also remove tagged images we need the -a flag

docker image prune -a

We can pass the --rm flag to the docker run command to automatically remove the container once stopped.

2.13. Inspecting Images

We can understand more about an image by inspecting it:

docker image inspect <image_id>

This tells us:

  • ID
  • Creation date
  • Config such as ports, entry point, interactive, etc
  • Docker version
  • OS
  • Layers

2.14. Copying Files Into and Out of a Running Container

We can copy files in and out of a running container with the cp command.

To copy a file into a container:

docker cp <source file or folder> <container_name>:/<path in container>

To copy out of a container, just swap the order of the arguments:

docker cp <container_name>:/<path in container> <destination file or folder>

This allows us to add config or data to a running container, or copy results or logs out of the container.
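For example (the file, folder and container names here are hypothetical):

# Copy a local config file into the container
docker cp ./config.json my-container:/app/config.json

# Copy a logs folder out of the container into the current directory
docker cp my-container:/app/logs ./logs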

2.15. Names and Tags

We can pass --name to the docker run command to specify our own name for a container.

For images, when we execute docker build we can specify a tag with --tag or -t. Image tags consist of name:tag e.g. python:3.11 or node:latest.
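For example, to build an image with a name and tag of our choosing and run a named container from it (the names are hypothetical):

# Build an image called my-app with tag 1.0
docker build -t my-app:1.0 .

# Run a container from it with an explicit container name
docker run -d -p 3000:80 --name my-app-container my-app:1.0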

2.16. Sharing Images

2.16.1. Registries

Everyone who has an image can create a container from it; we are sharing images not containers.

We could share the source code and Dockerfile to allow somebody to build the image themselves.

Alternatively, we can just share the built image. No build is required and we don’t need to share the full source code.

We can share in:

  • Docker Hub
  • Any private registry

2.16.2. Push

We can push to and pull from the registry with the following commands. By default these refer to Docker Hub.

docker push <image_name>
docker pull <image_name>

We prefix the image name with the registry host (e.g. registry.example.com/my-image) rather than using just the image name if we want to push to or pull from a private registry.

Steps:

  1. Create a repository on Docker Hub.
  2. Build or tag a local image with the same name as the repository.
  3. Push with the docker push command above.

We can re-tag an existing image with:

docker tag <old_image_name> <new_image_name>

This clones the image; the old name still exists.

We need to log in to be able to push to a Docker Hub repository:

docker login 

We can log out with:

docker logout
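Putting the steps together, a typical push workflow might look like this (the Docker Hub username and image names are placeholders):

# Log in, re-tag the local image to match the Docker Hub repository, then push
docker login
docker tag my-app:1.0 my-dockerhub-user/my-app:1.0
docker push my-dockerhub-user/my-app:1.0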

2.16.3. Pull

docker pull <image_name>

This will pull the latest version by default. You can specify a tag to override this.

We can execute docker run and it will automatically download the image if it isn’t already available locally. If it is available locally, it might be an out-of-date version, and docker run won’t tell you this.

docker run <image_name>

2.17. Summary

  • Images (blueprints) vs Containers (the running instance of the app).
  • Images can be pulled or built.
  • The Dockerfile configures the image when building our own images.
  • Images are built in layers.
  • We can run containers from an image, and configure some helpful flags when running.
  • We can manage containers: list, remove, stop, start
  • We can manage images: list, remove, push, pull

3. Managing Data in Containers

Docker volumes cheat sheet.

3.1. Different Categories of Data

  • Application. Code and environment. Added to the image in the build phase. This is read-only and stored in images.
  • Temporary application data. Fetched or produced by the container, e.g. data input by a user of a web server. Stored in memory or temporary files. Read and write, but only temporary, so stored in containers.
  • Permanent application data. Fetched or produced in the running container, e.g. user account data. These are stored in files or a database which we want to persist even if the container stops or is removed. Read and write permanent data, so store in volumes.

The container has its own file system, which is lost when the container is removed (not when the container is stopped). This means the temporary data is lost when the container is removed.

This is a problem as we need to rebuild the image and start a new container whenever we make a code change.

Volumes solve this problem.

3.2. Volumes

Volumes are folders on the host machine which are mounted (made available) into containers. A container can read from and write to a volume.

The COPY command in a Dockerfile is just a one-time snapshot. A volume syncs the data between the two; if either the container or the host machine updates the data then it is reflected in the other.

This allows us to persist data if a container shuts down or is removed.

3.3. How to Add a Volume

The VOLUME instruction can be used in a Dockerfile. It takes an array of the path(s) inside the container to map to volumes.

VOLUME [ "/app/feedback" ]

Note that we do not specify a path on the host machine to map to. Docker abstracts this away from us; we never know the location on our host machine, we only interact with it via docker volumes. More on this later.

3.4. Types of External Data Storage

There are two types: volumes (managed by Docker) and bind mounts (managed by you).

3.4.1. Volumes

Volumes are managed by Docker. They can be anonymous or named.

In either case, Docker sets up a folder on your host machine in an unknown location which is accessed via the docker volume command. This maps to the internal directory in the container.

Anonymous volumes are removed when the container is removed. This might not be very helpful for our use cases if we want to persist data.

This is where named volumes should be used. Named volumes do survive the container being removed. They are useful for data which should be persisted but which you don’t need to edit directly, as it can only be accessed via docker volumes.

We can’t create named volumes inside the Dockerfile; they need to be created when we run the container using the -v flag.

docker run -v <volume_name>:<path/in/container>

We can delete volumes with:

docker volume rm <volume_name>

Or

docker volume prune

3.4.2 Bind Mounts

Bind mounts are managed by you, by defining a path on your host machine. They are good for persistent editable data.

We set the bind mount up when we run a container, not when we build an image. Wrap the path in quotes in case the path has special characters or spaces.

docker run -v <absolute/path/on/host/machine>:<path/in/container>

You can also use a shortcut for the current working directory:

-v $(pwd):/app

You need to ensure docker has access to the folder you’re sharing. In the docker preferences UI, check Resources->File Sharing contains the folder (or parent folder) that you want to mount.

3.5. Combining and Merging Different Volumes

We can have:

  • Data copied in the image build step (with the COPY command)
  • Volumes - named and anonymous
  • Bind mounts

If two or more of these refer to the same path in the container, the one with the longer file path (more specific) wins.

This can be helpful when, for example, we want a node project with “hot reload”.

  1. In the image build step, copy the repo and npm install the dependencies
  2. When running the container, bind mount the repo with the host machine. BUT this would overwrite the node modules folder in the repo, undoing the npm install step and meaning we have no third party libraries.
  3. Also mount an (anonymous) volume specifically for the node modules folder. Since this is a more specific file path, it takes priority and will not be overwritten by the bind mount.

This is a fairly common pattern in a variety of use cases. Use anonymous volumes with a specific path to prevent that folder being overwritten.
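For the node example above, the run command might look like this (the image name is illustrative):

docker run -d -p 3000:80 -v $(pwd):/app -v /app/node_modules my-node-app

The anonymous volume on /app/node_modules is more specific than the bind mount on /app, so the installed dependencies are not overwritten.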

3.6. Restarting Web Servers

If you make a change to server code and want to restart the web server, it might not be trivial to do so.

You can stop and restart the container. This at least saves the trouble of rebuilding.

Or in node, you can add the nodemon package as a dependency, which watches for changes to the server code and restarts the server when required.

There may be other similar approaches for other languages and use cases besides web servers, but this type of thing is something to be aware of.

3.7. Read-Only Volumes

By default, volumes are read-write. We may want to make a volume read-only, which we can do by appending :ro after the path inside the container. This applies to the folder and all sub-folders.

-v "bind/mount/path:path/in/container:ro"

If we want a specific subfolder to be writable, we can use a similar trick to the anonymous volume earlier: specify a second, more specific volume that allows read/write access.
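For example, we could mount the project folder read-only while keeping a temp folder writable via more specific anonymous volumes (the paths and image name are illustrative):

docker run -d -p 3000:80 -v $(pwd):/app:ro -v /app/temp -v /app/node_modules my-node-app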

3.8. Managing Docker Volumes

Volumes can be listed with:

docker volume ls

Bind mounts do not appear here, as they are self-managed.

You can also create a volume directly with

docker volume create <volume_name>

See information on a volume, including the mount point where the data actually is stored on the host machine, with the inspect command:

docker volume inspect <volume_name>

Remove volumes with:

docker volume rm <volume_name>
docker volume prune

3.9. dockerignore file

We can use a .dockerignore file to specify folders and files to ignore.

This is similar in principle to a .gitignore file.

4. Arguments and Environment Variables

Docker supports build-time arguments (ARG) and run-time environment variables (ENV).

You should typically put ARG and ENV towards the end of the Dockerfile, as they are likely to change, and any change to a layer requires rebuilding all subsequent layers.

4.1. Environment Variables

Environment variables are variables that can be set in the Dockerfile or on docker run. They are available inside the Dockerfile and in application code.

Inside the Dockerfile:

ENV <VARIABLE_NAME> <default value>
ENV PORT 80
EXPOSE $PORT

When we use environment variables in the Dockerfile, preface them with a $ to indicate that they are variable names, not literal values.

We can use them in code too. The syntax depends on the language. In node, for example, it is process.env.PORT.

Then when we run the container, we pass the environment variable:

docker run --env PORT=8000

If we have multiple environment variables, pass each with a --env or -e flag followed by the key=value pair.

We can create an environment file with all environment variables and values in. This is helpful if we have lots of variables to keep track of. Say we have a file called .env, we can pass this to docker run with:

docker run --env-file ./.env
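A .env file contains one KEY=value pair per line, for example:

# Lines starting with # are treated as comments
PORT=8000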

4.2. Arguments

Arguments are variables that can be passed to the docker build command to be used inside the Dockerfile. They are not accessible in CMD or in application code.

In the Dockerfile, declare arguments with ARG, then use them prefixed with a $:

ARG DEFAULT_PORT=80

ENV PORT $DEFAULT_PORT

To overwrite the argument value in the build command:

docker build --build-arg DEFAULT_PORT=69

5. Networking

Docker networking cheat sheet.

5.1. Types of Communication

We may want to communicate from our container to:

  • Another container
  • An external service (such as a website)
  • Our host machine

It is best practice to have a single responsibility per container. So if we have a web server and a database, those should be two separate containers that can communicate with each other, rather than one monolith container.

If we were to just lump everything in one container, that’s essentially the same as lumping everything into a VM, so we’ve lost the modularity benefit of Docker. We’d need to rebuild the image and run a new container whenever anything changes anywhere.

5.1.1. Container to Web Communication

Out of the box, this works from containers. You don’t need any special set up to be able to send requests to a web page or web API from the container.

5.1.2. Container to Host Machine Communication

Replace localhost with host.docker.internal.

This tells Docker that we are actually referring to the host machine, and Docker will replace this with the IP address of the host machine as seen from inside the container.

So we just need to change our code so that any endpoints that are pointing at localhost get “translated” by Docker.
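For example, a MongoDB connection string (assuming MongoDB runs on the host machine on its default port 27017; the database name is hypothetical) would change as follows:

# Outside Docker, connecting to a local MongoDB
mongodb://localhost:27017/my-db

# From inside a container, targeting the host machine
mongodb://host.docker.internal:27017/my-db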

5.1.3. Container to Container Communication

If we inspect the container we want to connect to, we can see its IP address under NetworkSettings.

We can replace localhost with the IP address of the container in the code that connects to it.

But this is cumbersome and impractical. We need to inspect the IP address of container B, change our code for application A so that it references this IP address, then rebuild image A before we can finally run container A.

Container networks simplify this.

5.2. Docker Container Networks

We can run multiple containers in the same network with:

docker run --network <network_name>

All containers within a network can communicate with each other and all IP addresses are automatically resolved.

We need to explicitly create the network before we can use it. Docker does not automatically do this. To create a network called my-network:

docker network create my-network

We can list networks with:

docker network ls

To communicate with another container in the network, just replace localhost in the URL with the target container name. Docker will automatically translate the IP address.

We don’t need to publish a port (with the -p flag) if the only connections are between containers in the network. We only need to publish a port when we want to communicate with a container from outside the network, such as from our host machine.
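For example, to run a MongoDB container and an application container in the same network (the names and images are illustrative):

docker network create my-network

# The database is only reached by other containers in the network, so no -p flag is needed
docker run -d --name mongodb --network my-network mongo

# The app is accessed from the host machine, so its port is published
docker run -d --name my-app --network my-network -p 3000:80 my-app-image

Inside my-app, the connection string can then simply refer to the container name, e.g. mongodb://mongodb:27017/my-db.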

5.3. Network Drivers

Docker supports different network drivers, which we can set with the --driver argument when creating the network.

The driver options are:

  • bridge - This lets containers find each other by name when in the same network. This is the default driver and most common use case.
  • host - For standalone containers, isolation between container and host system is removed (i.e. they share localhost as a network).
  • overlay - Multiple Docker daemons (i.e. Docker running on different machines) are able to connect with each other. Only works in “Swarm” mode which is a dated / almost deprecated way of connecting multiple containers.
  • macvlan - You can set a custom MAC address for a container; this address can then be used for communication with that container.
  • none - All networking is disabled.

You can also install third party plugins to add other behaviours and functionality.

5.4. How Docker Resolves IP Addresses

We’ve seen that, depending on the type of networking, we can refer to any of the following in our source code, and Docker will translate this to the correct IP address:

localhost
host.docker.internal
container-name

Docker owns the entire environment in which it runs. When it sees an outgoing request with one of the “translatable” names, it will replace that placeholder in the URL with the relevant IP address.

The source code isn’t changed. The translation only happens for outbound requests from inside the container / network.

6. Docker Compose

Docker Compose cheat sheet.

6.1. What Problem Does Docker Compose Solve?

We have seen that we can create multiple containers and add them to the same network to allow them to communicate.

This is a bit inconvenient though, because we need to build/pull multiple images and run all of the containers with very specific commands in order to publish the correct ports, add the required volumes, put the containers in the correct network, etc.

Docker Compose is a convenience tool to automate the setup of multiple containers. This way, we can start and stop a multi-container application with a single command.

Docker Compose allows us to replace multiple docker build and docker run commands with one configuration file containing the orchestration steps. This lets us automate the set up of multiple images and containers.

We still need Dockerfile files for each of the custom images.

Docker Compose is not suited for managing multiple containers on different host machines.

6.2. Writing a Docker Compose File

We specify all of the details relevant to our containers such as:

  • Published ports
  • Environment variables
  • Volumes
  • Networks

The configuration file is named docker-compose.yaml.

By default in Docker Compose:

  • When a container stops it is removed, so we don’t need to add the --rm flag since it is implicitly added.
  • All containers are added to the same network automatically.
  • If we reference the same named volumes in different services, that volume will be shared so different services can read/write the same volume.

We specify each of our services (essentially containers). If we want to build a custom image and then use it, we use the build key. Any args we pass to the build command can be specified under it. If one container requires another to be running first, we can specify the depends_on key.

There are Docker Compose equivalents of flags, such as the interactive flag -i -> stdin_open: true, and the TTY flag -t -> tty: true.

If we have a bind mount, we can specify it under volumes with the relative path from the docker-compose.yml file. This is easier than outside of docker compose where we needed the absolute path.

version: "3.8"
services:

  backend:
    # Build a custom image then use it
    build:
      context: ./backend
      # Specify any args passed to the Dockerfile
      args:
        arg-name-1: 69
    ports:
      - "3000:3000"
    volumes:
      # Relative path to bind mount
      - ./backend:/app
    depends_on:
      # The backend service requires `db` to be running first
      - db

  frontend:
    # The equivalent of interactive mode (-i and -t flags)
    stdin_open: true
    tty: true

  db:
    image: "mongo"  # Can be an image from Docker Hub, a private repo or the local machine
    volumes:
      - data:/data/db  # The same named-volume syntax as outside of Docker Compose
    environment:
      ENV_VAR_KEY: "env-var-value"
    env_file:  # Instead of environment
      - ./relative/path/from/compose/file
    networks:
      # By default, every container in the compose file is added to the same network automatically,
      # so we often don't need to specify a network here unless we want something non-standard
      - any-custom-networks

  # <any other user-selected names for services>

# Any named volumes used above need to be specified here as a top-level key
volumes:
  # Just the name of the volume as a key, with no value.
  # This syntax lets Docker know this is a named volume.
  data:

6.3. Docker Compose Up and Down

From the same directory as the docker-compose.yml file, run this in a terminal to start the containers:

docker-compose up

As before, we can run in detached mode with the -d flag.

By default, Docker Compose will use existing local images if it finds them. We can force Docker to rebuild all images instead with the --build flag. We can also just build images without starting containers with:

docker-compose build

We can stop all services with:

docker-compose down

This does not delete volumes though, unless you add the -v flag.

6.4. Container Names

We specify the service names in our docker-compose.yml file.

The container names will be:

<project-directory-name>_<service-name>_1

You can explicitly set the container name with the container_name key under the corresponding service in the docker-compose.yml file.
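For example (the names are hypothetical):

services:
  backend:
    container_name: my-backend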

6.5. Single Container Applications

The main benefit of Docker Compose is for multi-container applications.

But you can, and might prefer to, use it for a single container application. It makes it easier to keep track of long commands and different steps required to build an image and run a container with specific flags.

7. Utility Containers

The most common use case for Docker is application containers. They contain an environment and your application code.

“Utility containers” only contain an environment.

7.1. Why Would We Need Utility Containers?

In some cases, we need some software installed to create a project before we can then dockerize it.

As an example, if we have a node project we typically run npm init to populate the package.json file with the package details and dependencies which allows us to run the application. And then we can dockerize that application.

But that would require us to have node and npm installed on our host machine, defeating the purpose of running it in a container.

7.2. Running Commands in Containers

We can run containers in interactive mode with:

docker run -it node

We can execute any command, not just the CMD command defined in the image, using exec. Say, for example, we want to run npm init inside a running container:

docker exec -it <container_name> npm init

This allows us to run additional commands without interrupting the main process. A use case of this is if we have a web server running and want to check its logs.

We can also pass a command to docker run to override the default CMD command:

docker run -it node npm init

7.3. Building a Utility Container

We can just have a minimal Dockerfile without any CMD:

FROM node:14-alpine
WORKDIR /app

We can build our utility image, let’s call it node-util, from the Dockerfile as usual:

docker build -t node-util .

Then we can run it in interactive mode with a bind mount to our project directory on the host machine, and execute npm init to create the package.json on our host machine

docker run -it -v /absolute/path/to/project:/app node-util npm init

So it’s as if we’ve got node on our machine, but it’s actually running in Docker.

7.4. Using ENTRYPOINT

We can use ENTRYPOINT if we want to restrict the set of commands that can be run in interactive mode.

It looks similar to CMD, but the difference is that any command we specify in docker run or docker exec is appended to the ENTRYPOINT, rather than replacing it as happens with CMD.

For example, if we only want to allow npm commands, in our Dockerfile we can add:

ENTRYPOINT [ "npm" ]

Then when we rebuild the image and run:

docker run -it -v /absolute/path/to/project:/app node-util init

This will execute npm init - the init command we passed is appended to the npm entry point.

7.5. Docker Compose with Utility Containers

We’ll often want to use docker-compose for utility containers because it encapsulates all of the logic around bind mounts, volumes and other optional flags to the build and run commands that we would typically use for utility containers.

The same logic to run arbitrary commands applies to containers created using docker-compose, just with some slightly different syntax.

If we are running a service called npm-service with an npm entry point:

docker-compose run npm-service init

Note that a container started with docker-compose run does not automatically remove itself when it finishes, unlike the usual Docker Compose behaviour. We can pass the usual --rm flag if we want that behaviour.

To execute commands in already running containers:

docker-compose exec npm-service init

7.6. A Note on File Permissions

On Linux, Docker runs as the root user, so any files created by the utility container will have the root user as their owner. This might cause issues in certain cases, so be aware of it if you see any funky behaviour on Linux.

On Mac and Windows, Docker runs inside a virtual machine, so this issue doesn’t occur.

8. Deployment

Containers are ideal for deploying to other machines since the main selling point is self-contained, reproducible environments.

8.1. Production Considerations

Some potential differences between production and dev environments:

  • Don’t use bind mounts in production.
    • They are convenient for dev environments where we want instant updates without restarting the container. In production, we want the source code in the image, not on the remote machine. So we use COPY rather than bind mounts. We can do this easily by always COPYing the source code in the Dockerfile so it is built into the image, then in dev we add the bind mount for the source code when we run the container.
  • Containerised apps might need different steps in dev vs prod.
    • E.g. React apps that need a build step rather than a preview in dev.
  • Multi-container projects might need to be split across multiple remote machines.
  • Trade-off between control and responsibility
    • Managed solutions vs self-managed.

8.2. A (Minimal) Self-Managed Deployment

Using AWS as our cloud provider, these are the main steps to deploy:

  1. Create an EC2 instance, VPC and security group.
  2. Configure the security group to expose any required ports.
  3. Connect to the instance via ssh.
  4. Install Docker.
  5. Pull the image and run the container.

8.2.1. Creating the EC2 Instance

Most of the default settings in the EC2 wizard can be left as is. It’s easiest to use the Amazon Linux image.

Ensure a VPC is selected, and that you download the key-pair to allow you to ssh into the instance later.

8.2.2. Connecting to the Instance

Select the instance from the EC2 dashboard and select the connection method (SSH client). This wizard will show the steps required to connect.

From your bash terminal on your local machine in the same directory as your key file, run:

chmod 400 key-file-name.pem
ssh -i "key-file-name.pem" ec2.url.goes.here

8.2.3. Configure the Security Group

By default in EC2, the only way to connect to the instance is via ssh. We can change this to allow accessing it via the IP address.

In the EC2 dashboard, find the instance and go to Security Groups. Change the Inbound Rules to allow http connections from source=anywhere.

8.2.4. Install Docker

We can use amazon-linux-extras to install packages if we are using the Amazon Linux image for our EC2 instance. Otherwise, just use the standard package manager for your OS.

In the ssh terminal:

sudo yum update -y
sudo amazon-linux-extras install docker 
sudo service docker start

8.2.5. Pull the Image and Run the Container

Now we can run Docker commands, so we can pull the image we want and run the container. Assuming our image exists on Docker Hub or some private repository, we can pull the image and run a container as usual:

docker pull <image_name>
docker run <image_name>

We can find the IP address of our instance in the EC2 dashboard, then visit this in a browser to use our application.

8.2.6. The (Dis)advantages of Self-Managed

The approach in this minimal example was to set up everything ourselves, i.e. self-managed.

This means we own, and are responsible for, every element: the security, managing networks and firewalls, ensuring packages are up to date, selecting the right size of instance, etc.

This can be an advantage if we have the skills in the team to cover all of these disparate areas. If not, we may want to choose a managed approach.

8.3. A (Minimal) Managed Deployment

There are many different managed Docker solutions. Elastic Container Service (ECS) is Amazon’s offering.

Note that with any managed provider, you won’t be using Docker commands directly and will have to work within that provider’s specific “rules”.

ECS has 4 main concepts: clusters, services, tasks, containers.

8.3.1. Container

In the ECS dashboard, we use the wizard to define details like container name, image name, port mappings, etc. This essentially determines how ECS will run the docker run command later.

There are also options for logging (via CloudWatch).

8.3.2. Task

The blueprint for your application - how AWS should launch your container. Not how it should execute docker run, but how the server that Docker runs on should be configured.

You can run the container on EC2 or Fargate. Fargate is serverless, and will start and stop an instance when a request is received. Contrast this with an EC2 instance which is always running whether it is handling requests or not.

8.3.3. Service

How the task should be executed. We can define security groups and load balancers here.

8.3.4. Cluster

Overall network in which our services run. If we have multiple containers in one application, we can run them on the same cluster.

8.3.5. Updating Managed Images

We can build a new image and push it to our remote repository. If we have an ECS instance running, we can force it to pick up the new version of the image in the ECS dashboard.

ECS -> Clusters -> Default 
-> Tasks -> Task Definition (of the task you want to update) 
-> Create New Revision -> Actions: Update Service

This creates a new task but will pull the latest image.

8.4. Multi-Container Deployments

Docker Compose was helpful for multi-container applications running on our local machine, but it is less useful for deployments.

  • We may want to specify how much CPU each service has.
  • We may want each service running on different machines
  • If using a managed service, that provider may require details beyond what is in the docker-compose file

If you run all containers in the same task in ECS, you can communicate between containers using localhost because they will all be run on the same machine.

8.4.1. Load Balancer

We can add a load balancer when launching the service. This should be an ‘internet-facing’ Application Load Balancer (ALB). You may need to create one manually if you did not specify the load balancer in the initial set up wizard. It should use the same VPC as your service. Configure the security groups and configure routing with Target type=IP. Change the health check path to a valid URL which should return a 200 status code when pinged.

8.4.2. Stable IP Address

We can use the DNS name to send requests to, rather than the public IP address (which can change as services are stopped and started).

8.5. EFS Volumes

If we restart a task, for example when updating a service, the volumes attached to it will be lost.

Elastic File System (EFS) volumes are persistent storage.

We can add these in the Volumes section of the ECS wizard. It should be in the same VPC as used for ECS. Under Network Access, we need to add a new security group with inbound rule Type=NFS and source as the security group used for the service.

This will allow the services to communicate with EFS.

8.6. Database Containers

You can manage your own database containers, but:

  • Scaling and managing availability can be challenging, e.g. ensuring consistency between all running containers.
  • Performance can be bad during traffic spikes
  • Backups and security can be challenging

For these reasons, you may prefer a managed database solution like AWS RDS, MongoDB Atlas, etc.

8.6.1. Managed Database

When we use a cloud-based managed database, we may want to have different dev and prod environments.

One option would be to have two different databases, one for dev and one for prod. We can set an environment variable like DB_NAME so that we can overwrite the connection URL with the dev/prod version when we run the containers that connect to it.

An alternative would be to use a container in dev and the cloud-version in prod. We just need to be careful that we use the image version of, say, MongoDB, that corresponds to the cloud version. Otherwise the dev and prod versions might not be compatible.

8.6.2. Build-Only Containers

In some applications, for example React apps, we use something like npm run preview in a dev environment, which allows us to use hot reload.

But in production we build first, which compiles an optimised bundle.

npm run build
npm run start

So we may want different Dockerfiles for dev and prod. We can name the files to distinguish, e.g. Dockerfile.prod, then when we run docker build we pass the -f flag to specify the file name (by default it looks for Dockerfile with no extension).

Multi-stage builds allow us to use one Dockerfile to run multiple build steps or “stages”. Stages can copy the contents from each other. You can either build the complete image or select individual stages.

With multi-stage builds, the build stage runs its commands with RUN (e.g. RUN npm run build) rather than CMD, because they need to execute when the image is built, not when a container starts.

We may want a different base image for different stages of the build. We can use another FROM statement to create a new stage. Each FROM statement in a file effectively delineates each stage.

We can COPY between stages by passing the --from flag with the stage name. E.g.

FROM node as build_stage

…

FROM nginx as server_stage

COPY --from=build_stage /path/in/build/stage /path/in/server/stage

We can set the build target to only build specific stages by using the --target flag, referring to the alias names we gave the stages in the Dockerfile.

docker build --target build_stage .
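Pulling this together, a sketch of a multi-stage Dockerfile for a React app (the paths and commands are typical defaults rather than taken from a specific project) might look like:

# Stage 1: build the optimised production bundle
FROM node as build_stage
WORKDIR /app
COPY package.json .
RUN npm install
COPY . .
RUN npm run build

# Stage 2: serve the built files with nginx
FROM nginx as server_stage
COPY --from=build_stage /app/build /usr/share/nginx/html
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]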

References

  • “Docker & Kubernetes: The Practical Guide” Udemy course