1. Docker Overview
1.1. What is Docker?
Docker is a tool for creating and managing containers.
A container is a standardised unit of software - a package of code and the dependencies required to run the code. The same container always yields the exact same application and execution behaviour.
Support for containers is built into all modern operating systems. Containers are a general idea; Docker is simply the de facto standard tool for creating and managing them.
1.2. Why Use Containers?
Why would we want independent, standardised, standalone application packages?
- Development and production environments may differ; we want to build and test in the same environment we deploy to.
- Multiple developers need to keep their development environments in sync with each other.
- Tools and versions can clash between different projects.
1.3. Virtual Machines vs Docker Containers
With a virtual machine (VM), we have:
- Our operating system
- A VM with virtual OS
- Libraries and dependencies
- Our own app
This does allow separated environments with environment-specific configurations which can be shared and reproduced. However, it involves a lot of duplication, boot times can be slow, and reproducing the setup on another server is possible but can be tricky.
There is a big overhead with VMs, because we recreate everything including the (virtual) OS every time we want a new VM. This wastes a lot of storage and tends to be slow.
A container is a lightweight solution to the same problem. The Docker Engine interfaces with the OS, and the container sits on top of the Docker Engine. This means each container only needs to contain its own code and data; it doesn't need to reproduce the OS or any additional utilities.
- OS
- OS built-in container support
- Docker engine
- Container - App, libraries
The benefits of a container compared to a VM:
- Low impact on OS, minimal storage usage
- Sharing and rebuilding is easy
- Encapsulate apps and environments rather than whole machines
1.4. Docker Setup
The specific steps depend on the OS, so check the docs.
For Linux, install Docker Engine. For Mac and Windows, install Docker Desktop (or Docker Toolbox if requirements not met).
Docker playground is helpful to try things out in a browser with no installation required.
Docker Hub (my second favourite *hub.com on the internet*) is a centralised repository that lets us share images.
Docker compose and Kubernetes help manage multi-container apps.
* Number one is github, you dirty dog.
1.5. Overview of a Container
The Dockerfile (no extension) contains the list of commands to create our image.
To build the container run this in a terminal in the same directory as the Dockerfile:
docker build .
To run the container:
docker run <image_id>
Optionally, if we want to expose port 3000 inside the container to the outside world, run:
docker run -p 3000:3000 <image_id>
Some other useful basic commands are listing containers (the -a flag also shows stopped containers):
docker ps -a
And stopping a container:
docker stop <container_name>
2. Images and Containers
Docker image and container cheat sheet.
2.1. Image vs Container
A container is a running “unit of software”. An image is the blueprint for the container.
So we can define an image then create containers running in multiple places. The image contains the code and environment, the container is the running app.
2.2. Using Prebuilt Images
We can either use a prebuilt image or one that we’ve created ourselves.
Docker Hub is useful for finding and sharing prebuilt images.
To run a prebuilt image from Docker Hub, node in this case, run this in a terminal:
docker run node
This will automatically pull the prebuilt node image if it hasn't already been pulled, then run it.
By default, the container is isolated, but we can run it in interactive mode if we want to use the REPL.
docker run -it node
We can use exec to interact with the shell terminal of an already running container:
docker exec -it <container_name> sh
2.3. Building Our Own Image
We can build our own images; typically these will layer on top of other prebuilt images. The Dockerfile configures the image.
We typically start with FROM to build on some base image, either on your local system or on Docker Hub.
FROM node
We then want to tell Docker which files on the local machine should go in the image. Often we want to COPY the contents of the current directory to the container file system, for example into a directory called app.
COPY . /app
The WORKDIR command tells Docker that any subsequent commands we RUN should run from this directory. So instead of the above copy with an absolute filepath, we could do:
WORKDIR /app
COPY . .
We then want to RUN a command inside the image, for example, to install dependencies.
RUN npm install
The command instruction, CMD, specifies code to execute when the container is started. Contrast this with RUN, which executes when the image is created. An example of the distinction is starting a server, which we do in the container. The syntax is a bit odd: instead of CMD node server.js we need to pass the command as an array of strings.
CMD [ "node", "server.js" ]
If you don't specify a CMD, the CMD of the base image will be executed. CMD should be the last instruction in a Dockerfile.
The web server listens on a particular port, say port 80. The container is isolated, so we need to EXPOSE the port.
EXPOSE 80
It is best practice to explicitly declare any used ports, but EXPOSE is optional and does not actually make the port available. For that, we need to specify the port with the -p flag when creating the container with docker run. See the next section.
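Putting these instructions together, a complete Dockerfile for a simple node server might look like the following (a sketch; the server.js entry point and port 80 are assumptions):
FROM node
WORKDIR /app
COPY . .
RUN npm install
EXPOSE 80
CMD [ "node", "server.js" ]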
2.4. Running a Container Based On Our Image
We have a complete Dockerfile; next we need to build the image. In a terminal in the same directory as the Dockerfile, run:
docker build .
This outputs an image ID. We can then run this image. This blocks the terminal.
docker run <image_id>
We need the "publish" -p flag to make the port in the container accessible. Say we want to access the application through local port 3000, mapped to the internal port 80 inside the container.
docker run -p 3000:80 <image_id>
We can then see our app if we visit localhost:3000 in a browser.
2.5. Images Are Read-Only
If we are changing the code inside the image, we need to build the image again and then run a new container from the new image.
The contents of an image are set at the build step and cannot be altered afterwards.
2.6. Understanding Image Layers
Images are layer-based: each instruction in the Dockerfile creates a layer. When an image is rebuilt, Docker reuses cached layers that are unchanged and only recreates the layers where something changed (plus all subsequent layers).
This means we can optimise our Dockerfile by putting steps that change frequently, like copying source code, near the end, and relatively static steps, like installing dependencies, earlier. Then a code change doesn't force us to reinstall everything.
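For a node project, a minimal sketch of this ordering might look like the following (assuming dependencies are listed in package.json; the server.js entry point is an assumption):
FROM node
WORKDIR /app
# Copy only the dependency manifest first, so the install layer below
# is only rebuilt when package.json changes
COPY package.json .
RUN npm install
# Copy the rest of the source code last, since it changes most often
COPY . .
EXPOSE 80
CMD [ "node", "server.js" ]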
2.7. Managing Images and Containers
The following commands are useful when managing images and containers.
Images:
- Tag: -t
- List: docker images
- Analyse: docker image inspect <image_id>
- Remove: docker rmi <image_id>; docker image prune
Containers:
- Name: --name
- List: docker ps
- Remove: docker rm
2.8. Stopping and Restarting Containers
Start a new container with docker run.
If nothing in our image has changed, we may just want to restart a previously run container:
docker start <container_name>
2.9. Attached and Detached Containers
An attached container is running and blocks the terminal, for example when we use docker run. Attached simply means we are "listening" to the output of the container, which can be helpful if we want to look at the logs.
A detached container is running but runs in the background without blocking the terminal, for example when we run docker start.
We can configure whether to run in attached or detached mode. The -d flag runs in detached mode:
docker run -d <image_id>
The -a flag runs in attached mode:
docker start -a <container_name>
We can attach ourselves to a detached container using:
docker attach <container_name>
To see the log history, we can use
docker logs <container_name>
We can run this in "follow" mode with the -f flag to print the logs and keep listening (i.e. attach to the container):
docker logs -f <container_name>
2.10. Interactive Mode
If we have an application that requires user input, or we just want to see what’s going on inside the container, we can run in interactive mode.
We do this with the interactive flag -i. This keeps STDIN open even if not attached, analogous to how the attach flag -a keeps STDOUT open.
We usually combine this with the TTY flag -t. This allocates a pseudo-TTY, which exposes a terminal we can use.
docker run -it <image_id>
2.11. Deleting Containers
We can list all containers with:
docker ps
We can remove containers with:
docker rm <container_name>
We can pass multiple containers to remove separated by a space.
We can’t remove a running container, so we need to stop it first:
docker stop <container_name>
We can remove ALL stopped containers with
docker container prune
2.12. Deleting Images
Analogously to containers, we can list images with:
docker images
We remove images with:
docker rmi <image_id>
We can pass multiple images to remove separated by a space.
We can’t remove images which are being used by containers, even if that container is stopped. The container needs to be removed first before the image can be removed.
We can remove all unused images with:
docker image prune
This will remove all untagged images. To also remove tagged images we need the -a flag:
docker image prune -a
We can pass the --rm flag to the docker run command to automatically remove the container once stopped.
2.13. Inspecting Images
We can understand more about an image by inspecting it:
docker image inspect <image_id>
This tells us:
- ID
- Creation date
- Config such as ports, entry point, interactive, etc
- Docker version
- OS
- Layers
2.14. Copying Files Into and Out of a Running Container
We can copy files in and out of a running container with the cp command.
To copy a file into a container:
docker cp <source file or folder> <container_name>:/<path in container>
To copy out of a container, just swap the order of the arguments:
docker cp <container_name>:/<path in container> <destination file or folder>
This allows us to add config or data to a running container, or copy results or logs out of the container.
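For example (with hypothetical file and container names), copying a config file into a container called my-app and copying its logs folder back out might look like:
docker cp ./config.json my-app:/app/config.json
docker cp my-app:/app/logs ./logs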
2.15. Summary
- Images (blueprints) vs Containers (the running instance of the app).
- Images can be pulled or built.
- The Dockerfile configures the image when building our own images.
- Images are built in layers.
- We can run containers from an image, and configure some helpful flags when running.
- We can manage containers: list, remove, stop, start
- We can manage images: list, remove, push, pull
3. Managing Data in Containers
Docker volumes cheat sheet.
3.1. Different Categories of Data
- Application. Code and environment. Added to the image in the build phase. This is read-only and stored in images.
- Temporary application data. Fetched or produced by the container, e.g. data input by a user of a web server. Stored in memory or temporary files. Read and write, but only temporary, so stored in containers.
- Permanent application data. Fetched or produced in the running container, e.g. user account data. These are stored in files or a database which we want to persist even if the container stops or is removed. Read and write permanent data, so store in volumes.
The container has its own file system, which is lost when the container is removed (not when the container is stopped). This means the temporary data is lost when the container is removed.
This is a problem because we regularly remove containers and create new ones, for example whenever a code change requires rebuilding the image and starting a new container.
Volumes solve this problem.
3.2. Volumes
Volumes are folders on the host machine which are mounted (made available) into containers. A container can read and write to a volume.
The COPY command in a Dockerfile is just a one-time snapshot. A volume syncs the data between the two: if either the container or the host machine updates the data, the change is reflected in the other.
This allows us to persist data if a container shuts down or is removed.
3.3. How to Add a Volume
The VOLUME instruction can be used in a Dockerfile. It takes an array of the path(s) inside the container to map to volumes.
VOLUME [ "/app/feedback" ]
Note that we do not specify a path on the host machine to map to. Docker abstracts this away from us; we never know the location on our host machine, we only interact with it via docker volume commands. More on this later.
3.4. Types of External Data Storage
There are two types: volumes (managed by Docker) and bind mounts (managed by you).
3.4.1. Volumes
Volumes are managed by Docker. They can be anonymous or named.
In either case, Docker sets up a folder on your host machine in an unknown location which is accessed via the docker volume command. This maps to the internal directory in the container.
Anonymous volumes are removed when the container is removed. This might not be very helpful for our use cases if we want to persist data.
This is where named volumes should be used. Named volumes do survive the container being removed. They are useful for data which should be persisted but which you don't need to edit directly, as it can only be accessed via docker volume commands.
We can't create named volumes inside the Docker image; they need to be created when we run the container using the -v flag.
docker run -v <volume_name>:<path/in/container>
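For example, creating a named volume called feedback (a hypothetical name) mapped to the /app/feedback path used above might look like:
docker run -d -p 3000:80 -v feedback:/app/feedback <image_id>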
We can delete volumes with:
docker volume rm <volume_name>
Or
docker volume prune
3.4.2. Bind Mounts
Bind mounts are managed by you, by defining a path on your host machine. They are good for persistent editable data.
We set the bind mount up when we run a container, not when we build an image. Wrap the path in quotes in case the path has special characters or spaces.
docker run -v “<absolute/path/on/host/machine>:<path/in/container>”
You can also use a shell shortcut for the current working directory:
-v $(pwd):/app
You need to ensure Docker has access to the folder you're sharing. In the Docker preferences UI, check that Resources -> File Sharing contains the folder (or a parent folder) that you want to mount.
3.5. Combining and Merging Different Volumes
We can have:
- Data copied in the image build step (with the COPY command)
- Volumes (named and anonymous)
- Bind mounts
If two or more of these refer to the same path in the container, the one with the longer file path (more specific) wins.
This can be helpful when, for example, we want a node project with “hot reload”.
- In the image build step, copy the repo and npm install the dependencies
- When running the container, bind mount the repo with the host machine. BUT this would overwrite the node modules folder in the repo, undoing the npm install step and meaning we have no third party libraries.
- Also mount an (anonymous) volume specifically for the node modules folder. Since this is a more specific file path, it takes priority and will not be overwritten by the bind mount.
This is a fairly common pattern in a variety of use cases. Use anonymous volumes with a specific path to prevent that folder being overwritten.
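A minimal sketch of this pattern for a node app (assuming the image installs dependencies into /app/node_modules and the project lives in the current directory):
docker run -d -p 3000:80 -v "$(pwd):/app" -v /app/node_modules <image_id>
The anonymous volume on /app/node_modules is more specific than the bind mount on /app, so the dependencies installed in the image are not overwritten.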
3.6. Restarting Web Servers
If you make a change to server code and want to restart the web server, it might not be trivial to do so.
You can stop and restart the container. This at least saves the trouble of rebuilding.
Or in node, you can add the nodemon
package as a dependency, which watches for changes to the server code and restarts the server when required.
There may be other similar approaches for other languages and use cases besides web servers, but this type of thing is something to be aware of.
3.7. Read-Only Volumes
By default, volumes are read-write. We may want to make a volume read-only, which we can do by appending :ro after the path in the container. This applies to the folder and all sub-folders.
-v "bind/mount/path:path/in/container:ro"
If we want a specific subfolder to be writable, we can use a similar trick to the anonymous volume above: specify a second, more specific volume that allows read/write access.
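For example (a sketch with hypothetical paths), mounting the project read-only while keeping a temp subfolder writable via an anonymous volume:
docker run -v "$(pwd):/app:ro" -v /app/temp <image_id>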
3.8. Managing Docker Volumes
Volumes can be listed with:
docker volume ls
Bind mounts do not appear here, as they are self-managed.
You can also create a volume directly with
docker volume create <volume_name>
See information on a volume, including the mount point where the data is actually stored on the host machine, with the inspect command:
docker volume inspect <volume_name>
Remove volumes with:
docker volume rm <volume_name>
docker volume prune
3.9. The .dockerignore File
We can use a .dockerignore file to specify folders and files that should be ignored when copying files into the image.
This is similar in principle to a .gitignore file.
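For a node project, a typical .dockerignore might look something like this (a sketch; adjust to your project), excluding locally installed dependencies, version-control metadata and the Docker files themselves:
node_modules
.git
Dockerfile
.dockerignore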
4. Arguments and Environment Variables
Docker supports build-time arguments (ARG) and run-time environment variables (ENV).
You should typically put ARG and ENV in one of the later image layers, as they are likely to change, and any change requires rebuilding all subsequent layers.
4.1. Environment Variables
Environment variables are variables that can be set in the Dockerfile or on docker run. They are available inside the Dockerfile and in application code.
Inside the Dockerfile:
ENV <VARIABLE_NAME> <default value>
ENV PORT 80
EXPOSE $PORT
When we use environment variables in the Dockerfile, preface them with a $ to indicate that they are variable names rather than literal values.
We can use them in application code too. The syntax depends on the language; in node, for example, it is process.env.PORT.
Then when we run the container, we pass the environment variable:
docker run --env PORT=8000
If we have multiple environment variables, pass each with a --env or -e flag followed by the key=value pair.
We can create an environment file with all environment variables and values in. This is helpful if we have lots of variables to keep track of. Say we have a file called .env; we can pass this to docker run with:
docker run --env-file ./.env
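A minimal .env file for this example might contain (hypothetical values), with one KEY=VALUE pair per line:
PORT=8000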
4.2. Arguments
Arguments are variables that can be passed to the docker build command to be used inside the Dockerfile. They are not accessible to CMD or any application code.
In the Dockerfile, declare arguments with ARG, then use them prefixed with a $:
ARG DEFAULT_PORT=80
ENV PORT $DEFAULT_PORT
To overwrite the argument value in the build command:
docker build --build-arg DEFAULT_PORT=69
5. Networking
Docker networking cheat sheet.
5.1. Types of Communication
We may want to communicate from our container to:
- Another container
- An external service (such as a website)
- Our host machine
It is best practice to have a single responsibility per container. So if we have a web server and a database, those should be two separate containers that can communicate with each other, rather than one monolith container.
If we were to just lump everything in one container, that’s essentially the same as lumping everything into a VM, so we’ve lost the modularity benefit of Docker. We’d need to rebuild the image and run a new container whenever anything changes anywhere.
5.1.1. Container to Web Communication
Out of the box, this works from containers. You don’t need any special set up to be able to send requests to a web page or web API from the container.
5.1.2. Container to Host Machine Communication
Replace localhost with host.docker.internal.
This tells Docker that we are actually referring to the host machine, and Docker will replace this with the IP address of the host machine as seen from inside the container.
So we just need to change our code so that any endpoints pointing at localhost use host.docker.internal instead; Docker then "translates" this for us.
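For example (assuming a MongoDB instance running on the host machine, with a hypothetical database name), a connection string in application code would change from:
mongodb://localhost:27017/mydb
to:
mongodb://host.docker.internal:27017/mydb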
5.1.3. Container to Container Communication
If we inspect the container we want to connect to, we can see its IP address under NetworkSettings.
We can replace localhost with the IP address of the container in the code which connects to it.
But this is cumbersome and impractical. We need to inspect the IP address of container B, change our code for application A so that it references this IP address, then rebuild image A before we can finally run container A.
Container networks simplify this.
5.2. Docker Container Networks
We can run multiple containers in the same network with:
docker run --network <network_name>
All containers within a network can communicate with each other and all IP addresses are automatically resolved.
We need to explicitly create the network before we can use it. Docker does not automatically do this. To create a network called my-network:
docker network create my-network
We can list networks with:
docker network ls
To communicate with another container in the network, just replace localhost in the URL with the target container name. Docker will automatically translate this to the correct IP address.
We don't need to publish a port (with the -p flag) if the only connections are between containers in the network. We only need to publish a port when we want to communicate with a container from outside the network, such as from our host machine.
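As a sketch (with hypothetical container names), running a mongo container and an app container in the my-network network created above, where the app connects to the database by container name:
docker run -d --name mongodb --network my-network mongo
docker run -d --name my-app --network my-network -p 3000:80 <image_id>
# Inside my-app, the connection string can then use the container name,
# e.g. mongodb://mongodb:27017/mydb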
5.3. Network Drivers
Docker supports different network drivers, which we can set with the --driver argument when creating the network.
The driver options are:
- bridge - This lets containers find each other by name when in the same network. This is the default driver and most common use case.
- host - For standalone containers, isolation between the container and the host system is removed (i.e. they share localhost as a network).
- overlay - Multiple Docker daemons (i.e. Docker running on different machines) are able to connect with each other. Only works in "Swarm" mode, which is a dated / almost deprecated way of connecting multiple containers.
- macvlan - You can set a custom MAC address for a container; this address can then be used for communication with that container.
- none - All networking is disabled.
You can also install third party plugins to add other behaviours and functionality.
5.4. How Docker Resolves IP Addresses
We’ve seen that, depending on the type of networking, we can refer to any of the following in our source code, and Docker will translate this to the correct IP address:
localhost
host.docker.internal
container-name
Docker owns the entire environment in which it runs. When it sees an outgoing request with one of the “translatable” names, it will replace that placeholder in the URL with the relevant IP address.
The source code isn’t changed. The translation only happens for outbound requests from inside the container / network.
6. Docker Compose
Docker Compose cheat sheet.
6.1. What Problem Does Docker Compose Solve?
We have seen that we can create multiple containers and add them to the same network to allow them to communicate.
This is a bit inconvenient though, because we need to build/pull multiple images and run all of the containers with very specific commands in order to publish the correct ports, add the required volumes, put the containers in the correct network, etc.
Docker Compose is a convenience tool to automate the setup of multiple containers. This way, we can start and stop a multi-container application with a single command.
Docker Compose allows us to replace multiple docker build and docker run commands with one configuration file containing the orchestration steps. This lets us automate the set up of multiple images and containers.
We still need a Dockerfile for each of the custom images.
Docker Compose is not suited for managing multiple containers on different host machines.
6.2. Writing a Docker Compose File
We specify all of the details relevant to our containers such as:
- Published ports
- Environment variables
- Volumes
- Networks
The configuration file is named docker-compose.yaml.
By default in Docker Compose:
- When a container stops it is removed, so we don't need to add the --rm flag; it is implicitly added.
- All containers are added to the same network automatically.
- If we reference the same named volume in different services, that volume will be shared, so different services can read/write the same volume.
We specify each of our services (essentially containers). If we want to build a custom image and then use it, we use the build key. Any args we pass to the build command can be specified under it. If one container requires another to be running first, we can specify the depends_on key.
There are Docker Compose equivalents of flags, like the interactive flag -i -> stdin_open: true, and the TTY flag -t -> tty: true.
If we have a bind mount, we can specify it under volumes with the relative path from the docker-compose.yml file. This is easier than outside of Docker Compose, where we needed the absolute path.
version: "3.8"
services:
  backend:
    # Build a custom image then use it
    build:
      context: ./backend
      # Specify any args passed to the Dockerfile
      args:
        arg-name-1: 69
    ports:
      - "3000:3000"
    volumes:
      # Relative path to bind mount
      - ./backend:/app
    depends_on:
      # The backend service requires `db` to be running first
      - db
  frontend:
    # The equivalent of interactive mode (-i and -t flags)
    stdin_open: true
    tty: true
  db:
    # Can be an image from Docker Hub, a private repo or the local machine
    image: "mongo"
    volumes:
      # This is the same syntax as outside of Docker Compose
      - data:/data/db
    environment:
      ENV_VAR_KEY: "env-var-value"
    # Instead of environment, we can point to an env file
    env_file:
      - ./relative/path/from/compose/file
    networks:
      # By default, every container in the compose file is added to the same
      # network automatically, so we often don't need to specify a network
      # here unless we want something non-standard
      - any-custom-networks
  # <any other user-selected names for containers>
# Any named volumes used above need to be specified here as a top-level key
volumes:
  # Just the name of the volume as a key, with no value.
  # This syntax lets Docker know this is a named volume.
  data:
6.3. Docker Compose Up and Down
From the same directory as the docker-compose.yml file, run this in a terminal to start the containers:
docker-compose up
As before, we can run in detached mode with the -d flag.
By default, Docker will use the images locally if it finds them there. We can force Docker to rebuild all images instead with the --build flag. We can also just build the images, without starting containers, with:
docker-compose build
We can stop all services with:
docker-compose down
This does not delete volumes though, unless you add the -v flag.
6.4. Container Names
We specify the service names in our docker-compose.yml file.
The container names will be:
<project-directory-name>_<service-name>_1
You can explicitly set the container name with the container_name key under the corresponding service in the docker-compose.yml file.
6.5. Single Container Applications
The main benefit of Docker Compose is for multi-container applications.
But you can, and might prefer to, use it for a single container application. It makes it easier to keep track of long commands and different steps required to build an image and run a container with specific flags.
7. Utility Containers
The most common use case for Docker is application containers. They contain an environment and your application code.
“Utility containers” only contain an environment.
7.1. Why Would We Need Utility Containers?
In some cases, we need some software installed to create a project before we can then dockerize it.
As an example, if we have a node project we typically run npm init to populate the package.json file with the package details and dependencies, which allows us to run the application. Then we can dockerize that application.
But that would require us to have node and npm installed on our host machine, defeating the purpose of running it in a container.
7.2. Running Commands in Containers
We can run containers in interactive mode with:
docker run -it node
We can execute any command, not just the CMD command defined in the image, using exec. Say, for example, we want to run npm init inside a running container:
docker exec -it <container_name> npm init
This allows us to run additional commands without interrupting the main process. A use case of this is if we have a web server running and want to check its logs.
We can also pass a command to docker run to override the default CMD command:
docker run -it node npm init
7.3. Building a Utility Container
We can just have a minimal Dockerfile without any CMD:
FROM node:14-alpine
WORKDIR /app
We can build our utility image, let's call it node-util, from the Dockerfile as usual:
docker build -t node-util .
Then we can run it in interactive mode with a bind mount to our project directory on the host machine, and execute npm init to create the package.json on our host machine:
docker run -it -v /absolute/path/to/project:/app node-util npm init
So it’s as if we’ve got node on our machine, but it’s actually running in Docker.
7.4. Using ENTRYPOINT
We can use an ENTRYPOINT if we want to restrict the set of commands that can be run in interactive mode.
It looks similar to CMD, but the difference is that any commands we specify in docker run or docker exec will be appended to the ENTRYPOINT, rather than replacing it (as they would a CMD).
For example, if we only want to allow npm commands, in our Dockerfile we can add:
ENTRYPOINT [ "npm" ]
Then when we rebuild the image and run:
docker run -it -v /absolute/path/to/project:/app node-util init
This will execute npm init: the init command we passed is appended to the npm entry point.
7.5. Docker Compose with Utility Containers
We'll often want to use docker-compose for utility containers because it encapsulates all of the logic around bind mounts, volumes and other optional flags to the build and run commands that we would typically use for utility containers.
The same logic to run arbitrary commands applies to containers created using docker-compose, just with some slightly different syntax.
If we are running a service called npm-service with an npm entry point:
docker-compose run npm-service init
Note that a container started in this way will not automatically remove itself when it exits, unlike the usual Docker Compose behaviour. We can pass the usual --rm flag to docker-compose run if we want that behaviour.
To execute commands in already running containers:
docker-compose exec npm-service init
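For reference, a minimal docker-compose.yaml defining such a utility service might look like this (a sketch; the service name, build context and paths are hypothetical, and the Dockerfile is the minimal one from section 7.3):
version: "3.8"
services:
  npm-service:
    build: ./
    stdin_open: true
    tty: true
    entrypoint: [ "npm" ]
    # Bind mount the project so files created by npm end up on the host
    volumes:
      - ./:/app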
7.6. A Note on File Permissions
On Linux, Docker runs as the root user, so any files created by the utility container will have the root user as their owner. This might cause issues in certain cases, so be aware of it if you see any funky behaviour on Linux.
On Mac and Windows, Docker runs inside a virtual machine, so this issue doesn’t occur.
8. Deployment
Containers are ideal for deploying to other machines since the main selling point is self-contained, reproducible environments.
8.1. Production Considerations
Some potential differences between production and dev environments:
- Don't use bind mounts in production.
  - They are convenient for dev environments where we want instant updates without restarting the container. In production, we want the source code in the image, not on the remote machine. So we use COPY rather than bind mounts. We can do this easily by always COPYing the source code in the Dockerfile so it is built into the image, then in dev we add the bind mount for the source code when we run the container.
- Containerised apps might need different steps in dev vs prod.
  - E.g. React apps that need a build step in prod rather than a dev preview.
- Multi-container projects might need to be split across multiple remote machines.
- Trade-off between control and responsibility.
  - Managed solutions vs self-managed.
8.2. A (Minimal) Self-Managed Deployment
Using AWS as our cloud provider, these are the main steps to deploy:
- Create an EC2 instance, VPC and security group.
- Configure the security group to expose any required ports.
- Connect to the instance via ssh.
- Install Docker.
- pull the image and run the container.
8.2.1. Creating the EC2 Instance
Most of the default settings in the EC2 wizard can be left as is. It’s easiest to use the Amazon Linux image.
Ensure a VPC is selected, and that you download the key-pair to allow you to ssh into the instance later.
8.2.2. Connecting to the Instance
Select the instance from the EC2 dashboard and select the connection method (SSH client). This wizard will show the steps required to connect.
From your bash terminal on your local machine in the same directory as your key file, run:
chmod 400 key-file-name.pem
ssh -i "key-file-name.pem" ec2.url.goes.here
8.2.3. Configure the Security Group
By default in EC2, the only way to connect to the instance is via ssh. We can change this to allow accessing it via the IP address.
In the EC2 dashboard, find the instance and go to Security Groups. Change the Inbound Rules to allow HTTP connections from source=anywhere.
8.2.4. Install Docker
We can use amazon-linux-extras to install packages if we are using the Amazon Linux image for our EC2 instance. Otherwise, just use the standard package manager for your OS.
In the ssh terminal:
sudo yum update -y
sudo amazon-linux-extras install docker
sudo service docker start
8.2.5. Pull the Image and Run the Container
Now we can run Docker commands, so we can pull the image we want and run the container. Assuming our image exists on Docker Hub or some private repository, we can pull the image and run a container as usual:
docker pull <image_name>
docker run <image_name>
We can find the IP address of our instance in the EC2 dashboard, then visit this in a browser to use our application.
8.2.6. The (Dis)advantages of Self-Managed
The approach in this minimal example was to set up everything ourselves, i.e. self-managed.
This means we own, and are responsible for, every element: the security, managing networks and firewalls, ensuring packages are up to date, selecting the right size of instance, etc.
This can be an advantage if we have the skills in the team to cover all of these disparate areas. If not, we may want to choose a managed approach.
8.3. A (Minimal) Managed Deployment
There are many different managed Docker solutions. Elastic Container Service (ECS) is Amazon’s offering.
Note that with any managed provider, you won’t be using Docker commands directly and will have to work within that provider’s specific “rules”.
ECS has 4 main concepts: clusters, services, tasks, containers.
8.3.1. Container
In the ECS dashboard, we use the wizard to define details like container name, image name, port mappings, etc. This essentially determines how ECS will run the docker run command later.
There are also options for logging (via CloudWatch).
8.3.2. Task
The blueprint for your application - how AWS should launch your container. Not how it should execute docker run, but how the server that Docker runs on should be configured.
You can run the container on EC2 or Fargate. Fargate is serverless, and will start and stop an instance when a request is received. Contrast this with an EC2 instance which is always running whether it is handling requests or not.
8.3.3. Service
How the task should be executed. We can define security groups and load balancers here.
8.3.4. Cluster
Overall network in which our services run. If we have multiple containers in one application, we can run them on the same cluster.
8.3.5. Updating Managed Images
We can build a new image and push it to our remote repository. If we have an ECS instance running, we can force it to pick up the new version of the image in the ECS dashboard.
ECS -> Clusters -> Default -> Tasks -> Task Definition (of the task you want to update) -> Create New Revision -> Actions: Update Service
This creates a new task but will pull the latest image.
8.4. Multi-Container Deployments
Docker Compose was helpful for multi-container applications running on our local machine, but it is less useful for deployments.
- We may want to specify how much CPU each service has.
- We may want each service running on different machines
- If using a managed service, that provider may require details beyond what is in the docker-compose file
If you run all containers in the same task in ECS, you can communicate between containers using localhost because they will all be run on the same machine.
8.4.1. Load Balancer
We can add a load balancer when launching the service. This should be an ‘internet-facing’ Application Load Balancer (ALB). You may need to create one manually if you did not specify the load balancer in the initial set up wizard. It should use the same VPC as your service. Configure the security groups and configure routing with Target type=IP. Change the health check path to a valid URL which should return a 200 status code when pinged.
8.4.2. Stable IP Address
We can use the DNS name to send requests to, rather than the public IP address (which can change as services are stopped and started).
8.5. EFS Volumes
If we restart a task, for example when updating a service, the volumes attached to it will be lost.
Elastic File System (EFS) volumes are persistent storage.
We can add these in the Volumes section of the ECS wizard. It should be in the same VPC as used for ECS. Under Network Access, we need to add a new security group with inbound rule Type=NFS and source as the security group used for the service.
This will allow the services to communicate with EFS.
8.6. Database Containers
You can manage your own database containers, but:
- Scaling and managing availability can be challenging, as is ensuring consistency between all running containers.
- Performance can be bad during traffic spikes
- Backups and security can be challenging
For these reasons, you may prefer a managed database solution like AWS RDS, MongoDB Atlas, etc.
8.6.1. Managed Database
When we use a cloud-based managed database, we may want to have different dev and prod environments.
One option would be to have two different databases, one for dev and one for prod. We can set an environment variable like DB_NAME so that we can overwrite the connection URL with the dev/prod version when we run the containers that connect to it.
An alternative would be to use a container in dev and the cloud-version in prod. We just need to be careful that we use the image version of, say, MongoDB, that corresponds to the cloud version. Otherwise the dev and prod versions might not be compatible.
8.6.2. Build-Only Containers
In some applications, for example React apps, we use something like npm run preview in a dev environment, which gives us hot reload.
But in production we build first, which compiles an optimised bundle.
npm run build
npm run start
So we may want different Dockerfiles for dev and prod. We can name the files to distinguish them, e.g. Dockerfile.prod, then when we run docker build we pass the -f flag to specify the file name (by default it looks for Dockerfile with no extension).
Multi-stage builds allow us to use one Dockerfile to run multiple build steps or “stages”. Stages can copy the contents from each other. You can either build the complete image or select individual stages.
With multi-stage builds, the build step runs with RUN instead of CMD, since it executes when the image is built rather than when the container starts.
We may want a different base image for different stages of the build. We can use another FROM statement to create a new stage; each FROM statement in a file effectively delineates a stage.
We can COPY between stages by passing the --from flag with the stage name. E.g.
FROM node as build_stage
…
FROM nginx as server_stage
COPY --from=build_stage /path/in/build/stage /path/in/server/stage
We can set the build target to only build specific stages by using the --target flag, referring to the alias names we gave the stages in the Dockerfile.
docker build --target build_stage .
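Putting this together, a multi-stage Dockerfile for a React app served by nginx might look something like the following (a sketch; the /app/build output directory and the nginx web root are assumptions that depend on the project and base image):
FROM node as build_stage
WORKDIR /app
COPY package.json .
RUN npm install
COPY . .
# Runs at image build time, producing the optimised output in /app/build
RUN npm run build

FROM nginx as server_stage
# Copy only the build output from the previous stage into nginx's web root
COPY --from=build_stage /app/build /usr/share/nginx/html
EXPOSE 80
# No CMD needed: the base nginx image's own CMD starts the server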
References
- “Docker & Kubernetes: The Practical Guide” Udemy course