Docker, the container software, creates containers from images, and each image needs a specification of how it is built. A Dockerfile is a text document, by convention named simply ‘Dockerfile’, containing the instructions Docker follows when you run the ‘docker build’ command against the directory that contains the file, producing the image the container is based upon. (‘docker run’ then starts a container from that image; it does not read the Dockerfile.) Every Docker image is made up of layers, and these correspond to the instructions in the Dockerfile. Because a Docker image is immutable, if you want to change an image you need to modify the Dockerfile and build the image again.
This article is designed as a summary of the main components of a Dockerfile, to help a beginner read and understand what is happening within a Dockerfile. I’ve included some links to other articles I’ve found helpful (particularly in areas I struggled to understand).
You can use the ‘docker build’ command to create an image. Docker sends every file within the folder (or Git repository) to the build as its context, so it is best to start a new folder containing only the Dockerfile and the files it needs or, if you have to use an existing folder, add a .dockerignore file to exclude certain files.
It is recommended that you follow Docker’s Best Practices for Dockerfiles.
‘#’ marks a comment within the Dockerfile, but it also introduces “Parser Directives”, which tell Docker how to handle certain syntax, such as the escape character. Parser directives must appear at the very top of the file, before any other comment or instruction. For example,
# escape=`
changes the escape character from “\” to “`” (backtick), which is useful on Windows where “\” is used in file paths.
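As a rough sketch of the escape directive in use, assuming a Windows base image and a hypothetical testfile.txt in the build context, the backtick (`) becomes the escape character so that the trailing backslashes in the paths are left alone:

```dockerfile
# escape=`

# Assumed Windows base image; testfile.txt is a hypothetical file in the build context.
FROM mcr.microsoft.com/windows/nanoserver:ltsc2022

# With the default escape character, the trailing "\" would be read as a line continuation.
COPY testfile.txt c:\
RUN dir c:\
```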
Instructions (e.g. FROM) aren’t case sensitive, but the convention is to use capitals. Every Dockerfile needs to start with a FROM instruction (only comments, parser directives and ARG may come before it), which indicates the base image, because each subsequent instruction is layered on top of this. An ARG declared before the FROM is only available to the FROM line itself unless it is declared again afterwards.
FROM can appear multiple times in the file; each one starts a new build stage, which lets you produce multiple images or use one stage as the basis for another (a multi-stage build). By default, Docker assumes the ‘latest’ tag for a base image, but you can specify one with “<base_image>:<tag>”. It is also possible to include --platform to indicate the platform of the base image.
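To make the FROM options concrete, here is a minimal sketch of a two-stage build. The ALPINE_VERSION argument, the stage name "build" and the /artifact.txt file are all placeholders I have chosen for illustration:

```dockerfile
# ARG before FROM: only usable in FROM lines unless redeclared inside a stage.
ARG ALPINE_VERSION=3.19

# First stage, named "build" so a later stage can copy from it.
# An explicit tag is given instead of relying on "latest".
FROM alpine:${ALPINE_VERSION} AS build
RUN echo "built in stage one" > /artifact.txt

# Second stage: by default only this final stage ends up in the tagged image.
# A platform could be requested here with, e.g., FROM --platform=linux/amd64 ...
FROM alpine:${ALPINE_VERSION}
COPY --from=build /artifact.txt /artifact.txt
```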
It is often necessary to specify environment variables for use in subsequent instructions (by convention their names are in CAPITALS). For this we use the ENV instruction; quote characters will be removed if they are not escaped. ENV variables persist in the running container and can be overridden at run time with docker run --env <key>=<value>. If a variable is only needed during the build, for example by a single RUN instruction, it can be set within the RUN command itself or declared as an ARG. ARG defines a build-time variable too, though it should NOT be used for secrets, as its value is visible in the output of ‘docker history’ for the image. ENV always overrides an ARG variable of the same name (even if the ARG is passed in from the command line, the ENV value will be preferred).
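A short sketch of the three ways of setting a variable described above. The names APP_VERSION, APP_HOME and BUILD_ONLY are hypothetical:

```dockerfile
FROM alpine:3.19

# Build-time variable; can be overridden with:
#   docker build --build-arg APP_VERSION=2.0 .
ARG APP_VERSION=1.0

# Persisted in the image and visible inside the running container;
# can be overridden with: docker run --env APP_HOME=/srv/app <image>
ENV APP_HOME=/opt/app

# Variable that exists only for this single RUN command.
RUN BUILD_ONLY=1 printenv BUILD_ONLY

# ARG values are available to RUN during the build, but not at run time.
RUN echo "building version ${APP_VERSION} into ${APP_HOME}"
```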
RUN executes commands during the build, in either shell form or exec form. Several instructions (RUN, CMD and ENTRYPOINT) accept both forms. Shell form is written as a plain command line and is run through a shell (/bin/sh -c by default on Linux). Exec form is written in square brackets [ ] with “double quotes” around each element; it is parsed as a JSON array and run directly without a shell, so backslashes in Windows file paths need to be escaped. For example, RUN ["/bin/bash", "-c", "echo hello"] uses exec form to invoke a shell explicitly, and RUN ["c:\\windows\\system32\\tasklist.exe"] shows a Windows file path with escaped backslashes.
In RUN, we use \ to continue commands on the next line.
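For illustration, a minimal sketch of both RUN forms, with a line continuation in the shell form. The packages installed are just examples:

```dockerfile
FROM alpine:3.19

# Shell form: run through /bin/sh -c; each trailing "\" continues
# the same command on the next line.
RUN apk add --no-cache \
        curl \
        ca-certificates

# Exec form: a JSON array with double quotes, run without a shell
# unless one is named explicitly, as here.
RUN ["/bin/sh", "-c", "echo hello"]
```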
LABEL adds metadata to an image in the form of key=value pairs. LABELs are also inherited from the base image. You can inspect just the labels with the --format parameter: $ docker image inspect --format='{{json .Config.Labels}}' myimage
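A small sketch of LABEL in use. The keys and values here are made up for the example:

```dockerfile
FROM alpine:3.19

# Several labels can be set in one instruction, one key=value pair per line.
LABEL version="1.0" \
      description="Example image for this article" \
      com.example.maintainer="someone@example.com"
```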
The EXPOSE instruction is mainly for documentation, as ports are normally published using -p with the ‘docker run’ command. However, the EXPOSE instruction does take effect if you use -P at run time, which publishes every exposed port to a random high port on the host. For more info on the options available, there is a good summary of EXPOSE options here. The default protocol is TCP, and you can expose TCP and UDP separately, for example EXPOSE 80/tcp and EXPOSE 80/udp.
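As a sketch, assuming an nginx base image purely as an example of something that listens on port 80:

```dockerfile
FROM nginx:1.25

# Document that the container listens on port 80 for both protocols.
EXPOSE 80/tcp
EXPOSE 80/udp
```

Running this with docker run -P <image> should publish both exposed ports on random host ports, whereas docker run -p 8080:80/tcp <image> publishes only the mapping you name.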
The COPY instruction allows you to copy files and directories from a source location to a destination location within the file system of the image. You can use wildcards in the source location to pick up a range of files or directories. The source is always resolved relative to the build context, so you cannot copy files from outside it; the destination can be either an absolute path or a path relative to WORKDIR. If the destination directory doesn’t exist it will be created.
All files will have a UID and GID of 0 unless you specify otherwise with the --chown flag.
Docker also has an ADD instruction which is similar but, as per the best practice guide referenced above, the COPY instruction is preferred. ADD can fetch files from remote URLs, but it is advised to use RUN with curl or wget to do this instead. ADD also auto-extracts local tar archives into the destination.
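A sketch of COPY and ADD side by side. It assumes the build context contains some .txt files, a config/ directory and a vendor.tar.gz archive, all of which are hypothetical names:

```dockerfile
FROM alpine:3.19

# Wildcard source, resolved relative to the build context.
COPY *.txt /data/

# Set ownership of the copied files instead of the default 0:0 (Linux images only).
COPY --chown=1000:1000 config/ /etc/myapp/

# ADD unpacks a local tar archive into the destination directory.
ADD vendor.tar.gz /opt/vendor/
```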
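The USER instruction sets the user name (or UID), and optionally the group, used for the following RUN, CMD and ENTRYPOINT instructions and when the container runs. On Windows this user must be created first if it is not a built-in account. If the user has no primary group, the “root” group is used unless a group is specified as an option.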
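A minimal sketch on a Linux image, creating a user before switching to it. The user and group name "app" is a placeholder:

```dockerfile
FROM alpine:3.19

# Create a system group and user (BusyBox addgroup/adduser syntax on Alpine).
RUN addgroup -S app && adduser -S -G app app

# Everything after this line, including the running container, uses app:app.
USER app:app
CMD ["whoami"]
```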
There are a couple of instructions for setting working directories. First, there is WORKDIR, which sets the working directory for the following RUN, CMD, ENTRYPOINT, COPY and ADD instructions. If the directory doesn’t exist, it will be created. WORKDIR can be used multiple times; a relative path is resolved against the previous WORKDIR, so successive values compound into a single path (/a/b/c). WORKDIR can also resolve environment variables that were set earlier with ENV, for example:
ENV DIRPATH=/path
WORKDIR $DIRPATH
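The way relative WORKDIR values build on each other can be sketched like this:

```dockerfile
FROM alpine:3.19

WORKDIR /a
# Relative paths are resolved against the previous WORKDIR.
WORKDIR b
WORKDIR c

# Prints /a/b/c during the build.
RUN pwd
```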
Secondly, there is VOLUME, which creates a mount point with the specified name and marks it as holding externally mounted volumes from the native host or other containers. The value can be presented as a JSON array, ["/var/log"], or as a plain string with multiple arguments, /var/log /var/db. On Windows the destination must be a non-existing or empty directory, or a drive other than C:. You can’t mount a host directory from within the Dockerfile; the host directory is chosen when the container is created or run.
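A short sketch of VOLUME, using a hypothetical /myvol directory. Data written to the path before the VOLUME instruction is copied into the new volume when a container starts:

```dockerfile
FROM alpine:3.19

# Populate the directory first, then declare it as a volume.
RUN mkdir /myvol && echo "hello world" > /myvol/greeting
VOLUME /myvol
```

A host directory can then be attached at run time, for example with docker run -v /some/host/path:/myvol <image>, where the host path is whatever suits your machine.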
Finally, there are two instructions which tell the container what to run when it starts, ENTRYPOINT and CMD. A Dockerfile should specify at least one of CMD or ENTRYPOINT (unless the base image already provides one). CMD is used for providing defaults (which can be overridden by command-line arguments), while for fixed, non-optional commands ENTRYPOINT should be used. You can have multiple ENTRYPOINT statements, but only the last one will have effect (the same is true of CMD, a common pattern in Dockerfiles).
There are two forms of ENTRYPOINT. First is the exec (JSON array) form, where you can use CMD to set additional, optional parameters. Second is the shell form, which prevents any CMD or ‘docker run’ command-line arguments from being used.
ENTRYPOINT ["executable", "param1", "param2"]   (exec form)
ENTRYPOINT command param1 param2   (shell form)
CMD actually has 3 forms: exec form (a JSON array), default parameters for ENTRYPOINT (also a JSON array), and shell form. If you want the image to run a command every time, you should use ENTRYPOINT, as ‘docker run’ arguments will override the defaults specified in CMD. If you want to use CMD to provide parameters for ENTRYPOINT, then both should be in JSON array format. Unlike RUN, CMD does not execute anything at build time; it specifies the intended command for the image. Yuri has a good summary of the difference between shell form and exec form for RUN, CMD and ENTRYPOINT here.
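To show how the two instructions combine, here is a minimal sketch with a fixed command in ENTRYPOINT and a default argument in CMD. The choice of ping and its arguments is just an example:

```dockerfile
FROM alpine:3.19

# Fixed part of the command, in exec (JSON array) form.
ENTRYPOINT ["ping", "-c", "3"]

# Default argument, appended to ENTRYPOINT; replaced by any
# arguments given after the image name on docker run.
CMD ["localhost"]
```

With this image, docker run <image> would run ping -c 3 localhost, while docker run <image> example.com would run ping -c 3 example.com.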
This wraps up the main components of a Dockerfile. I hope you found it useful. If you’d like to look at examples, one of the best ways I’ve found is searching for “Dockerfile” on GitHub. There are many to choose from.