As of a couple weeks ago, Docker's BuildKit tool for building Dockerfiles now
supports heredoc syntax!
With these new improvements, we can do all sorts of things that were difficult
before, like multiline
RUNs without needing all those pesky backslashes at
the end of each line, or the creation of small inline configuration files.
In this post, I'll cover the basics of what these heredocs are, and more importantly what you can use them for, and how to get started with them! 🎉
BuildKit (a quick refresher)
From BuildKit's own github:
BuildKit is a toolkit for converting source code to build artifacts in an efficient, expressive and repeatable manner.
Essentially, it's the next generation builder for docker images, neatly separate from the rest of the main docker runtime; you can use it for building docker images, or images for other OCI runtimes.
It comes with a lot of useful (and pretty) features beyond what the basic builder supports, including neater build log output, faster and more cache-efficient builds, concurrent builds, as well as a very flexible architecture to allow easy extendability (I'm definitely not doing it justice).
You're either most likely using it already, or you probably want to be!
You can enable it locally by setting the environment variable
DOCKER_BUILDKIT=1 when performing your
docker build, or switch to using the
docker buildx command.
At a slightly more technical level, buildkit allows easy switching between multiple different "builders", which can be local or remote, in the docker daemon itself, in docker containers or even in a Kubernetes pod. The builder itself is split up into two main pieces, a frontend and a backend: the frontend produces intermediate Low Level Builder (LLB) code, which is then constructed into an image by the backend.
Part of what makes buildkit so fantastic is it's flexibility - these components are completely detached from each other, so you can use any frontend in any image. For example, you could use the default Dockerfile frontend, or compile your own self-contained buildpacks, or even develop your own alternative file format like Mockerfile.
To get started with using heredocs, first make sure you're setup with buildkit. Switching to buildkit gives you a ton of out-of-the-box improvements to your build setup, and should have complete compatibility with the old builder (and you can always switch back if you don't like it).
With buildkit properly setup, you can create a new
Dockerfile: at the top of
this file, we need to include a
#syntax= directive. This directive informs
the parser to use a specific frontend - in this case, the one located at
on Docker Hub.
With this line (which has to be the very first line), buildkit will find and download the right image, and then use it to build the image.
We then specify the base image to build from (just like we normally would):
With all that out the way, we can use a heredoc, executing two commands in the
RUN <<EOF echo "Hello" >> /hello echo "World!" >> /hello EOF
Now that heredocs are working, you might be wondering - why all the fuss? Well, this feature has kind of, until now, been missing from Dockerfiles.
Let's suppose you want to build an image that requires a lot of commands to
setup. For example, a fairly common pattern in
Dockerfiles involves wanting to
update the system, and then to install some additional dependencies, i.e.
install all at once.
Naively, we might put all of these as separate
RUN apt-get update RUN apt-get upgrade -y RUN apt-get install -y ...
But, sadly like too many intuitive solutions, this doesn't quite do what we
want. It certainly works - but we create a new layer for each
RUN, making our
image much larger than it needs to be (and making builds take much longer).
So, we can squish this into a single
RUN apt-get update && \ apt-get upgrade -y && \ apt-get install -y ...
And that's what most
Dockerfiles do today, from the official docker images
down to the messy ones I've written for myself. It works fine, images are small
and fast to build... but it does look a bit ugly. And if you accidentally
forget the line continuation symbol
\, well, you'll get a syntax error!
Heredocs are the next step to improve this! Now, we can just write:
RUN <<EOF apt-get update apt-get upgrade -y apt-get install -y ... EOF
We use the
<<EOF to introduce the heredoc (just like in sh/bash/zsh/your
shell of choice), and
EOF at the end to close it. In between those, we put
all our commands as the content of our script to be run by the shell!
More ways to run...
So far, we've seen some basic syntax. However, the new heredoc support doesn't just allow simple examples, there's lots of other fun things you can do.
For completeness, the hello world example using the same syntax we've already seen:
RUN <<EOF echo "Hello" >> /hello echo "World!" >> /hello EOF
But let's say your setup scripts are getting more complicated, and you want to use another language - say, like Python. Well, no problem, you can connect heredocs to other programs!
RUN python3 <<EOF with open("/hello", "w") as f: print("Hello", file=f) print("World", file=f) EOF
In fact, you can use as complex commands as you like with heredocs, simplifying the above to:
RUN python3 <<EOF > /hello print("Hello") print("World") EOF
If that feels like it's getting a bit fiddly or complicated, you can also always just use a shebang:
RUN <<EOF #!/usr/bin/env python3 with open("/hello", "w") as f: print("Hello", file=f) print("World", file=f) EOF
There's lots of different ways to connect heredocs to
RUN, and hopefully some
more ways and improvements to come in the future!
...and some file fun!
Dockerfiles also let us mess around with inline files! Let's
suppose you're building an nginx site, and want to create a custom
FROM nginx COPY index.html /usr/share/nginx/html
And then in a separate file
index.html, you put your content. But if your
index page is just really simple, it feels frustrating to have to separate
everything out: heredocs let you keep everything in the same place if you want!
FROM nginx COPY <<EOF /usr/share/nginx/html/index.html (your index page goes here) EOF
You can even copy multiple files at once, in a single layer:
COPY <<robots.txt <<humans.txt /usr/share/nginx/html/ (robots content) robots.txt (humans content) humans.txt
Hopefully, I've managed to convince you to give heredocs a try when you can! For now, they're still only available in the staging frontend, but they should be making their way into a release very soon - so make sure to take a look and give your feedback!
If you're interested, you can find out more from the official buildkit Dockerfile syntax guide.