August 13th, 2024

Faster Docker builds using a remote BuildKit instance

Setting up a remote BuildKit instance on AWS can reduce Docker build times significantly. Key strategies include powerful hardware, layer caching, and Dockerfile optimization; the approach is most useful for small to medium-sized teams.

Docker has transformed application development, but lengthy build times can hinder productivity. A recent blog post outlines how to set up a remote BuildKit instance on AWS to significantly reduce Docker build times. The author highlights three key strategies for optimizing builds: using powerful hardware, leveraging layer caching, and optimizing Dockerfiles. BuildKit, a modern backend for Docker, allows builds to be executed on a remote server, freeing local resources. By provisioning a compute-optimized EC2 instance and utilizing an EBS volume for persistent caching, teams can enhance build performance. The blog provides a step-by-step guide for setting up the remote BuildKit instance, including configuring GitHub Actions to connect to the BuildKit server. Initial tests showed a reduction in build time from over six minutes to just under two minutes when caching was utilized. However, the approach has limitations, including a lack of autoscaling, potential cost inefficiencies, and security risks associated with shared environments. Despite these drawbacks, the solution is effective for small to medium-sized teams looking to improve build performance.

- Setting up a remote BuildKit instance on AWS can significantly reduce Docker build times.

- Key optimization strategies include using powerful hardware, layer caching, and Dockerfile optimization.

- Initial tests showed a reduction in build time from 6:22 to 1:34 when caching was used.

- Limitations include lack of autoscaling, cost inefficiencies, and security risks in shared environments.

- The solution is particularly beneficial for small to medium-sized teams.
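
A minimal sketch of the kind of setup the post describes, assuming buildkitd runs in a container on the EC2 instance with its state directory on the EBS volume (the mount path, host name, and port below are placeholders) and that clients register it as a buildx builder:

```sh
# On the EC2 instance: run buildkitd, keeping its layer cache on the
# EBS volume so it survives restarts (mount path assumed).
docker run -d --name remote-buildkitd --privileged \
  -p 9999:9999 \
  -v /mnt/buildkit:/var/lib/buildkit \
  moby/buildkit:latest \
  --addr tcp://0.0.0.0:9999

# On a laptop or CI runner: point buildx at the remote daemon and build.
docker buildx create --name remote --driver remote tcp://<ec2-host>:9999
docker buildx build --builder remote -t myapp:latest .
```

The plain TCP endpoint shown here is unauthenticated; see the comment below about protecting it with mTLS or SSH.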

AI: What people are saying
The comments reflect a variety of experiences and opinions regarding remote Docker builds and optimization strategies.
  • Several users discuss their own implementations of build optimization, including custom forks of BuildKit and alternative tools like Earthly and WarpBuild.
  • There is a consensus on the importance of caching and efficient resource management to reduce build times.
  • Some commenters express concerns about the costs associated with powerful cloud instances for builds.
  • Users share insights on the challenges of CI/CD processes, particularly regarding test execution times.
  • There is a desire for consistency in build service interfaces to facilitate switching between different cloud providers.
12 comments
By @TechSquidTV - 9 months
This is fairly similar in concept to what we do over at depot.dev https://depot.dev/blog/depot-magic-explained

We've found that BuildKit has several inefficiencies preventing it from being as fast as it could be in the cloud, especially when dealing with simultaneous builds (common in CI). That led us to create our own optimized fork of BuildKit.

The number of fine-tuning knobs you can turn running a self-hosted BuildKit instance is limitless, but I also encourage everyone to try it as a fantastic learning exercise.
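
As one illustration of those knobs, here is a buildkitd.toml sketch that limits build parallelism and sets a cache garbage-collection policy; the key names follow BuildKit's configuration docs but should be verified against your BuildKit version:

```sh
# Write a minimal buildkitd.toml on the build host; key names are taken
# from BuildKit's config documentation and should be checked for your version.
sudo tee /etc/buildkit/buildkitd.toml >/dev/null <<'EOF'
[worker.oci]
  # Cap concurrent build steps so simultaneous CI builds don't starve each other.
  max-parallelism = 4

  # Garbage-collection policy for the local layer cache.
  [[worker.oci.gcpolicy]]
    keepBytes = 50000000000  # keep up to ~50 GB of cache
    keepDuration = 604800    # prune entries older than 7 days (in seconds)
EOF
```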

By @rtpg - 9 months
I really am hopeful we come a bit full circle on builders and machines to "we buy one or two very expensive machines that run CI and builds". Caching in particular is just sitting there, waiting to be properly captured, instead of constantly churning on various machines.

Of course, CI SaaSes implement a lot of caching on their end, but they also try to put people on the most anemic machines possible to try and capture those juicy margins.

By @zerotolerance - 9 months
There are a few tragedies in the Docker story, but at least two are specifically tied to naming things. First, Swarm (mode): by the time they released Swarm (mode), the world had already taken a collective dump on Swarm (the proof of concept). Even in 2024, most of the time people talk about Swarm and start dumping on it, they're actually talking about the proof-of-concept architecture.

Second, they should never have called the subcommand "build." It isn't building anything. In this case "build" is performing a packaging step with very raw tools. But the minute they called it build, people started literally building software INSIDE intermediate container layers on the way to assembling a packaged container. Dockerfile is about as weak a build tool as you could possibly ask for, with zero useful features with respect to building software. But Docker named it "build," and now we've got Dockerfiles calling compilation steps, test commands, and dependency retrieval steps.
By @pxc - 9 months
I'm in the process of rolling out something analogous at work, where Nix jobs run inside rootless Podman containers but the Nix store and Nix daemon socket are passed through from the host, so the jobs' dependencies all persist, dependencies shared between projects are stored only once, and when two concurrent jobs ask for the same dependency they both just wait for it to be fetched once, and so on.

We also currently have some jobs that build OCI images via the Docker/Podman CLI and traditional Dockerfile/Containerfile scripts. For now those are centralized and run on just one host, on bare metal. I'd like to get those working via rootless Docker-in-Docker/Podman-in-Podman, but one thing that will be a little annoying with that is that we won't have any persistent caching at the Docker/Podman layer anymore. I suppose we'll end up using something like what's in the article to get that cache persistence back.
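
A rough sketch of that pass-through pattern, assuming rootless Podman on a host running nix-daemon; the mount paths and the NIX_REMOTE setting are the standard multi-user Nix locations, but treat the details as assumptions for any particular setup:

```sh
# Run a CI job in a rootless Podman container while reusing the host's
# Nix store and daemon, so dependencies persist and are fetched only once.
podman run --rm \
  -v /nix/store:/nix/store:ro \
  -v /nix/var/nix/db:/nix/var/nix/db:ro \
  -v /nix/var/nix/daemon-socket:/nix/var/nix/daemon-socket \
  -e NIX_REMOTE=daemon \
  docker.io/nixos/nix \
  nix --extra-experimental-features 'nix-command flakes' build nixpkgs#hello
```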

By @aliasxneo - 9 months
This is basically what we do, except we use Earthly[1]. An Earthly satellite is essentially a modified remote Docker BuildKit instance.

[1]: https://earthly.dev/

By @AkihiroSuda - 9 months
> endpoint: tcp://${{ secrets.BUILDKIT_HOST }}:9999

This should be protected with mTLS (https://docs.docker.com/build/drivers/remote/) or SSH (`endpoint: ssh://user@host`) to avoid potential cryptomining attacks, among other risks.
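
With buildx's remote driver, for example, the endpoint can be registered with client certificates (the certificate paths and host name below are placeholders):

```sh
# Register the remote buildkitd endpoint using mutual TLS rather than
# plain TCP; certificate paths and host name are placeholders.
docker buildx create --name secure-remote --driver remote \
  --driver-opt cacert=/etc/buildkit/ca.pem \
  --driver-opt cert=/etc/buildkit/client-cert.pem \
  --driver-opt key=/etc/buildkit/client-key.pem \
  tcp://<buildkit-host>:9999
```

The buildkitd side then needs to be started with the matching --tlscacert/--tlscert/--tlskey flags.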

By @bhouston - 9 months
30-minute Docker builds? Crazy.

I know it is out of style for some, but my microservice architecture has a dozen services, and each takes about 1:30 to build, maybe 2:00 at most (if there is a slow Next.js build in there and a few thousand npm packages), and that is just on a 4-core GitHub Actions worker.

My microservices all build and deploy in parallel so this system doesn't get slower as you expand to more services.

(Open source template which shows how it works: https://github.com/bhouston/template-typescript-monorepo/act... )

By @spankalee - 9 months
Google's Cloud Build has always worked very well for me for remote builds, but it'd be nice if BuildKit worked as a consistent service interface so it's easy to switch between build backend providers.
By @suryao - 9 months
This is pretty cool - it provides a good speed-up for container builds. A couple of beefy instances can set you back $200-1000 a month on AWS, on top of the regular GitHub Actions runner costs, and it only goes up from there. We have a way around that, plus effective scaling for multiple parallel builds, with WarpBuild.

As a side note: in my time running a CI infra company, we've seen that a majority of the workflow time for large teams comes from tests, which can have over 200 shards in some cases.

By @crohr - 9 months
Another option is to simply cache layers in a fast cache close by (e.g. S3)? Like https://runs-on.com/features/s3-cache-for-github-actions/#us...?
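
buildx can do roughly this with its S3 cache backend; a sketch, with the bucket, region, and cache name as placeholders (AWS credentials come from the usual environment variables):

```sh
# Push and pull layer cache from an S3 bucket close to the runners;
# bucket, region, and name values are placeholders.
docker buildx build \
  --cache-to   type=s3,region=us-east-1,bucket=my-build-cache,name=myapp,mode=max \
  --cache-from type=s3,region=us-east-1,bucket=my-build-cache,name=myapp \
  -t myapp:latest .
```
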
By @xyst - 9 months
> At Blacksmith, we regularly see our customer’s Docker builds taking 30 minutes or more

What’s the most common cause of builds taking this long in the first place…

The worst I have ever had was 5 minutes, and subsequent builds were reduced to under a minute thanks to the build cache, multi-stage builds, thin layers, and an optimized .dockerignore.
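
As an illustration of those last points, a minimal multi-stage build with a trimmed context; the base images, paths, and Node.js stack are just examples:

```sh
# Keep the build context small and the final image thin; everything
# below is illustrative.
cat > .dockerignore <<'EOF'
node_modules
.git
dist
EOF

cat > Dockerfile <<'EOF'
# Build stage: install dependencies and compile.
FROM node:20 AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Runtime stage: copy only what the app needs at runtime.
FROM node:20-slim
WORKDIR /app
COPY --from=build /app/dist ./dist
COPY --from=build /app/node_modules ./node_modules
CMD ["node", "dist/index.js"]
EOF
```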

By @delduca - 9 months
I prefer to use a VPS.