Docker multi-stage builds are a powerful feature that allows you to create smaller, more secure container images. By separating the build environment from the runtime environment, you can ensure your production containers only contain what's necessary to run your application. In this guide, we'll explore multi-stage builds with practical examples for different programming languages.

Table of Contents #
- Introduction to Multi-Stage Builds
- The Problem with Single-Stage Builds
- How Multi-Stage Builds Work
- Example 1: Go Application
- Example 2: Node.js Application
- Example 3: Python Application
- Example 4: Java Spring Boot Application
- Best Practices for Multi-Stage Builds
- Measuring Image Size Improvements
- Security Benefits of Multi-Stage Builds
- Cleanup
- Conclusion
Introduction to Multi-Stage Builds #
Multi-stage builds were introduced in Docker 17.05 to solve a common problem: how to create small, efficient container images without giving up the build tools and dependencies needed to produce them. This guide is based on our Lab4 Multi-Stage Build Example from the Docker Practical Guide repository.
Before multi-stage builds, developers had to choose between:
- Single Dockerfile: Creating large images containing build tools and dependencies
- Builder Pattern: Using multiple Dockerfiles with complex shell scripts to coordinate them
Multi-stage builds elegantly solve this problem by allowing multiple FROM statements in a single Dockerfile. Each FROM statement begins a new stage, and you can selectively copy artifacts from one stage to another, leaving behind everything you don't need in the final image.
The Problem with Single-Stage Builds #
Let's first understand why single-stage builds can be problematic:
# Single-stage example for a Go application
FROM golang:1.17
WORKDIR /app
COPY . .
RUN go mod download
RUN go build -o /app/server .
EXPOSE 8080
CMD ["/app/server"]
This approach works but has significant drawbacks:
- Large Images: The final image includes the entire Go toolchain and build dependencies
- Security Risks: More tools and libraries mean a larger attack surface
- Inefficient Caching: Changes to source code invalidate the cache for all subsequent layers
- Higher Transfer Costs: Larger images take longer to push/pull from registries
Let's see how multi-stage builds solve these issues.
How Multi-Stage Builds Work #
A multi-stage build Dockerfile contains multiple FROM instructions, with each creating a new build stage:
┌────────────────────────────────────────────────────────────┐
│                  Multi-Stage Build Process                 │
│                                                            │
│   ┌─────────────────┐            ┌─────────────────┐       │
│   │   Build Stage   │            │  Runtime Stage  │       │
│   │   (with all     │───────────►│    (minimal     │       │
│   │   build tools)  │    COPY    │    runtime)     │       │
│   └─────────────────┘            └─────────────────┘       │
│                                                            │
└────────────────────────────────────────────────────────────┘
The key features of multi-stage builds:
- Multiple FROM Instructions: Each one starts a new build stage
- Named Stages: You can name stages for clarity using AS <name>
- Selective Copying: Use COPY --from=<stage> to copy only what you need
- Discarded Stages: Anything not explicitly copied is discarded
- Multiple Final Images: You can build different final images from the same Dockerfile
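That last point deserves a quick illustration: because each stage is addressable by name, you can also stop the build at any stage with the --target flag, which is handy for producing a test or debug image from the same Dockerfile. A short sketch (the image and stage names here are illustrative):
# Build only up to the stage named "build", e.g. to run tests with the full toolchain
docker build --target build -t myapp:build .
# Build the whole Dockerfile to produce the final runtime image
docker build -t myapp:latest .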
Now let's look at practical examples for different programming languages.
Example 1: Go Application #
Go applications are perfect candidates for multi-stage builds because they compile to a single binary:
# Build stage
FROM golang:1.17 AS build
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o /app/server .
# Run stage
FROM alpine:3.15
RUN apk --no-cache add ca-certificates
WORKDIR /app
COPY --from=build /app/server /app/
EXPOSE 8080
CMD ["/app/server"]
This approach has several advantages:
- Minimal Final Image: The runtime stage only contains the compiled binary and necessary certificates
- No Build Tools: The Go toolchain is only present in the build stage
- Smaller Attack Surface: Fewer packages mean fewer potential vulnerabilities
- Static Binary: Using CGO_ENABLED=0 creates a static binary that doesn't depend on libc
For even smaller images, you can use the scratch base image:
# Run stage
FROM scratch
COPY --from=build /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=build /app/server /server
EXPOSE 8080
CMD ["/server"]
The scratch image is completely empty, resulting in the smallest possible container size, often under 10 MB for a Go application.
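If you want to confirm the numbers on your own machine, build the scratch variant and check the reported size (the tag and Dockerfile name here are illustrative):
# Build the scratch-based image from a dedicated Dockerfile
docker build -f Dockerfile.scratch -t go-app:scratch .
# Show the final image size
docker images go-app:scratch
Keep in mind that scratch contains no shell or package manager, so you can't docker exec into a running container for debugging; it can be worth keeping an Alpine-based variant around for that.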
Example 2: Node.js Application #
For Node.js applications, we can separate the build environment from the runtime:
# Build stage
FROM node:16 AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Run stage
FROM node:16-alpine
WORKDIR /app
# Copy package manifests from the build stage and install only production dependencies
COPY --from=build /app/package*.json ./
RUN npm ci --only=production
# Copy the built application from the build stage
COPY --from=build /app/dist ./dist
EXPOSE 3000
CMD ["node", "dist/server.js"]
Key benefits for Node.js applications:
- No Development Dependencies: The final image contains only production dependencies
- Smaller Node.js Base Image: Using Alpine Linux reduces the base image size
- Clean Build Environment: The build stage provides a consistent environment for transpiling or bundling
For front-end applications that generate static files, you can use an even smaller runtime:
# Build stage
FROM node:16 AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Run stage
FROM nginx:alpine
COPY --from=build /app/build /usr/share/nginx/html
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]
This approach is ideal for React, Vue.js, or Angular applications that build to static assets.
Example 3: Python Application #
Python applications can also benefit from multi-stage builds, especially when using tools like Poetry:
# Build stage
FROM python:3.10-slim AS build
WORKDIR /app
RUN pip install poetry
COPY pyproject.toml poetry.lock* ./
RUN poetry export -f requirements.txt > requirements.txt
# Run stage
FROM python:3.10-slim
WORKDIR /app
COPY --from=build /app/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["python", "-m", "uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
In this example:
- The build stage uses Poetry to generate a requirements.txt file
- The runtime stage installs only the required packages
- The final image doesn't contain Poetry or any development dependencies
For Python applications with compiled C extensions, this approach can significantly reduce image size by leaving out compilers and build headers.
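One common way to apply that is to build wheels in the first stage and install them in the second, so compilers never reach the runtime image. A sketch of the pattern, assuming a plain requirements.txt at the project root rather than Poetry:
# Build stage: full image with the compilers needed for C extensions
FROM python:3.10 AS build
WORKDIR /app
COPY requirements.txt .
RUN pip wheel --no-cache-dir --wheel-dir /wheels -r requirements.txt
# Run stage: slim image installs the prebuilt wheels, no compiler needed
FROM python:3.10-slim
WORKDIR /app
COPY --from=build /wheels /wheels
RUN pip install --no-cache-dir /wheels/* && rm -rf /wheels
COPY . .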
Example 4: Java Spring Boot Application #
Java applications typically have a build stage that includes the JDK and a runtime stage with just the JRE:
# Build stage
FROM maven:3.8-openjdk-17 AS build
WORKDIR /app
COPY pom.xml .
RUN mvn dependency:go-offline
COPY src ./src
RUN mvn package -DskipTests
# Run stage
FROM eclipse-temurin:17-jre-alpine
WORKDIR /app
COPY --from=build /app/target/*.jar app.jar
EXPOSE 8080
CMD ["java", "-jar", "app.jar"]
The benefits for Java applications:
- No Build Tools: The final image doesn't include Maven or the JDK
- Smaller Base Image: JRE-only images are significantly smaller than JDK images
- Efficient Layer Caching: Dependencies are downloaded separately from compilation
- Alpine Base: Further reduces the image size
For even smaller Java images, you can create a custom JRE with jlink:
# Build stage
FROM maven:3.8-openjdk-17 AS build
WORKDIR /app
COPY pom.xml .
RUN mvn dependency:go-offline
COPY src ./src
RUN mvn package -DskipTests
# JRE creation stage (Alpine-based so the custom runtime matches the musl libc of the final image)
FROM eclipse-temurin:17-jdk-alpine AS jre-build
RUN jlink \
--add-modules java.base,java.logging,java.sql,java.desktop,java.management,java.naming,java.security.jgss,java.instrument \
--strip-debug \
--no-man-pages \
--no-header-files \
--compress=2 \
--output /javaruntime
# Run stage
FROM alpine:3.15
RUN apk --no-cache add ca-certificates
WORKDIR /app
COPY --from=jre-build /javaruntime /opt/java
COPY --from=build /app/target/*.jar app.jar
ENV PATH="${PATH}:/opt/java/bin"
EXPOSE 8080
CMD ["java", "-jar", "app.jar"]
This approach creates a minimal custom JRE with only the modules your application needs, resulting in a much smaller image.
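Rather than hand-picking modules, you can ask jdeps to compute the list from your built jar. A sketch (Spring Boot fat jars may need to be unpacked before jdeps can analyze the nested jars, and reflective dependencies may require manual additions):
# Print the minimal set of modules the application jar depends on
jdeps --ignore-missing-deps --print-module-deps --multi-release 17 app.jar
The printed list can then be passed directly to jlink's --add-modules flag.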
Best Practices for Multi-Stage Builds #
To get the most out of multi-stage builds, follow these best practices:
1. Order Layers by Frequency of Change #
Place infrequently changed operations first to maximize caching:
FROM node:16 AS build
WORKDIR /app
# Rarely changes
COPY package*.json ./
RUN npm ci
# Changes more frequently
COPY . .
RUN npm run build
2. Use Explicit Image Tags #
Avoid the latest tag to ensure build reproducibility:
# Good
FROM node:16.14.0-alpine3.15 AS build
# Avoid
FROM node:latest AS build
3. Name Your Stages #
Named stages improve readability and maintenance:
FROM golang:1.17 AS builder
# ...
FROM alpine:3.15 AS final
# ...
4. Use Small Base Images #
For runtime stages, prioritize smaller base images:
- Alpine Linux: alpine:3.15
- Distroless: gcr.io/distroless/static
- Scratch: scratch
5. Use Build Arguments for Flexibility #
FROM node:16-alpine AS build
# Declare the ARG inside the stage so RUN instructions can see it
ARG NODE_ENV=production
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build:${NODE_ENV}
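The default can then be overridden at build time with --build-arg (the tag name here is illustrative):
docker build --build-arg NODE_ENV=staging -t myapp:staging .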
6. Multi-architecture Builds #
Support multiple CPU architectures with build arguments:
ARG ARCH=amd64
FROM golang:1.17 AS build
# Redeclare the ARG inside the stage to make it visible to RUN instructions
ARG ARCH
RUN CGO_ENABLED=0 GOOS=linux GOARCH=${ARCH} go build -o /app/server .
# ...
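If you use BuildKit, docker buildx can build for several platforms in one command, and the TARGETARCH build argument is populated automatically inside the Dockerfile (this sketch requires the buildx plugin and a registry to push to; the image name is illustrative):
# Build and push images for both amd64 and arm64 in one go
docker buildx build --platform linux/amd64,linux/arm64 -t registry.example.com/myapp:latest --push .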
Measuring Image Size Improvements #
Let's compare image sizes for our Go application example:
┌───────────────────────────────────────────────────────────┐
│                     Docker Image Sizes                    │
│                                                           │
│  Single-Stage Go (golang:1.17)  ████████████████  1.07 GB │
│  Multi-Stage Go (alpine)        ▌                 15.6 MB │
│  Multi-Stage Go (scratch)       ▎                  7.2 MB │
└───────────────────────────────────────────────────────────┘
The results are dramatic:
- Single-stage Go image: ~1.07 GB
- Multi-stage Go image with Alpine: ~15.6 MB
- Multi-stage Go image with scratch: ~7.2 MB
That's a 99% reduction in image size!
Similar improvements can be seen with other languages:
- Node.js: 50-70% reduction
- Python: 40-60% reduction
- Java: 60-80% reduction
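You can reproduce comparisons like these yourself: docker images shows the final sizes, and docker history breaks a tag down layer by layer (the tag names here are illustrative):
# Compare the final sizes of the single-stage and multi-stage tags
docker images | grep go-app
# See how much each layer contributes to a given image
docker history go-app:multi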
Security Benefits of Multi-Stage Builds #
Beyond size optimization, multi-stage builds significantly improve security:
- Reduced Attack Surface: Fewer packages mean fewer potential vulnerabilities
- No Build Tools in Production: Compilers, build tools, and development dependencies can be exploited if present
- Minimal Runtime: Only the exact runtime dependencies needed to execute your application
- Separation of Concerns: Build secrets (like API keys for private package repositories) don't leak into the final image (see the sketch after this list)
- Regular Base Image Updates: Smaller images are easier to rebuild and update regularly
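On the point about build secrets: with BuildKit you can go further and mount a secret only for the RUN step that needs it, so it never lands in any layer of any stage. A sketch, assuming a private registry token stored in an .npmrc file:
# syntax=docker/dockerfile:1
FROM node:16 AS build
WORKDIR /app
COPY package*.json ./
# The secret is mounted only for this instruction and is not stored in the image
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc npm ci
Build it with: DOCKER_BUILDKIT=1 docker build --secret id=npmrc,src=$HOME/.npmrc -t myapp .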
For even better security, combine multi-stage builds with non-root users:
FROM node:16-alpine
# Create app directory and non-root user
RUN mkdir -p /app && \
    addgroup -g 1001 appgroup && \
    adduser -u 1001 -G appgroup -h /app -D appuser
WORKDIR /app
# Copy artifacts from the build stage (assumes dev dependencies were pruned there, e.g. with npm prune --production)
COPY --from=build /app/dist ./dist
COPY --from=build /app/node_modules ./node_modules
USER appuser
EXPOSE 3000
CMD ["node", "dist/server.js"]
Cleanup #
After working with multi-stage builds and experimenting with different approaches, it's important to clean up your Docker environment to free up disk space and maintain a well-organized system. Multi-stage builds can create multiple intermediary images that consume storage space.
Removing Unused Images #
The most important cleanup task after experimenting with multi-stage builds is to remove unused images, especially those large builder images:
# List all images to see what's consuming space
docker images
# Remove specific images
docker rmi go-app:latest node-app:latest python-app:latest java-app:latest
# Remove intermediary images (those with <none> tags)
docker rmi $(docker images -f "dangling=true" -q)
Removing Containers #
If you've been testing your images, you may also have stopped containers that are still consuming resources:
# Remove all stopped containers
docker container prune
# Or remove specific containers
docker rm go-app-container node-app-container
Cleaning Up the Build Cache #
Multi-stage builds can accumulate build cache that takes up disk space:
# Clear the dangling build cache
docker builder prune
# Remove all unused build cache
docker builder prune --all
# Force removal without prompt
docker builder prune --force
Comprehensive Cleanup #
For a complete cleanup after your multi-stage build experiments:
# Remove all unused containers, networks, images (both dangling and unreferenced), and build cache
docker system prune -a
# Include volumes in the cleanup
docker system prune -a --volumes
Tracking Image Size Improvements #
If you want to keep track of the size improvements you've achieved with multi-stage builds:
# Create a report of image sizes before cleanup
docker images --format "{{.Repository}}:{{.Tag}} - {{.Size}}" > image-sizes.txt
Cleanup for Lab Examples #
If you've been following the examples from our lab:
# Remove the example images
docker rmi go-example node-example python-example java-example
# Clean up resources from docker-compose examples
cd /path/to/lab4_multi_stage_build_example
docker-compose down --rmi all
Regular cleanup after experimenting with multi-stage builds ensures your system remains efficient and prevents wasted disk space on unused builder images.
Conclusion #
Multi-stage builds are an essential technique for creating efficient, secure Docker images. By separating the build environment from the runtime environment, you can dramatically reduce image size, improve security, and streamline your CI/CD pipelines.
In this guide, we've explored:
- The fundamentals of multi-stage builds
- Practical examples for Go, Node.js, Python, and Java applications
- Best practices for optimizing your builds
- Quantifiable benefits in terms of image size
- Security improvements from using multi-stage builds
For any containerized application, multi-stage builds should be considered the default approach. The benefits in terms of size, security, and efficiency make them a critical part of any Docker workflow.
In the next article in our Docker Practical Guide series, we'll explore image management best practices, including tagging strategies, registry interactions, and image optimization techniques. Stay tuned!
Are you using multi-stage builds in your Docker workflow? Share your experiences or ask questions in the comments below!