Making Container Images Start Faster with Lazy Pulling

For many containers, image distribution is still the biggest drag on startup time. In practice, pulling an image often accounts for more than 70% of the total startup latency. The problem becomes much more obvious with large VM-style or AI images, which can easily grow to tens of gigabytes. At that size, both transfer and decompression turn into startup bottlenecks.

What makes this tricky is that the usual fixes only go so far. Caching helps in some cases, shrinking images helps in others, but neither fully addresses the cold-start path for large images. That is where lazy pulling becomes interesting: instead of downloading everything before startup, fetch only what is actually needed.

Why image pulls are slow

The biggest reasons are fairly straightforward.

Images are too large. VM, AI, and similar images often occupy tens of GB, so download time alone can be substantial.
gzip decompression is expensive. In internal networks especially, transfer may no longer be the main cost. Decompression can take longer than the network pull itself and become the new bottleneck.

The usual ways to optimize image startup

Image caching

Caching is the most familiar way to reduce pull time. If the image is already local, repeated downloads can be avoided.

But caching does not solve cold starts. It also becomes less effective when images change frequently because of application releases or security updates. Once layers are invalidated, the benefit disappears, and efficient cache refresh logic is not simple to manage.
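The invalidation problem can be sketched in a few lines. This is a simplified model rather than any runtime's actual cache logic: it assumes layers are identified only by their digest, and the digests and the `layers_to_pull` helper are hypothetical.

```python
def layers_to_pull(manifest_layers, cached_digests):
    """Return the layer digests missing from the local cache, in manifest order."""
    return [digest for digest in manifest_layers if digest not in cached_digests]

# Hypothetical digests, shortened for readability.
cache = {"sha256:base", "sha256:deps", "sha256:app-v1"}

old_release = ["sha256:base", "sha256:deps", "sha256:app-v1"]
# A patched base image plus a rebuilt application: every digest has changed.
new_release = ["sha256:base-patched", "sha256:deps-rebuilt", "sha256:app-v2"]

print(layers_to_pull(old_release, cache))  # warm cache: nothing to download
print(layers_to_pull(new_release, cache))  # cold again: all layers are misses
```

A node that has never run the image starts with an empty cache, so a cold start behaves like the second case.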

Reducing image size

Smaller images generally start faster because there is less data to pull.

That said, there are scenarios where image size is hard to shrink in any meaningful way. VM images, AI environments, and CUDA-based images often require more than 7 GB of storage, leaving limited room for further optimization.

Can images be loaded on demand?

In many cases, a container does not need the full image contents at startup. The Slacker study (FAST '16) found that only 6.4% of the image data was read during the startup phase. In theory, that makes on-demand downloading a very attractive way to speed things up.

The obstacle is the image format and runtime stack commonly used today.

OverlayFS limitations. The final filesystem structure is not known until all image layers have been downloaded.
gzip does not support random access. Even if only one file is needed, the whole layer still has to be downloaded and decompressed.
Verification happens at layer level. Image digests are calculated for entire layers, so there is no straightforward way to verify individual files.

These constraints make true file-level lazy pulling difficult with ordinary image layers.
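The gzip constraint is easy to reproduce. The sketch below uses Python's standard `gzip` and `zlib` modules with made-up data standing in for a layer; it is an illustration of the stream format, not of how any container runtime reads layers.

```python
import gzip
import zlib

# Stand-in for a layer: two "files" compressed as one continuous gzip stream.
first = b"A" * 50_000
second = b"needle: the only bytes the container needs"
layer = gzip.compress(first + second)

# Jumping into the middle of the stream fails: there is no gzip header there,
# and the compressed bytes only make sense given everything before them.
middle = len(layer) // 2
try:
    zlib.decompressobj(wbits=31).decompress(layer[middle:])
    readable_from_middle = True
except zlib.error:
    readable_from_middle = False

# The reliable way to reach `second` is to decompress from byte 0.
recovered = gzip.decompress(layer)[len(first):]

print(readable_from_middle, recovered == second)
```

Reaching the last few bytes of the layer costs a full download and a full decompression, which is exactly the cost lazy pulling tries to avoid.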

How eStargz makes lazy pulling practical

To work around these issues, eStargz modifies how gzip-based layers are organized. Its core idea is simple: compress each file independently and add file-level indexing.

That brings two important improvements:

Independent compression for each file. This avoids the usual gzip random-access problem.
Per-file verification. Files can be validated individually instead of verifying only the whole layer.

The storage layout looks like this:

[Figure: eStargz storage layout]

Each file is compressed separately and then merged into one large blob. At the end of that blob, a TOC descriptor is added. The TOC records each file's offset and checksum, which makes file-level indexing and validation possible.
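The layout described above can be modeled in a few lines. This is a minimal sketch, not the real eStargz encoder: the actual format wraps files in tar entries and locates the TOC through a footer, and the `compressedSize` field and all function names here are assumptions made for the illustration.

```python
import gzip
import json

def build_blob(files):
    """Compress each file as its own gzip member, then append a gzip-compressed TOC."""
    blob = bytearray()
    entries = []
    for name, data in files.items():
        member = gzip.compress(data)
        # "compressedSize" is a hypothetical field that keeps this sketch simple.
        entries.append({"name": name, "offset": len(blob), "compressedSize": len(member)})
        blob += member
    toc_offset = len(blob)
    blob += gzip.compress(json.dumps({"version": 1, "entries": entries}).encode())
    return bytes(blob), toc_offset

def read_toc(blob, toc_offset):
    """Decompress only the trailing TOC member."""
    return json.loads(gzip.decompress(blob[toc_offset:]))

def read_file(blob, toc, name):
    """Decompress a single file using only its own byte range."""
    entry = next(e for e in toc["entries"] if e["name"] == name)
    start, end = entry["offset"], entry["offset"] + entry["compressedSize"]
    # The slice stands in for an HTTP Range request against the registry.
    return gzip.decompress(blob[start:end])

files = {
    "bin/app": b"\x7fELF" + bytes(4096),
    "etc/hostname": b"demo\n",
    "usr/share/doc/README": b"rarely read at startup\n" * 100,
}
blob, toc_offset = build_blob(files)
toc = read_toc(blob, toc_offset)
print(read_file(blob, toc, "etc/hostname"))
```

Because concatenated gzip members still form a valid gzip stream, `gzip.decompress(blob)` on the whole blob also works, which hints at why a layout like this can stay readable by tools that know nothing about the index.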

A sample TOC looks like this:

{ 
  "version": 1,
  "entries": [
    {
      "name": "bin/",
      "type": "dir",
      "modtime": "2019-08-20T10:30:43Z",
      "mode": 16877,
      "NumLink": 0
    },
    {
      "name": "bin/busybox",
      "type": "reg",
      "size": 833104,
      "modtime": "2019-06-12T17:52:45Z",
      "mode": 33261,
      "offset": 126,
      "NumLink": 0,
      "digest": "sha256:8b7c559b8cccca0d30d01bc4b5dc944766208a53d18a03aa8afe97252207521f",
      "chunkDigest": "sha256:8b7c559b8cccca0d30d01bc4b5dc944766208a53d18a03aa8afe97252207521f"
    }
  ]
}

With this structure, eStargz can fetch files lazily while still supporting file-level integrity checks.
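As an illustration of the integrity check, the sketch below validates fetched bytes against a TOC entry shaped like the sample. The payload, the `bin/hello` entry, and the helper names are hypothetical; only the field names `name`, `type`, `size`, and `digest` come from the sample TOC.

```python
import hashlib

# Stand-in file content and a TOC entry describing it.
payload = b"#!/bin/sh\necho hello\n"
toc = {
    "version": 1,
    "entries": [
        {"name": "bin/", "type": "dir"},
        {
            "name": "bin/hello",
            "type": "reg",
            "size": len(payload),
            "offset": 126,
            "digest": "sha256:" + hashlib.sha256(payload).hexdigest(),
        },
    ],
}

def find_entry(toc, name):
    """Look up a TOC entry by path."""
    for entry in toc["entries"]:
        if entry["name"] == name:
            return entry
    raise KeyError(name)

def verify(toc, name, content):
    """Return True if `content` matches the size and digest recorded in the TOC."""
    entry = find_entry(toc, name)
    if entry["type"] != "reg":
        raise ValueError(f"{name} is not a regular file")
    if len(content) != entry["size"]:
        return False
    algorithm, _, expected = entry["digest"].partition(":")
    return hashlib.new(algorithm, content).hexdigest() == expected

tampered = payload.replace(b"hello", b"pwned")
print(verify(toc, "bin/hello", payload), verify(toc, "bin/hello", tampered))
```

The check only helps if the TOC itself is trusted, so in practice the TOC's own digest has to be anchored in something the client already verifies, such as the image manifest.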

The trade-off: faster startup, possible runtime cost

Lazy pulling can improve startup speed dramatically, but it is not free. If files are fetched too late, runtime performance can suffer.

To reduce that penalty, eStargz uses a priority-based loading strategy. Files needed during startup can be placed in a prioritized zone, so they are downloaded first. That helps preserve runtime responsiveness while keeping the startup path short.
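One way to picture the prioritized zone is as a reordering step at image-build time. The sketch below is an assumption-laden model: the startup profile is invented, the helper names are hypothetical, and the marker name follows my understanding of the landmark file eStargz uses, so treat it as a placeholder.

```python
# Marker separating prefetched files from lazily fetched ones (assumed name).
LANDMARK = ".prefetch.landmark"

def order_for_lazy_pull(all_files, startup_files):
    """Place files seen during startup first, then the landmark, then the rest."""
    present = set(all_files)
    # dict.fromkeys removes duplicates while keeping first-access order.
    prioritized = [f for f in dict.fromkeys(startup_files) if f in present]
    chosen = set(prioritized)
    rest = [f for f in all_files if f not in chosen]
    return prioritized + [LANDMARK] + rest

def prefetch_set(ordered):
    """Files a client would download eagerly: everything before the landmark."""
    return ordered[:ordered.index(LANDMARK)]

all_files = ["usr/lib/libbig.so", "bin/app", "etc/app.conf", "usr/share/doc/manual.txt"]
# Hypothetical access log recorded while the container was starting.
startup_files = ["bin/app", "etc/app.conf", "bin/app"]

ordered = order_for_lazy_pull(all_files, startup_files)
print(prefetch_set(ordered))
```

Since the prioritized files sit next to each other at the front of the blob, a client can fetch them with one contiguous range request instead of many small ones.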

[Figure: priority loading optimization]

The performance results reported by the author are shown below:

[Figure: performance comparison]

Costs and operational challenges

eStargz improves lazy loading, but it also introduces some overhead.

More storage overhead. Compressing files independently adds metadata and reduces compression efficiency.
Extra plugin support. The format requires supporting components, especially when pushing and pulling images.

So while the startup gains are real, adoption usually depends on whether the environment can absorb the added format and runtime complexity.
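The storage overhead is easy to observe with Python's `gzip` module. The workload below is hypothetical (200 small, similar configuration files) and deliberately unfavorable to per-file compression; real images will show a smaller gap.

```python
import gzip

# Hypothetical workload: many small files with largely repeated content.
files = [
    f"server_{i}:\n  port: {8000 + i}\n  log_level: info\n  retries: 3\n".encode()
    for i in range(200)
]

# One gzip stream over everything: repeated text is shared across files.
together = len(gzip.compress(b"".join(files)))

# One gzip member per file: each pays for its own header, trailer,
# and a compressor that starts with no knowledge of the other files.
separately = sum(len(gzip.compress(f)) for f in files)

print(f"single stream: {together} bytes, per-file members: {separately} bytes")
```

The penalty shrinks as files get larger, because the fixed per-member cost is amortized over more data.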

How to use eStargz

eStargz can be used with containerd-related tooling and other tools that support the format.

Command-line examples for Docker Buildx and nerdctl

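# Build the image and push it to the registry with eStargz-compressed layers.
docker buildx build -t ghcr.io/ktock/hello:esgz \
  -o type=registry,oci-mediatypes=true,compression=estargz,force-compression=true \
  /tmp/buildctx/

# Convert an existing image to eStargz.
nerdctl image convert --estargz --oci ghcr.io/ktock/hello:1 ghcr.io/ktock/hello:esgz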

containerd plugin configuration

version = 2

[proxy_plugins]
  [proxy_plugins.stargz]
    type = "snapshot"
    address = "/run/containerd-stargz-grpc/containerd-stargz-grpc.sock"

[plugins."io.containerd.grpc.v1.cri".containerd]
  snapshotter = "stargz"
  disable_snapshot_annotations = false

Similar approaches have already been enabled by default on some cloud platforms, including GKE, to accelerate image startup. There are also on-demand loading implementations based on block devices from cloud vendors, which suggests this kind of technique has already seen large-scale real-world adoption.

From basic caching and image-size reduction to true lazy pulling, the direction is clear: if containers only need part of an image to start, distributing the whole thing up front is often unnecessary. eStargz offers a practical way to bridge that gap by combining partial fetching with file-level indexing and verification.