I have done the following
Steps to reproduce
Reproduction
Dockerfile (Dockerfile.shared):

```dockerfile
FROM alpine:3.20 AS builder
# Write a 1 MiB payload plus a marker file for the runtime stage to copy.
RUN mkdir -p /install && \
    head -c 1048576 /dev/urandom > /install/payload && \
    echo "chaos-1506 builder stage complete" > /install/marker

FROM alpine:3.20 AS runtime
COPY --from=builder /install /usr/local
CMD ["cat", "/usr/local/marker"]
```
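Running the built image should just print the marker, which confirms the cross-stage COPY landed where the runtime stage expects (the `container run` invocation below is illustrative; it is not part of repro.sh):

```bash
container run chaos-1506-a:latest
# expected output:
# chaos-1506 builder stage complete
```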
Script (repro.sh):

```bash
#!/usr/bin/env bash
set -u
for i in 1 2 3 4 5; do
  # Start from a clean slate so every iteration is a fresh, uncached build.
  container image rm chaos-1506-a:latest chaos-1506-b:latest >/dev/null 2>&1 || true

  # Two concurrent builds of the same Dockerfile against one builder VM.
  container build --no-cache -t chaos-1506-a:latest -f Dockerfile.shared . \
    > /tmp/chaos-1506-a-${i}.log 2>&1 &
  PID_A=$!
  container build --no-cache -t chaos-1506-b:latest -f Dockerfile.shared . \
    > /tmp/chaos-1506-b-${i}.log 2>&1 &
  PID_B=$!
  wait $PID_A; wait $PID_B

  # Surface the platform identifiers each build printed.
  grep "linux/arm64" /tmp/chaos-1506-a-${i}.log
  grep "linux/arm64" /tmp/chaos-1506-b-${i}.log
done
```
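A minimal invocation, assuming Dockerfile.shared and repro.sh sit in the current directory:

```bash
chmod +x repro.sh
./repro.sh
# then inspect any iteration's logs, e.g.:
less /tmp/chaos-1506-a-1.log
```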
The full script and fixture live at
Sample Compose Files/CHAOS-1506-repro/ in
https://github.com/full-chaos/container-compose
(Dockerfile.shared, repro.sh). The reproduction does not depend on
container-compose; that repository is just a convenient place to host
the fixture.
Observed signal
Every per-build log shows the platform identifier changing between
stages of the same build. Example from one of build A's logs:
```
#6 [linux/arm64 builder 1/2] RUN mkdir -p /install ...
#7 [linux/arm64/v8 runtime 1/2] COPY --from=builder /install /usr/local
```
Builder stage: linux/arm64. Runtime stage of the same build:
linux/arm64/v8. The script never passes --arch or --platform;
container build defaulted both stages to the host architecture but
normalized the two platform strings inconsistently.

The drift reproduced in 5/5 fresh-build iterations and in every
per-build log (/tmp/chaos-1506-a-{1..5}.log and
/tmp/chaos-1506-b-{1..5}.log, 10 in total): each log contains exactly
one linux/arm64/v8 occurrence alongside the linux/arm64 ones.
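One quick way to verify that count across all ten logs (a check we ran by hand; it is not part of repro.sh):

```bash
# Count normalized vs. bare platform strings in each of the ten logs.
# The trailing space in the second pattern excludes linux/arm64/v8 lines.
for f in /tmp/chaos-1506-{a,b}-{1..5}.log; do
  printf '%s  arm64/v8=%d  arm64=%d\n' "$f" \
    "$(grep -c 'linux/arm64/v8' "$f")" \
    "$(grep -c 'linux/arm64 ' "$f")"
done
```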
Production failure (Linear CHAOS-1506)
The same drift manifested as a hard build failure in a heavier user
workload (Python pip-install multi-stage Dockerfile):
```
 => CACHED [linux/arm64 builder 4/5] RUN pip install --prefix=/install ...        0.0s
 => [linux/arm64/v8 builder 3/5] COPY . .                                        19.6s
 => ERROR [linux/arm64/v8 runtime 2/5] COPY --from=builder /install /usr/local    0.0s
 => [linux/arm64 builder 4/5] RUN pip install --prefix=/install ...               2.1s
```
The same arm64 vs. arm64/v8 drift appears, and COPY --from=builder
resolved to a state that raised ERROR before the builder stage of the
same build had completed. The lightweight repro above shows the drift
but does not reliably fire the COPY failure, likely because the alpine
builder finishes too quickly to widen the race window against the
runtime stage's COPY --from=builder lookup.
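For context, the failing workload was shaped roughly as sketched below. This is a hypothetical reconstruction from the step labels in the log excerpt; the base image, the installed package, and every unlabeled step are assumptions:

```dockerfile
# Hypothetical reconstruction: only the COPY . ., pip install --prefix,
# and COPY --from=builder steps come from the log excerpt above; the
# base image, the package, and everything else are assumed.
FROM python:3.12-slim AS builder
WORKDIR /app
COPY . .
RUN pip install --prefix=/install requests

FROM python:3.12-slim AS runtime
COPY --from=builder /install /usr/local
# /usr/local is the python image's install prefix, so the copied
# site-packages land on the default sys.path.
CMD ["python", "-c", "import requests; print(requests.__version__)"]
```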
Hypothesis
The runtime stage's COPY --from=builder resolves the stage by name
via a platform-keyed lookup. When the lookup uses arm64/v8 but the
layer was registered under arm64 (or vice-versa), one of two things
happens:
- The stage cannot be found at all → COPY error fires before the
builder stage completes.
- The lookup hits a sibling build's in-progress content-addressed cache
entry (same content hash, different platform key) and fails on
partial extraction under contention.
We were unable to confirm the precise mechanism from the client side;
the platform normalization happens inside the buildkit-shim.
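A minimal sketch of the first failure mode, assuming a registry keyed by stage name plus platform string; the key shape and the registry itself are our invention for illustration, not the shim's actual data structures:

```bash
#!/usr/bin/env bash
# Hypothetical platform-keyed stage registry (bash 4+ associative arrays).
declare -A stage_digests

# The builder stage registers its snapshot under the bare platform string ...
stage_digests["builder|linux/arm64"]="sha256:0123abcd"

# ... but the runtime stage resolves COPY --from=builder with the
# normalized string, and the lookup misses even though the content exists.
key="builder|linux/arm64/v8"
if [[ -z "${stage_digests[$key]:-}" ]]; then
  echo "lookup miss: cannot resolve stage 'builder' for linux/arm64/v8" >&2
fi
```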
Client-side workaround
Users hit by the COPY failure can serialize build dispatches at the
client by running them sequentially (or via a tool that bounds
parallelism to 1). This avoids two concurrent BuildCommand clients
dialing the same buildkit container, which sidesteps the contention
that turns the platform drift into a hard COPY failure. It does not
fix the underlying normalization inconsistency.
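Applied to the repro script above, the workaround amounts to dropping the backgrounding so only one build is in flight at a time:

```bash
# Serialized variant of the repro loop: builds run back to back, so only
# one BuildCommand client talks to the buildkit container at a time.
for i in 1 2 3 4 5; do
  container build --no-cache -t chaos-1506-a:latest -f Dockerfile.shared . \
    > /tmp/chaos-1506-a-${i}.log 2>&1
  container build --no-cache -t chaos-1506-b:latest -f Dockerfile.shared . \
    > /tmp/chaos-1506-b-${i}.log 2>&1
done
```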
Ask
- Confirm platform-string normalization is intended to be consistent
across stages of one build invocation.
- Investigate whether COPY --from=<stage> resolution is platform-keyed
  in a way that's vulnerable to the drift.
- Once fixed, the client workaround can be removed.
Logs
container-build-shell-repro-build-B-run1.log
container-build-shell-repro-build-A-run1.log
Problem description
container build emits inconsistent platform identifiers across stages
of one build, linux/arm64 for some stages and linux/arm64/v8 for
others, even when the client never passes --arch, --platform, or --os.
The drift is benign in isolation, but it appears to be the trigger for
a downstream COPY --from=builder failure under load (see "Production
failure" above).
Reproduces deterministically (5/5 fresh-build iterations) with two
parallel container build invocations against the same Dockerfile.
No third-party tooling involved.
Environment
- container CLI: 0.12.3 (build: release)
- macOS: Darwin 26.4.1
- Hardware: Apple Silicon (M4 Max)
- Builder VM: default (2 CPU / 2048 MB, per `container builder status`)
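For reference, one way to gather these values (output elided; the `--version` flag is assumed from standard CLI conventions, the rest are stock macOS commands):

```bash
container --version                  # CLI version
uname -rs                            # Darwin kernel version, e.g. "Darwin 26.4.1"
sysctl -n machdep.cpu.brand_string   # CPU, e.g. "Apple M4 Max"
container builder status             # builder VM CPU / memory allocation
```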
Code of Conduct
- I agree to follow this project's Code of Conduct