
perf(producer): native WebGPU shader-blend via Dawn (opt-in, Mac-Metal) #831

Draft
vanceingalls wants to merge 1 commit into
vai/engine-software-renderer-guard from
vai/perf-dawn-webgpu-compositor

Conversation


@vanceingalls vanceingalls commented May 14, 2026

Stacked on #822 — review/merge that first.

EXPERIMENTAL — spike for Vance to pull and test locally on Mac. Draft, do not stamp.

Summary

Adds a Node-side WebGPU compositor for the shader-transition blend, backed by Google's Dawn implementation via the webgpu npm package. Wired into the existing shaderTransitionWorker as an opt-in path. Default OFF — set HF_DAWN_WEBGPU=1 or pass the new --gpu-shader-blend CLI flag to engage.

This is option B from the reference_5x_shader_perf_alternatives.md memo. Sibling spike for option A (page-side compositing) is being drafted in parallel.

Why

The hf#677 chain (PRs #756-#760) gets us 1.95× on Mac via a CPU shader worker pool, ring-buffered pipelining, and a hybrid layered/parallel capture path. The remaining ceiling is the per-pixel JS blend itself — scalar f64 math in v8. On any host with a real GPU (Mac/Metal, Linux/Vulkan, Windows/D3D), moving the blend to the GPU via Dawn should drop blend wall-time on a 854×480 rgb48le frame from ~150–910 ms (depending on shader complexity) to a few ms.

HeadlessExperimental.beginFrame is structurally Mac-unavailable (crbug.com/40656275); native WebGPU is the next-best Mac-viable lever — see #787 / the reference_beginframe_mac_structural_limit.md memo.

Dawn package choice

webgpu@0.4.0 (dawn-gpu/node-webgpu, maintained by Dawn upstream — Corentin Wallez, Kai Ninomiya, etc.). Picked because:

  • It ships a darwin-universal.dawn.node prebuilt that targets Apple Metal directly (no source build, no rosetta).
  • Linux x64/arm64 Vulkan + Windows x64 D3D prebuilts ship in the same package.
  • Maintained by Dawn upstream — the canonical Node binding.
  • Verified locally that the package loads on Linux x64; adapter init returns null in the sandbox (no Vulkan driver) and the worker transparently falls back to the existing CPU path.

The earlier 2026-05-12 Dawn spike concluded "no GPU available" on the Linux sandbox; this PR is the re-spike with Mac as the primary target.

Flag

--gpu-shader-blend on the CLI, or HF_DAWN_WEBGPU=1 in the environment. There's also HF_DAWN_FORCE_FAIL=1 for testing that the fallback engages cleanly.
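The gating contract above can be sketched as a pure predicate over the environment (a sketch with illustrative names — the actual worker may structure this differently):

```typescript
// Sketch of the opt-in gate, assuming the env contract described above.
// HF_DAWN_FORCE_FAIL=1 wins even when the feature is requested, so the
// fallback path can be exercised deterministically in tests.
type GpuGateDecision =
  | { tryGpu: true }
  | { tryGpu: false; reason: "not-requested" | "forced-fail" };

function gpuBlendGate(env: Record<string, string | undefined>): GpuGateDecision {
  if (env.HF_DAWN_WEBGPU !== "1") return { tryGpu: false, reason: "not-requested" };
  if (env.HF_DAWN_FORCE_FAIL === "1") return { tryGpu: false, reason: "forced-fail" };
  return { tryGpu: true };
}
```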

What changed

  • packages/producer/src/services/shaderTransitionGpu.ts (NEW) — compositor: dynamic-imports webgpu, requests adapter + device, compiles WGSL per supported transition, manages per-size GPU resources, dispatches blends, reads back to rgb48le. Returns structured init failures rather than throwing.
  • packages/producer/src/services/shaderTransitionWorker.ts — augmented: first message under HF_DAWN_WEBGPU=1 triggers a one-time GPU probe; supported shaders run on the GPU thereafter; everything else falls through to the existing TRANSITIONS[shader] ?? crossfade CPU path with zero behavior change. Mid-render GPU failure disables the GPU path and falls back.
  • packages/cli/src/commands/render.ts — new --gpu-shader-blend flag sets the env var before pool spawn.
  • packages/producer/build.mjs + packages/cli/tsup.config.ts — webgpu marked external (70+ MB native binary, must not be bundled).
  • packages/producer/package.json + packages/cli/package.json — webgpu@^0.4.0 as optionalDependencies (native binary; feature is opt-in).
  • packages/producer/src/services/shaderTransitionGpu.test.ts (NEW) — 3 vitest tests: HF_DAWN_FORCE_FAIL short-circuit, init never throws, PSNR-vs-CPU ≥ 50dB when an adapter is available (auto-skipped on no-GPU hosts).
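The "structured init failures rather than throwing" contract can be illustrated with a minimal probe sketch. Everything here is an assumed shape, not the PR's actual code, and the `create()` entry point on the Dawn binding is a guess at the binding's API:

```typescript
// Minimal sketch of a never-throwing GPU probe. The real module also
// compiles WGSL pipelines and manages per-size resources; this only
// shows the structured-failure contract described above.
type GpuInit =
  | { ok: true; device: unknown }
  | { ok: false; reason: "module-load-failed" | "no-adapter" | "forced-fail" };

async function probeGpu(env: Record<string, string | undefined>): Promise<GpuInit> {
  if (env.HF_DAWN_FORCE_FAIL === "1") return { ok: false, reason: "forced-fail" };
  let mod: any;
  try {
    // Optional dependency: may be absent, or its native binary may fail to load.
    const moduleName = "webgpu"; // resolved at runtime from node_modules
    mod = await import(moduleName);
  } catch {
    return { ok: false, reason: "module-load-failed" };
  }
  // Hypothetical entry point — the real binding's API may differ.
  const adapter = await mod.create?.([])?.requestAdapter?.();
  if (!adapter) return { ok: false, reason: "no-adapter" };
  return { ok: true, device: await adapter.requestDevice() };
}
```

The caller always gets a value it can branch on; every failure mode maps to "use the CPU path".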

Wall-time pair (Linux sandbox, no GPU — fallback path only)

Bundled CLI worker smoke (854×480 single crossfade blend, harness at /tmp/validate-gpu-worker.mjs):

| config | wall | notes |
|---|---:|---|
| CPU baseline (`HF_DAWN_WEBGPU` unset) | 67 ms | reference |
| `HF_DAWN_WEBGPU=1` (no GPU, fell back) | 70 ms | +3 ms init/probe, then CPU |
| `HF_DAWN_WEBGPU=1` + `HF_DAWN_FORCE_FAIL=1` | 66 ms | fallback verified |

Both fallback runs produced byte-identical output to the CPU baseline checksum — fallback fidelity confirmed.

Mac numbers (TODO — Vance)

Pull this branch, run a shader-transition fixture (e.g. Mark Witt's 14-transition fixture you used for #759/#760) twice:

```
# Baseline (1.95× post-cascade)
node packages/cli/dist/cli.js render <fixture> -o /tmp/baseline.mp4 --workers 6
# With Dawn
node packages/cli/dist/cli.js render <fixture> -o /tmp/dawn.mp4 --workers 6 --gpu-shader-blend
```

Drop the wall-time pair in a comment + look for the GPU compositor active log line.

Determinism trade

GPU compute uses f32 + u16 storage; CPU canonical uses f64. Bit-exact equality is not realistic.

  • The default-off path keeps CI byte-equality pins intact — existing fixtures hit the CPU path unchanged.
  • The fallback CPU path is bit-exact with the canonical CPU implementation (no drift on no-GPU hosts).
  • Fixtures that exercise the GPU path must use a PSNR pin (≥ 50 dB) — documented in the new test.
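A ≥ 50 dB PSNR pin on 16-bit samples is cheap to compute; a sketch (the PR's test may calculate it differently):

```typescript
// PSNR between two rgb48le sample buffers (u16 per channel).
// 50 dB on a 16-bit scale corresponds to an RMS error of roughly
// 65535 / 10^(50/20) ≈ 207 code values — loose enough to tolerate f32
// GPU math, tight enough to catch a broken shader port.
function psnrU16(a: Uint16Array, b: Uint16Array): number {
  if (a.length !== b.length) throw new Error("length mismatch");
  let sumSq = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sumSq += d * d;
  }
  if (sumSq === 0) return Infinity; // identical buffers
  const mse = sumSq / a.length;
  const peak = 65535;
  return 10 * Math.log10((peak * peak) / mse);
}
```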

Fallback behavior

Anywhere the GPU path can't run, the existing CPU path runs unchanged:

  • webgpu package not installed → CPU
  • webgpu loads but requestAdapter() returns null → CPU
  • WGSL pipeline compile fails → CPU
  • --gpu-shader-blend set but shader not in SHADERS_WGSL → CPU (per-frame)
  • Mid-render GPU failure → log + disable GPU + finish on CPU

Failure reason logged once per worker (not per frame). No crash path.
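The "disable after first mid-render failure, log once" behavior can be sketched as a small wrapper (illustrative names, not the worker's actual code):

```typescript
// One-shot degrading dispatcher: tries the GPU blend until it fails
// once, then logs a single reason and stays on CPU for the rest of the
// worker's lifetime. Frames never fail — they fall back.
function makeBlendDispatcher(
  gpuBlend: (frame: Uint16Array) => Uint16Array,
  cpuBlend: (frame: Uint16Array) => Uint16Array,
  log: (msg: string) => void,
) {
  let gpuDisabled = false;
  return (frame: Uint16Array): Uint16Array => {
    if (!gpuDisabled) {
      try {
        return gpuBlend(frame);
      } catch (err) {
        gpuDisabled = true; // logged once per worker, not per frame
        log(`gpu blend failed, falling back to cpu: ${(err as Error).message}`);
      }
    }
    return cpuBlend(frame);
  };
}
```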

Coverage in this spike

One representative shader (crossfade) is ported to WGSL end-to-end as proof of correctness. Other shaders transparently fall through to CPU even when the flag is on. Porting the remaining shaders is a mechanical add to SHADERS_WGSL in shaderTransitionGpu.ts.
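For reference, the crossfade blend itself is a per-sample linear mix; a CPU sketch over u16 samples (matching rgb48le) might look like this — a hypothetical helper, not the PR's TRANSITIONS implementation:

```typescript
// Scalar crossfade over 16-bit samples: out = a*(1-t) + b*t, rounded.
// This per-pixel loop is what the WGSL port moves onto the GPU,
// essentially as a mix() call in a compute shader.
function crossfadeU16(a: Uint16Array, b: Uint16Array, t: number): Uint16Array {
  const out = new Uint16Array(a.length);
  for (let i = 0; i < a.length; i++) {
    out[i] = Math.round(a[i] * (1 - t) + b[i] * t);
  }
  return out;
}
```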

Follow-ups

  • Port the remaining shaders (flashThroughWhite, chromaticSplit, sdfIris, glitch, lightLeak, crossWarpMorph, whipPan, cinematicZoom, gravitationalLens, rippleWaves, swirlVortex, thermalDistortion, domainWarp, ridgedBurn).
  • After Mac wall-time confirms the lever: decide flip-default-on vs keep opt-in.
  • PSNR-only fixture pins for any CI fixture intended to exercise the GPU path.
  • Persistent device + pipeline cache shared across workers (currently per-worker init).

Test plan

  • Producer typecheck clean
  • CLI typecheck clean
  • oxlint + oxfmt clean
  • shaderTransitionGpu.test.ts — 3 tests pass (force-fail short-circuit + no-throw init + PSNR skip-when-no-GPU)
  • shaderTransitionWorkerPool.test.ts — all 7 existing pool tests still pass (CPU path byte-equivalence preserved)
  • Bundled CLI worker smoke (CPU / GPU-flag-fallback / force-fail) confirms fallback fidelity + clean logging
  • Vance: Mac wall-time pair on a shader-transition fixture

PR drafted by Vai

EXPERIMENTAL spike. Adds a Node-side WebGPU compositor backed by the
`webgpu` npm package (Dawn) and wires it into the existing
shader-transition worker as an opt-in alternative to the CPU blend.
Default OFF — set `HF_DAWN_WEBGPU=1` or pass `--gpu-shader-blend` to the
CLI to engage.

The hf#677 chain (PRs #756-#760) gets us 1.95x on Mac via a CPU shader
worker pool, ring-buffered pipelining, and a hybrid layered/parallel
capture path. The remaining ceiling is the per-pixel JS blend itself —
scalar f64 math in v8. On any host with a real GPU (Mac/Metal,
Linux/Vulkan, Windows/D3D), moving the blend to the GPU via Dawn
should drop blend wall time on a 854x480 rgb48le frame from
~150-910 ms (depending on shader complexity) to a few ms, projecting
3-5x end-to-end on top of the existing cascade — see the
`reference_5x_shader_perf_alternatives.md` memo (option B).

beginFrame is structurally Mac-unavailable
(crbug.com/40656275); native WebGPU is the next-best Mac-viable lever.

The Dawn binding (`webgpu@0.4.0`, dawn-gpu/node-webgpu, maintained by
Dawn upstream — Corentin Wallez, Kai Ninomiya) ships a `darwin-universal`
prebuilt that targets Apple Metal directly, plus Linux x64/arm64
Vulkan and Windows x64 D3D. Verified locally that the package loads on
Linux x64; adapter init returns null in the sandbox (no Vulkan driver)
and the worker transparently falls back to the existing CPU path.
Mac numbers must be measured on Vance's laptop — there's no GPU host
in CI.

- `packages/producer/src/services/shaderTransitionGpu.ts` (NEW)
  Node-side compositor: dynamic-imports `webgpu`, requests an adapter
  + device, compiles a WGSL compute shader per supported transition,
  manages per-size GPU resources, dispatches blends, reads back to
  `rgb48le`. Returns structured init failures rather than throwing —
  the caller can always fall back to CPU. `HF_DAWN_FORCE_FAIL=1`
  short-circuits init for testability of the fallback path.
- `packages/producer/src/services/shaderTransitionWorker.ts`
  Augmented: on the first message, if `HF_DAWN_WEBGPU=1`, dynamic-imports
  the GPU module and probes once. On success, supported shaders run on
  the GPU. Unsupported shaders (`glitch`, `domainWarp`, `swirlVortex`,
  and the rest) and any host without a GPU adapter fall through to the
  existing `TRANSITIONS[shader] ?? crossfade` CPU path with zero
  behavior change. Mid-render GPU failure disables the GPU path for the
  rest of the worker's life and falls back — the frame still completes.
- `packages/cli/src/commands/render.ts`
  New `--gpu-shader-blend` boolean flag (default false). When set, the
  CLI exports `HF_DAWN_WEBGPU=1` into `process.env` before any worker
  pool spawns. Env-var plumbing chosen over threading the flag through
  the orchestrator -> stage -> pool -> worker chain because env vars
  cross the `worker_threads` boundary unchanged.
- `packages/producer/build.mjs` + `packages/cli/tsup.config.ts`
  `webgpu` added to the `external` list in both bundlers. The Dawn
  binding ships a 70+ MB native `.dawn.node` binary per platform —
  it must be loaded from the user's node_modules at runtime, not
  inlined into the bundle.
- `packages/producer/package.json` + `packages/cli/package.json`
  `webgpu@^0.4.0` added to `optionalDependencies` on both. Optional
  because (a) it's a native binary that may fail to install on some
  hosts and (b) the feature is opt-in.
- `packages/producer/src/services/shaderTransitionGpu.test.ts` (NEW)
  3 vitest tests: `HF_DAWN_FORCE_FAIL` short-circuit, init never
  throws, and PSNR-vs-CPU >= 50 dB on hosts where the adapter is
  available (auto-skipped on Linux sandbox with a log line).
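The bundler change above (marking `webgpu` external) might look like the following tsup sketch — field names are from tsup's documented config, but the repo's actual config will carry more options:

```typescript
// packages/cli/tsup.config.ts (sketch) — keep the Dawn native binding
// out of the bundle; it is resolved from node_modules at runtime.
import { defineConfig } from "tsup";

export default defineConfig({
  entry: ["src/index.ts"],
  external: ["webgpu"], // 70+ MB native .dawn.node must not be inlined
});
```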

One representative shader (`crossfade`) is ported to WGSL end-to-end
as proof of correctness. The harness is shader-agnostic; porting more
is a mechanical add to `SHADERS_WGSL` in `shaderTransitionGpu.ts`.
Unsupported shaders transparently fall through to CPU even when the
flag is on.

GPU compute uses f32 + u16 storage; CPU canonical uses f64. Bit-exact
equality is not realistic. The default-off path keeps CI byte-equality
pins intact (the existing fixtures hit the CPU path unchanged). Fixtures
that exercise the GPU path must use a PSNR pin (>= 50 dB documented in
the new test). The fallback CPU path is bit-exact with the
canonical CPU implementation.

Bundled CLI worker smoke (854x480 single crossfade blend, see the
investigation log for the harness):

| config | wall | notes |
|---|---:|---|
| CPU baseline (`HF_DAWN_WEBGPU` unset) | 67 ms | reference |
| `HF_DAWN_WEBGPU=1` (fell back, no GPU) | 70 ms | +3 ms init+probe, then CPU |
| `HF_DAWN_WEBGPU=1` + `HF_DAWN_FORCE_FAIL=1` | 66 ms | fallback verified |

Both fallback runs produced byte-identical output to the CPU baseline
checksum — fallback fidelity confirmed.

**Mac numbers: Vance, please fill in.** Pull the branch, run a
shader-transition fixture both with and without `--gpu-shader-blend`,
and drop the wall-time pair in a comment.

Anywhere the GPU path can't run, the existing CPU path runs unchanged:

- `webgpu` package not installed (optional dep skipped on install) -> CPU
- `webgpu` loads but `requestAdapter()` returns null (no GPU) -> CPU
- WGSL pipeline compile fails -> CPU
- `--gpu-shader-blend` set but shader not in `SHADERS_WGSL` -> CPU
- Mid-render GPU failure -> log + disable GPU + finish on CPU

Failure reason is logged once per worker (not per frame). No crash path.

- Port the remaining shaders to WGSL (mechanical: `flashThroughWhite`,
  `chromaticSplit`, `sdfIris`, `glitch`, `lightLeak`, `crossWarpMorph`,
  `whipPan`, `cinematicZoom`, `gravitationalLens`, `rippleWaves`,
  `swirlVortex`, `thermalDistortion`, `domainWarp`, `ridgedBurn`).
- Once Mac wall-time confirms the lever, decide on flip-default-on or
  keep opt-in.
- Update fixture pins for any fixture that needs to exercise the GPU
  path under CI (PSNR-only).
- A persistent device + pipeline cache shared across workers (currently
  per-worker init).

_PR drafted by Vai_

Co-Authored-By: Vai <vai@heygen.com>
@vanceingalls vanceingalls force-pushed the vai/perf-dawn-webgpu-compositor branch from 58a976b to 450bb01 Compare May 14, 2026 15:57
@vanceingalls vanceingalls changed the base branch from main to vai/engine-software-renderer-guard May 14, 2026 15:57