feat(producer): add shaderTransitionWorkerPool (hf#732 PR 3/5)#758
Conversation
jrusso1020
left a comment
There was a problem hiding this comment.
Verdict: APPROVE — clean standalone pool, structurally parallel to #757
Per Rule 5: required CI all green (Lint, Build, Test, Typecheck, regression, regression-shards, preview-regression, player-perf). Graphite mergeability_check in_progress (normal for stacked PR).
Audited: shaderTransitionWorkerPool.ts end-to-end (~320 lines).
Trusting: the 6 tests (byte-equivalence across all 15 shaders + transferList + lifecycle), shaderTransitionWorker.ts, the build wiring across cli/tsup, producer/build.mjs, and the engine ./shader-transitions subpath export — verified registration but didn't audit internals.
What I checked carefully
Same shape as PR2's pool — same patterns hold here:
-
Lifecycle correctness (
:282-308) —terminate()rejects queued tasks, then in-flight tasks, thenPromise.allworker.terminate with swallowed errors. Idempotent guard at top. -
onWorkerError/onWorkerExit(:230-247) — buffers-lost framing in the rejection message tells the caller their transferred state can't be recovered. Correct. -
transferListcontract documented at:33-37— the most error-prone aspect (caller-must-use-result.*) is in the API JSDoc. -
buildExecArgvmatches the PR2 pool's pattern for tsx-loader injection under vitest. -
Pool size clamping to
[1, cpus().length](:175-176).
One observation worth knowing (non-blocking)
No defensive <8KB Buffer-pool copy here. PR2's pool defensively copies into a dedicated ArrayBuffer if the input Buffer is a slice over the shared 8KB pool. This pool doesn't — it casts .buffer as ArrayBuffer directly. That's fine for the current use case because shader-blend at 16bpc operates on width × height × 6 byte buffers (≥2.4MB for 854×480) — always over the 8KB pool threshold.
If a future caller ever passes shader-blend a smaller buffer (e.g. for a thumbnail-resolution preview blend, or test fixture), postMessage will throw DataCloneError. Worth either:
- (a) Inheriting the defensive copy from PR2's pool for symmetry, OR
- (b) Adding an explicit invariant comment: "This pool requires all input buffers to be
Buffer.alloc()over dedicated ArrayBuffers (>= 8KB). Smaller buffers will fail transferList. Acceptable invariant for the 16bpc-rgb48le use case but breaks if shader-blend ever runs on smaller resolutions."
Defensible either way; (b) is the lower-effort version of "no silent surprise."
Code-DRY observation (also non-blocking)
The two pools (this one + PR2's pngDecodeBlitWorkerPool) share ~95% of the implementation — resolveWorkerEntry, buildExecArgv, dispatch loop, lifecycle, error handling, even the trace-mode env switch. Could be extracted into a generic WorkerPool<TReq, TRes> base in a follow-up. Not gating, but the two-pool duplication will compound if a third pool (e.g. for caption text rasterization) gets added later.
Praise
- Same architectural discipline as PR2 — "standalone" + tested before wired into production. Easy to land + audit independently.
- The "Why a SEPARATE pool" rationale paid off here too: byte-equivalence pinned across all 15 shaders (per test claim) is strong evidence the offloaded blend is identical to the inline path.
- Per-task
traceEnabledis the right primitive for the perf-validation step that lands in PR4/PR5 — being able to confirm concurrency empirically (vs assume from concurrency-control code review) is exactly what perf changes need.
Review by Rames Jusso (pr-review)
miguel-heygen
left a comment
There was a problem hiding this comment.
Quick review additive to Rames.
Structurally parallel to #757 — same pool shape, same patterns, same quality. Two notes:
-
Missing
<8KBBuffer pool defense — Rames flagged this already but framed it as optional. I lean toward (b): add an explicit invariant comment rather than inheriting the defensive copy. The shader pool's buffers are alwayswidth × height × 6(minimum ~2.4MB at 854×480), so the defense would be dead code that obscures the real contract. A comment like// INVARIANT: all input buffers are Buffer.alloc() over dedicated ArrayBuffers (>= 8KB). Smaller buffers will fail transferList.at thepostMessagecall site is sufficient and cheaper to maintain. -
Code-DRY opportunity — also flagged by Rames. The two pools share ~95% of their implementation. A generic
WorkerPool<TReq, TRes>base would be a good follow-up if a third pool materializes. For now the duplication is acceptable — two copies is annoying but manageable; three would warrant extraction.
Build wiring (tsup entry, esbuild entry, subpath export) mirrors #757 exactly. CI green. Ship it.
— Magi
978f510 to
2dac19a
Compare
d209d60 to
4279ba2
Compare
703a6f8 to
ede5cde
Compare
4279ba2 to
20bd055
Compare
20bd055 to
cdeafae
Compare
ede5cde to
92bccfd
Compare
The base branch was changed.
hf#677 follow-up. Adds a `worker_threads`-based pool that runs the shader-transition blend (one of 15 transition shaders) on a fixed-size worker pool. No production wiring yet — the pool stands alone and the hybrid path in PR 4 wires it up. Why a separate pool: the shader blend is a hot inner loop iterated over every pixel of every transition frame at 16bpc. Running it on the main event loop was previously the dominant wall-time bottleneck once DOM capture parallelism was reached. Moving it to `worker_threads` unblocks the main thread for CDP dispatch and pipelined decode/blit. Pieces: * `shaderTransitionWorker.ts` — the worker entry. Imports `TRANSITIONS` + `crossfade` from `@hyperframes/engine/shader-transitions` (the subpath added here on the engine package). Like the alpha-blit subpath, this is a zero-import TS file so it survives the `new Worker(<path>)` loader boundary without dragging in the producer module graph. * `shaderTransitionWorkerPool.ts` — fixed-size pool with `run()` API. Uses `transferList` so the 16bpc HDR `from`/`to`/`out` buffers move by ownership across the boundary instead of being copied. * `shaderTransitionWorkerPool.test.ts` — 6 vitest tests pinning byte- equivalence across all 15 shaders against an inline reference, transferList correctness, and pool lifecycle (concurrent dispatch + termination). All pass. Build wiring: * `packages/cli/tsup.config.ts`: third tsup entry emits `dist/shaderTransitionWorker.js` alongside `dist/cli.js`. Same rationale as the pngDecodeBlitWorker entry in PR 2 — the pool's `new Worker(<path>)` resolver probes for that file next to its loaded module. * `packages/producer/build.mjs`: fourth esbuild entry emits the worker as `dist/services/shaderTransitionWorker.js` for direct producer consumers. Adds the `@hyperframes/engine/shader-transitions` workspace alias to the existing `workspaceAliasPlugin`. * `packages/engine/package.json`: adds `./shader-transitions` subpath export pointing at `src/utils/shaderTransitions.ts`. The file is already import-free so the worker can consume the TS source directly. No behavior change in any render. PR 3 of 5 in the hf#732 decomposition stack; stacked on top of PR 2 (pngDecodeBlit pool). -- Vai Co-Authored-By: Vai <vai@heygen.com>
cdeafae to
525c597
Compare
miguel-heygen
left a comment
There was a problem hiding this comment.
Re-reviewed after Graphite rebase.
This is a clean, well-structured worker pool that mirrors pngDecodeBlitWorkerPool from PR 2/5. Reviewed the full +860 line diff against the sibling pool for API consistency.
Worker lifecycle and cleanup -- Correct. Spawn-time failure path terminates partially-spawned workers. terminate() rejects both queued and in-flight tasks before calling Worker.terminate(), preventing leaked promises. The onWorkerExit handler rejects the current task on unexpected exit without auto-respawning (right call -- buffers are lost). Idempotent terminate guard prevents double-teardown.
Buffer transfer semantics -- Correct. The three rgb48le ArrayBuffers move via transferList (zero-copy), the worker wraps them with Buffer.from(ab) (zero-copy view), and the reply transfers all three back. The error path also transfers buffers back so the caller can release them. Unlike the PNG pool, this pool does not need the 8KB shared-pool safety net because the caller always uses Buffer.alloc() for rgb48le buffers which creates dedicated ArrayBuffers.
Shader dispatch correctness -- TRANSITIONS[shader] ?? crossfade fallback matches the inline path exactly. The test suite pins byte-equivalence across all 15 shaders plus the unknown-shader fallback. Concurrent dispatch test with pool-size=4 and 8 tasks confirms no slot leakage or result misrouting.
API consistency with pngDecodeBlitWorkerPool -- Pool shape is structurally identical: same PoolLogger interface, same WorkerSlot/PendingTask internals, same resolveWorkerEntry 4-step resolution, same buildExecArgv tsx injection, same run/terminate public API, same trace instrumentation pattern (HF_SHADER_POOL_TRACE=1). The only differences are domain-specific (different buffer shapes, different env var names, no 8KB pool safety net). This is appropriate.
Build wiring -- tsup entry, esbuild entry, and subpath export all follow the same pattern as the PNG worker from PR 2/5. The @hyperframes/engine/shader-transitions subpath points at a zero-import TS file, matching the ./alpha-blit precedent.
Test coverage -- 6 tests covering byte-equivalence (single shader + all 15), transfer semantics (detach verification), unknown shader fallback, concurrent dispatch, explicit workerEntryPath, and graceful terminate with queued tasks. The terminate test is race-tolerant (accepts both resolve and reject for the queued task). Solid.
Ship it.
jrusso1020
left a comment
There was a problem hiding this comment.
Re-approved at 525c5975 — carrying forward my prior APPROVE.
Per Rule 3 (additive, not redundant): same diff size (+860/-1 across 6 files) as my prior review at d209d604. Looks like the Graphite-rebase pattern — same as hf#757 re-review I just posted ~15 min ago. Code surfaces are unchanged.
Stack status
- hf#756 (worker-count cap, stack root): merged ✓
- hf#757 (PR 2/5, pngDecodeBlit pool): re-approved at
ede5cde3by me + Magi ~15 min ago. NewworkerEntryPathfactory option + bundled-CLI build wiring landed there. - hf#758 (this PR, PR 3/5, shaderTransition pool): same
978f5103-era code, functionally unchanged.
If the same workerEntryPath-style addition is incoming for this pool in a future commit (parallel to hf#757), that'll need its own pass — but at HEAD nothing has shifted since my prior review.
Prior advisory items at HEAD
Same disposition as before — neither addressed, both still non-blocking:
-
No
<8KBBuffer pool defensive copy (vs PR2's pool which has one). Magi and I converged on option (b): add an explicit invariant comment at thepostMessagesite noting "all input buffers areBuffer.alloc()over dedicated ArrayBuffers (≥ 8KB); smaller buffers will fail transferList." The shader-blend's buffer sizes (width × height × 6≥ 2.4MB at 854×480) make this invariant safe today; a one-line comment makes the invariant load-bearing if someone ever runs shader-blend on smaller resolutions (thumbnail preview, test fixture). -
Code-DRY across the two pools —
pngDecodeBlitWorkerPooland this one share ~95% of the implementation (resolveWorkerEntry,buildExecArgv, dispatch loop, lifecycle, error handling, trace-mode switch). Both reviewers flagged this. Two copies is manageable; if a third pool gets added (Magi mentioned caption rasterization as a candidate), a genericWorkerPool<TReq, TRes>base extraction becomes worth doing.
Neither blocks merge; both are follow-up housekeeping.
CI
Comprehensive green on this SHA: Lint ✓, Format ✓, File size check ✓, Build ✓, Test ✓, Typecheck ✓, Test: runtime contract ✓, CLI smoke (required) ✓, Smoke: global install ✓, regression ✓, preview-regression ✓, Preview parity ✓, player-perf ✓, CodeQL ✓. The 10 regression-shards + 2 windows-latest jobs are in_progress; no reds anywhere.
Carry-forward praise
Praise from my prior review still applies to the unchanged surfaces:
- Architectural discipline: standalone pool, tested before wired into production (PR 4 wires it)
- 6 tests pin byte-equivalence across all 15 shaders — strongest possible evidence that the offloaded blend is identical to the inline path
buildExecArgvtsx-loader-injection pattern matches PR2 (consistent dev/test behavior)transferListcontract documented at both API JSDoc and inline- Pool size clamping
[1, cpus().length]✓ - Per-task
traceEnabledis the right primitive for the perf-validation step that lands in PR 4/5
Verdict
APPROVE re-affirmed. Ready to merge after hf#757 lands (stack ordering). Ship 🚀
Review by Rames Jusso (pr-review)
Merge activity
|
…n renders (hf#732 PR 4/5) (#759) ## Summary PR 4 of 5 in the hf#732 decomposition stack. **This is where the bulk of the shader-transition speedup lives** (`~2×` verified — see Empirical validation below). Spreads per-frame DOM capture work across N DOM worker sessions and offloads the per-pixel shader-blend onto a `worker_threads` pool (the pool added in #758). ### Gating The hybrid path is gated by `shouldUseHybridLayeredPath`: - SDR content only — HDR raw-frame sources are fd-bound to one worker (per-worker `dup(fd)` is out of scope here). - `workerCount >= 2`. - Not every frame inside a transition window. When the gate trips, the hybrid loop spawns `workerCount - 1` extra DOM sessions, allocates per-worker scratch buffers, and partitions the frame range into contiguous slices via `distributeLayeredHybridFrameRanges`. Each worker walks its slice; transitions dispatch through the shader-blend pool (with inline fallback). A frame-reorder buffer fences the encoder. Pool teardown is guaranteed via try/finally on both the success and error paths. ### Structural change (heads-up to reviewers) `captureHdrStage.ts` on main was already 921 lines (over the project's 500-line ceiling). Adding the hybrid path on top would push it past 1100 and the local pre-commit hook refuses to stage files past 500. **PR 4 splits `captureHdrStage.ts` into 5 files**: - `captureHdrStage.ts` (orchestrator + cleanup invariants, 469 lines) - `captureHdrResources.ts` (HDR video extraction + image decode + dim probing) - `captureHdrFrameShared.ts` (gating predicates, partitioning, per-scene capture) - `captureHdrSequentialLoop.ts` (legacy single-session loop) - `captureHdrHybridLoop.ts` (new multi-worker path) No behavior change in any pre-existing code path: the sequential loop is byte-equivalent to the previous inline implementation (both consume `captureSceneIntoBuffer` from the shared module, so behavior parity is enforced structurally rather than by comment-keeping). `renderOrchestrator.ts` is intentionally unchanged — the stage computes its own worker budget via `calculateOptimalWorkers` rather than receiving it through the call signature. ## Stack Stacked on top of #758 (PR 3: shaderTransition pool). ## Test plan - [x] 14 new vitest tests in `captureHdrFrameShared.test.ts` pinning the hybrid gating predicate and the contiguous-chunking partitioner — all pass - [x] Producer typecheck clean - [x] oxlint clean ### Empirical validation Mark Witt fixture (Mac, Apple Silicon, hardware GPU, no beginframe): - Published CLI (pre-stack): 2m 12.2s - Cascade CLI (this stack): 1m 07.7s - **Measured speedup: 1.95× on Mac.** (Earlier "2.22×" wording was a projection from per-component micro-benchmarks; the empirical end-to-end number is 1.95× on the validated fixture.) Linux CI confirmation pending top-of-stack regression run. — Vai

Summary
PR 3 of 5 in the hf#732 decomposition stack. Adds a
worker_threads-based pool that runs the shader-transition blend (one of 15 transition shaders) on a fixed-size worker pool. No production wiring yet — the pool stands alone; PR 4 wires it.The shader blend is a hot inner loop over every pixel of every transition frame at 16bpc. Moving it off the main event loop removes the JS-event-loop ceiling that capped throughput in earlier hf#732 iterations.
New files
packages/producer/src/services/shaderTransitionWorker.ts— worker entry. Imports from@hyperframes/engine/shader-transitions(zero-import TS source).packages/producer/src/services/shaderTransitionWorkerPool.ts— fixed-size pool. UsestransferListso the 16bpc HDRfrom/to/outbuffers move by ownership.packages/producer/src/services/shaderTransitionWorkerPool.test.ts— 6 vitest tests pinning byte-equivalence across all 15 shaders, transferList correctness, pool lifecycle. All pass.Build wiring
packages/cli/tsup.config.ts: third tsup entry emitsdist/shaderTransitionWorker.js.packages/producer/build.mjs: fourth esbuild entry for direct producer consumers.packages/engine/package.json: adds./shader-transitionssubpath export.Stack
Stacked on top of #757 (PR 2: pngDecodeBlit pool). No behavior change in any render.
Test plan
— Vai