
Producer 0.6.8+: streaming-encode SIGTERM with audio:0kB for compositions with auto-injected video timing #899

@terencecho

Summary

After bumping @hyperframes/producer from 0.6.7 → 0.6.10 in a downstream repo, one of three regression fixtures (style-13) started failing deterministically with Streaming encode failed: FFmpeg exited with code 255. The other two (style-7, style-16) pass with the same bump.

Reproduced across two consecutive CI runs with different runner conditions; not a flake. Looks like it was introduced in 0.6.8 via #832 (perf(engine): faster shader transitions via page-side WebGL compositing), which bundled three coupled changes:

  1. Unconditional data-hf-auto-start sentinel injection in packages/core/src/compiler/timingCompiler.ts (every <video>/<audio> without data-start)
  2. New discoverVideoVisibilityFromTimeline() in packages/producer/src/services/htmlCompiler.ts that overwrites video.start/video.end with opacity-derived windows
  3. New enablePageSideCompositing (default true) bypassing the layered shader-blend path

HF_PAGE_SIDE_COMPOSITING=false does not fix it — the auto-start injection in timingCompiler.ts is unconditional.

Failure mode

{"event":"test_error","suite":"style-13-prod","error":"Streaming encode failed: FFmpeg exited with code 255\nffmpeg stderr (tail):\nvideo:13530kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.020131%\n[libx264 @ ...] frame I:3     Avg QP:12.91  size: 73263\n[libx264 @ ...] frame P:431   Avg QP:14.72  size: 31633\n...\n[libx264 @ ...] kb/s:7661.05\nExiting normally, received signal 15."}

Key signals:

  • libx264 prints its full end-of-encode stats (frame I:3, P:431, kb/s:7661) → encoder finished encoding every frame
  • Exiting normally, received signal 15 → SIGTERM (not an internal crash)
  • audio:0kB is expected for this stage: streamingEncoder.ts is video-only, and audio comes later via assembleStage → muxVideoWithAudio
  • Exit code 255 (not 143) — matches streamingEncoder.ts:421-425 safety timeout firing during the post-frame flush

So FFmpeg encoded all 16s / ~480 frames cleanly, then was SIGTERM'd in the flush/teardown window by the 600s ffmpegStreamingTimeout safety timer in streamingEncoder.ts. The wrapper sees non-zero exit and reports "Streaming encode failed".
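
To make the key signals above checkable from a downstream test, here is a minimal sketch that classifies an FFmpeg stderr tail. `classifyFfmpegTail` and its field names are hypothetical, not part of @hyperframes/producer, and the regexes assume only the log format quoted in this issue.

```typescript
// Hypothetical helper for triaging a stderr tail like the one in this issue.
// Nothing here is part of @hyperframes/producer.
interface EncodeTailSignals {
  encoderFinished: boolean;          // libx264 printed its end-of-encode kb/s line
  terminatedBySignal: number | null; // signal number FFmpeg reported, if any
  audioZero: boolean;                // muxer summary shows audio:0kB
  killedAfterEncode: boolean;        // encode completed, then SIGTERM arrived
}

function classifyFfmpegTail(stderrTail: string): EncodeTailSignals {
  // "Exiting normally, received signal 15." is FFmpeg's own message.
  const signalMatch = /received signal (\d+)/.exec(stderrTail);
  const terminatedBySignal = signalMatch ? Number(signalMatch[1]) : null;

  // libx264 prints "kb/s:<rate>" only after its final stats.
  const encoderFinished = /\[libx264 @ [^\]]*\] kb\/s:/.test(stderrTail);

  return {
    encoderFinished,
    terminatedBySignal,
    audioZero: /\baudio:0kB\b/.test(stderrTail),
    killedAfterEncode: encoderFinished && terminatedBySignal === 15,
  };
}
```

If `killedAfterEncode` is true, the failure belongs to the flush/teardown window rather than to the encode itself, which is the case described here.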

Why style-13 specifically

Comparing run telemetry:

  • style-13: staticDuration:16, width:1080, height:1920, audioCount:1, videoCount:1 — no Google Fonts fetch
  • style-16: staticDuration:13.88, width:1080, height:1920, audioCount:1, videoCount:1 — fetches Impact
  • style-7: staticDuration:16.7, width:1920, height:1080, audioCount:1, videoCount:1 — 4 font families

Frame-capture calibration p95Ms was actually higher on style-16 (6094ms, multiplier 8) than style-13 (1708-2399ms, multiplier 2.85-4) and style-16 passed — so the regression isn't simple slowness. Something in style-13's composition shape interacts with one of the three changes in #832.

Strongest suspect: <video> element with no explicit data-start → auto-tagged with data-hf-auto-start → discoverVideoVisibilityFromTimeline() overwrites video.start/video.end with an opacity-binary-searched window. If that window is large, or the binary search adds enough wall-clock time to the probe + render pipeline, the 600s ffmpegStreamingTimeout fires during flush.

Downstream impact

heygen-com/hyperframes-internal PR #328 wants to bump from 0.6.7 → 0.6.10 specifically to pick up the lottieReadiness + import.meta.env fixes from #861 (so it can drop two local patches). All three of 0.6.8 / 0.6.9 / 0.6.10 include the regression, and 0.6.7 is missing #861, so there's no version that gives us both.

What would unblock us

Any of:

  1. Make the auto-injected sentinel + discoverVideoVisibilityFromTimeline() opt-in (config flag or env var, default off). Existing fixtures with explicit data-start are unaffected.
  2. Make discoverVideoVisibilityFromTimeline() non-destructive: only override video.start/video.end when the original values came from auto-injection AND the discovered window is strictly larger than ~1 frame, AND falls inside [0, duration].
  3. Diagnose and fix the actual cause; ship as 0.6.11.
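
For option (2), a minimal sketch of the guard we have in mind. `VideoWindow`, `resolveVideoWindow`, and the `autoInjected` flag are hypothetical names for illustration. The real shapes in htmlCompiler.ts may differ, and "~1 frame" is read here as 1 / fps.

```typescript
// Hypothetical types and names; not the actual htmlCompiler.ts API.
interface VideoWindow {
  start: number; // seconds
  end: number;   // seconds
}

interface ResolveOptions {
  autoInjected: boolean; // true if start/end came from the data-hf-auto-start sentinel
  duration: number;      // composition duration in seconds
  fps: number;           // target frame rate
}

// Returns the discovered window only when all three conditions from option (2) hold;
// otherwise the original window is returned unchanged.
function resolveVideoWindow(
  original: VideoWindow,
  discovered: VideoWindow | null,
  opts: ResolveOptions,
): VideoWindow {
  if (!opts.autoInjected || discovered === null) return original;

  const oneFrame = 1 / opts.fps;
  const longEnough = discovered.end - discovered.start > oneFrame;
  const insideTimeline = discovered.start >= 0 && discovered.end <= opts.duration;

  return longEnough && insideTimeline ? discovered : original;
}
```

With this shape, fixtures that set data-start explicitly never take the override path, and a degenerate or out-of-range discovered window falls back to the original timing instead of replacing it.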

Happy to send a PR for (1) or (2) if useful — we have a downstream test that flips green/red on this. Just wanted to file the analysis first since the root cause inside the producer pipeline isn't fully pinned down from the outside.

Repro environment

  • Linux CI runner (Dockerfile.test in heygen-com/hyperframes-internal), in-process render mode
  • @hyperframes/producer@0.6.10 + downstream @app/producer-internal
  • Composition: 1080×1920, 16s, 1 video element, 1 audio element, 30fps target
  • Output: SDR mp4, streaming-encode path enabled (default)

cc anyone touching #832 / discoverVideoVisibilityFromTimeline.
