Skip to content

Commit e41f5e5

Browse files
0x1abinclaude
andauthored
chore: sync develop with main + refresh nano_rtc_cache_fingerprint refs (#62)
* feat(video): introduce NANORTC_VIDEO_PKT_RING_SIZE macro + pkt_ring_tail cursor Decouple the video NACK retransmit ring from NANORTC_OUT_QUEUE_SIZE so IoT targets can shrink the NACK window without starving the output dispatch queue. Default equals NANORTC_OUT_QUEUE_SIZE — fully behavior-preserving. - nanortc_config.h: new macro with ifndef guard, ESP-IDF Kconfig hook, power-of-two + min-4 validation. Caller-drain invariant documented. - nanortc.h: resize pkt_ring[] and pkt_ring_meta[] to the new macro, add pkt_ring_tail field (memset-zeroed by nanortc_init). Write-path and NACK-scan updates land in the follow-up commit. Part of Phase 8 PR-3 (docs/exec-plans/active/phase8-continued-optimization.md). * feat(video): wire pkt_ring_tail cursor + honor PKT_RING_SIZE in write + NACK paths - nano_rtc.c video_send_fragment_cb: derive slot from pkt_ring_tail (mask NANORTC_VIDEO_PKT_RING_SIZE - 1) instead of out_tail, advance the cursor after stamping meta. out_queue slot selection inside rtc_enqueue_transmit is unchanged. - nano_rtc.c NACK retransmit scan: iterate over NANORTC_VIDEO_PKT_RING_SIZE so scan bounds track pkt_ring actual size (not the unrelated output queue). - Kconfig: expose CONFIG_NANORTC_VIDEO_PKT_RING_SIZE, defaulting to the existing OUT_QUEUE_SIZE value for backward-compat. Verified: default build (PKT_RING=OUT_QUEUE=32) and override build (-DNANORTC_VIDEO_PKT_RING_SIZE=16 with OUT_QUEUE=32) both build clean and pass 21/21 host tests. Part of Phase 8 PR-3 (docs/exec-plans/active/phase8-continued-optimization.md). * test(video): regression for pkt_ring decoupling + document PKT_RING_SIZE knob Adds three focused unit tests in test_media.c (under #if NANORTC_FEATURE_VIDEO) that mirror the nano_rtc.c write-path / NACK-scan idioms against a zeroed nanortc_t: - in-window lookup: recently-stamped seqs are findable. - out-of-window miss: after 2× ring wrap, old seqs are gone without crash. - independence from out_tail: pkt_ring_tail advances alone; out_tail stays at whatever the caller set; slot 0 holds the newest wrap's seq. Tests pass on both default build (PKT_RING=OUT_QUEUE=32) and IoT override (-DNANORTC_VIDEO_PKT_RING_SIZE=16 with OUT_QUEUE=32), which exercises the wraparound path impossible to hit at the default. Docs: add a "Video NACK ring — tunable independently of the output queue" section to memory-profiles.md covering the macro, example override, safety invariant (Sans-I/O caller-drain contract), and expected ~19 KB saving at PKT_RING_SIZE=16. Tweak the Biggest Contributors and Kconfig tables to point at the new macro. Part of Phase 8 PR-3 (docs/exec-plans/active/phase8-continued-optimization.md). * docs(video): fix inaccurate ~19 KB claim in PKT_RING_SIZE guidance Copilot review on PR #54 flagged that the "override to 8 saves ~19 KB" claim in the Kconfig help text is wrong for ESP-IDF defaults — with OUT_QUEUE_SIZE=16 (video) the 16→8 drop at MEDIA_BUF_SIZE=1232 only saves ~9.6 KB. The ~19 KB figure matches the host-default 32→16 case. Rewrite the Kconfig help text around the formula (OUT_QUEUE_SIZE - PKT_RING_SIZE) × MEDIA_BUF_SIZE so it stays accurate across host and ESP-IDF configurations, with a Kconfig-correct 9.6 KB worked example. memory-profiles.md already targets host overrides, but extend its inline comment to spell out both host (32→16 ≈19 KB) and Kconfig (16→8 ≈9.6 KB) numbers so readers in either context pick the right one. Addresses #54 (comment) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(video): detect pkt_ring overrun + bump stats counter Address PR #54 review concern: when NANORTC_VIDEO_PKT_RING_SIZE is set below the per-frame fragment count, video_send_fragment_cb silently wraps pkt_ring while out_queue[].transmit.data still references the stale slots — the receiver gets newer fragment bytes carrying the older fragment's RTP seq, manifesting as glitched IDRs. Sans-I/O drain doesn't save you because nanortc_send_video() emits every FU-A fragment of one access unit before returning to the caller. Add a runtime guard: when (out_tail - out_head) >= PKT_RING_SIZE, the ring would wrap into a slot whose pointer is still pending. Bump nanortc_t.stats_pkt_ring_overrun (atomic, matches existing stats_* fields) and emit a single static-string NANORTC_LOGW so under-sizing shows up in integration smoke tests rather than as garbled keyframes on the wire. Counter is exported on the struct for app glue to inspect. Cannot prevent the corruption from this layer (the slot is already about to be written), but turning silent-bad-video into a debuggable signal matches the LOG-injection-for-AI-debug use case the rest of the library follows. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(video): aliasing regression for pkt_ring overrun detection Mirror video_send_fragment_cb's write path (pkt_ring slot + meta stamp + out_queue pointer enqueue) in test_media.c so the test exercises the actual pointer-aliasing surface, not just the NACK-scan ring semantics. Two new tests: - test_pkt_ring_overrun_counter_fires_when_undersized: pushes PKT_RING_SIZE+3 fragments without draining out_queue, asserts the counter trips exactly 3 times — guards the contract enforced in nano_rtc.c. - test_pkt_ring_aliasing_corruption_demonstrated_without_guard: documents *why* the guard exists. Without draining, out_queue[0]'s .data pointer still addresses pkt_ring[0], which has since been overwritten by the wrapped fragment. Asserts the byte at .data[0] is the latest writer's marker, not the original. This is the silent-corruption pattern the runtime guard surfaces. Both pass under default config (PKT_RING=OUT_QUEUE=32, no overrun because out_queue fills first) and under the IoT override (-DNANORTC_VIDEO_PKT_RING_SIZE=8 -DNANORTC_OUT_QUEUE_SIZE=32, where the overrun path actually fires). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(video): tighten PKT_RING_SIZE sizing guidance to per-frame fragment bound Replace the "drain each tick" framing in Kconfig + memory-profiles.md with the precise hard constraint: PKT_RING_SIZE >= ceil(max_frame_bytes / NANORTC_VIDEO_MTU) + 1 The previous wording implied the Sans-I/O drain contract was sufficient, but nanortc_send_video() emits every FU-A fragment of one access unit before returning to the caller — there is no opportunity to drain mid-frame. Sizing PKT_RING_SIZE below the per-frame fragment count silently corrupts every IDR (older fragment buffers are overwritten while their pointers are still queued in out_queue). Add a worked-numbers table for 480p/720p/1080p H.264 at MTU 1200, and soften the "500 ms NACK history at 30 fps" claim — that figure only held for single-NAL frames; multi-NAL streams cover much less wall- clock time per slot. Cross-reference the new stats_pkt_ring_overrun counter so integrators can wire it into smoke tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(phase8): mark PR-3 completed, record review-driven design change Close out PR-3 in the Phase 8 exec plan. Key corrections vs the original draft: - The draft step 3 recommended "PKT_RING=16 at 30 fps = ~500 ms NACK window" as the IoT sizing rule. That claim only held for single-NAL frames — on 720p+ streams with multi-NAL IDRs it silently corrupted every keyframe because the ring would wrap while out_queue[].data pointers still referenced the overwritten slots. Replaced with the real hard constraint: PKT_RING_SIZE >= ceil(max_frame_bytes / NANORTC_VIDEO_MTU) + 1 plus a worked-numbers table (480p ≈ 16, 720p ≈ 32, 1080p ≈ 64), landing in memory-profiles.md. - Added a runtime overrun guard + stats_pkt_ring_overrun counter so under-sizing surfaces in integration smoke tests rather than as glitched IDRs on the wire. - Grew the test count from the planned single test_nack_ring to five tests in test_media.c, including an aliasing-corruption witness that fails without the guard. Also update docs/PLANS.md line 16 from "Planning" to track PR-3 and PR-5 as landed so the phase's next-action surface is accurate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(video): clarify pkt_ring sizing rule + overrun counter wording Address Copilot review on PR #57: - nanortc_config.h: replace the misleading "drain each tick" framing on NANORTC_VIDEO_PKT_RING_SIZE with the actual hard sizing rule (ceil(max_frame_bytes / VIDEO_MTU) + 1). nanortc_send_video() emits every FU-A fragment of an access unit before returning, so callers cannot drain mid-frame; sizing must be based on worst-case fragments-per-frame, and savings depend on OUT_QUEUE_SIZE rather than a fixed ~19 KB number. - Kconfig: rephrase "Set the runtime overrun counter" to "Monitor stats_pkt_ring_overrun" — the counter is incremented internally and is only readable by integrators, not user-settable. - tests/test_media.c: rename test_pkt_ring_aliasing_corruption_demonstrated_without_guard to test_pkt_ring_aliasing_corrupts_pending_pointers_when_undersized. The previous name implied the helper bypassed the guard, but the test uses pkt_ring_simulate_send_with_guard() and the guard does fire — what the test actually demonstrates is that the runtime guard *cannot prevent* corruption once PKT_RING_SIZE is undersized; it only surfaces the problem via stats_pkt_ring_overrun. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(video): extract pkt_ring_alloc_slot + commit_slot helpers The pkt_ring overrun-guard + slot-allocation pattern, plus the matching NACK metadata stamp and pkt_ring_tail advance, were duplicated between the H.264 zero-copy loop in rtc_send_video (added by PR #56 / fbb5b4f) and the H.265 callback video_send_fragment_cb (added by PR #54 / c1cb699). The verbose comment block explaining the aliasing-corruption invariant lived only at the H.265 site, leaving the H.264 site under-documented. Extract: static uint8_t *pkt_ring_alloc_slot(nanortc_t *, uint16_t *out_pslot) — runs the (out_tail - out_head) >= PKT_RING_SIZE check, bumps stats_pkt_ring_overrun + emits NANORTC_LOGW on overrun, returns rtc->pkt_ring[pslot]. static void pkt_ring_commit_slot(nanortc_t *, uint16_t pslot, uint16_t wire_seq, uint16_t srtp_len) — stamps pkt_ring_meta and increments pkt_ring_tail. Both helpers live at the top of the #if NANORTC_FEATURE_VIDEO block, ahead of the H.265 nesting, so both writers see the same definition. The aliasing-corruption rationale now lives once at pkt_ring_alloc_slot where both callers can find it. Behavior identical: same guard logic, same overrun counter increment, same slot index derivation, same meta layout. Verified across the default + ASan + ICE_SRFLX-off + H.265 feature combos (21 / 21 host tests, 23 / 23 with H.265 enabled, 15 / 15 ci-check.sh --fast). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(phase8): mark PR-4 completed and refresh PR-3 record - docs/PLANS.md line 16: Phase 8 status now lists PR-4 as landed 2026-04-25 (PR #56 / fbb5b4f), matches PR-3 / PR-5 entries. - docs/exec-plans/active/phase8-continued-optimization.md: - Tag PR-4 as [COMPLETED 2026-04-25] in both the summary table and the section header. - Replace the speculative "Approach" bullets with the landed approach (Option C — fragment iterator), measured savings, test additions, and verification record. - PR-3 step 5: note the runtime overrun guard was helper-extracted into pkt_ring_alloc_slot in PR #57 and is now shared between the H.264 and H.265 writers. - PR-3 step 6: rename aliasing_corruption_demonstrated_without_guard to aliasing_corrupts_pending_pointers_when_undersized to match the test rename in ba32abf. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(sctp): emit DISCONNECTED on retransmit exhaustion nsctp_handle_timeout() transitions SCTP to CLOSED after exceeding NANORTC_SCTP_MAX_RETRANSMITS but never notified the upper layer — applications only learned about the loss on the next nanortc_datachannel_send() failure, a silent stall from the user's point of view. Phase 8 PR-2. Wire the signal through: - nano_sctp.h: new bool closed_due_to_failure on nano_sctp_t, set exactly once when retransmit exhaustion trips the CLOSED transition. Init-time CLOSED (line 480) and peer-ABORT-induced CLOSED (line 982) deliberately do NOT set this — they have separate signaling channels; rationale recorded inline. - nano_sctp.c: set the flag alongside the existing state=CLOSED in the retransmit-exhaustion branch. - nano_rtc.c: in rtc_process_timers, after the SCTP timer block, read and clear the flag. If the RTC was previously NANORTC_STATE_CONNECTED, emit NANORTC_EV_DISCONNECTED and transition to NANORTC_STATE_CLOSED. Mirrors the existing ICE-failure / consent-expiry paths. Critical detail: emit block lives outside the if (sctp.state == ESTABLISHED) guard — once the timeout flips state to CLOSED, that guard goes false on the next tick and the signal would otherwise be lost. - tests/test_sctp.c: * test_sctp_timeout_sets_closed_flag (under #if NANORTC_FEATURE_DC_RELIABLE) walks the full RTO ramp (1000 → 2000 → 4000 → 8000 → 10000 capped) through MAX_RETRANSMITS cycles of nsctp_handle_timeout + nsctp_poll_output, asserts CLOSED + flag on the threshold-trip iteration. * test_sctp_abort_does_not_set_failure_flag — scope guard ensuring peer ABORT does not regress into the failure flag. Verified across DC_RELIABLE=ON (test_sctp 56/56), DC_RELIABLE=OFF (test_sctp 55/55, Test 1 correctly skipped), and scripts/ci-check.sh --fast (15/15 incl. ASan + ICE_SRFLX=OFF + format). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(phase8): mark PR-2 completed - docs/PLANS.md line 16: Phase 8 status now lists PR-2 (SCTP failure event propagation) as landed 2026-04-25, alongside the existing PR-3 / PR-4 / PR-5 entries; only PR-1 (IoT memory profile) remains pending in the recommended-order set. - docs/exec-plans/active/phase8-continued-optimization.md: - Tag PR-2 as [COMPLETED 2026-04-25] in the section header. - Replace the speculative "Approach" bullets with the landed implementation: closed_due_to_failure flag, RTC emit block placed outside the ESTABLISHED guard (the non-obvious detail), the two new regression tests, and verification record (test_sctp 56/56 with DC_RELIABLE=ON, 55/55 with DC_RELIABLE=OFF, ci-check.sh --fast 15/15). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(ci): expand fuzz job to fuzz_h265 + fuzz_turn Both harnesses are already registered in tests/fuzz/CMakeLists.txt (fuzz_turn in FUZZ_CORE_TARGETS, fuzz_h265 under FUZZ_H265_TARGETS, gated by NANORTC_FEATURE_H265). The CI fuzz job already configures H265=ON, so adding the two harnesses to the run loop is the only remaining step. Closes review §4 P1-1 (docs/reviews/2026-04-26-overall-review.md). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci(interop): add nanortc-as-TURN-client relay job Spins up the existing tests/interop/turn-server/docker-compose.yml coturn (via scripts/start-test-turn.sh) and runs ctest -L turn-relay against test_interop_turn_relay_nanortc.c (5 cases). Phase 5.2 F6-F10 regressions are now caught in CI; previously they only had hand verification on a downstream camera SDK. Updates docs/engineering/turn-rfc-compliance.md to reflect the new coverage status (the 'no automated test' note was stale once the 744-LOC harness landed). Closes review §4 P1-2 + P1-5 (docs/reviews/2026-04-26-overall-review.md). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(rtc): defer DTLS_HANDSHAKING state until dtls_start succeeds rtc_begin_dtls_handshake() previously set rtc->state to DTLS_HANDSHAKING before calling dtls_start(). On dtls_start failure the function returned early but state was already advanced — leaving a half-initialised DTLS context that subsequent handle_input() bytes would drive down the DTLS path. Reorder so state is only advanced after dtls_start() succeeds; failure leaves state at ICE_CONNECTED so the caller can retry cleanly. Failure-path mock test deferred until a mock crypto provider lands (no current way to make dtls_init / dtls_start fail in tests). Closes review §4 P1-4 (docs/reviews/2026-04-26-overall-review.md). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(rtc): tear down DTLS context on ice_restart Without explicit dtls_destroy() in nanortc_ice_restart(), the "if (!rtc->dtls.crypto_ctx)" guard in rtc_begin_dtls_handshake would skip dtls_init on the next handshake and reuse the previous DTLS context — possible key-material reuse / orphan handshake timers. dtls_destroy is a no-op when crypto_ctx is NULL, so calling it unconditionally is safe even before any DTLS init. Adds two regression tests: - test_api_ice_restart_clears_dtls: dtls_init -> ice_restart asserts crypto_ctx is NULL and dtls.state is CLOSED post-restart. - test_api_ice_restart_no_dtls_safe: ice_restart without prior dtls_init does not crash. Closes review §4 P2-3 (docs/reviews/2026-04-26-overall-review.md). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(crc): add RFC 1952 / 3309 / 3720 vector tests The CRC modules (nano_crc32.c / nano_crc32c.c) carried 100% line coverage indirectly via fuzz_stun (FINGERPRINT verify) and fuzz_sctp (chunk checksum), but had no direct algorithm-vector test. An accidental polynomial / initial-value / final-XOR change could pass indirect fuzz round-trip while still being algorithmically wrong. Adds 13 cases pinning canonical RFC vectors: - CRC-32 ISO HDLC (RFC 1952): empty, "123456789" (0xCBF43926), "a" (0xE8B7BE43), "abc" (0x352441C2) - CRC-32c Castagnoli (RFC 3309 / RFC 3720 §B.4): empty, "123456789" (0xE3069283), 32x0x00 (0x8A9136AA), 32x0xFF (0x62A8AB43), incrementing/decrementing 32-byte buffers - Incremental API: segmented == one-shot (the 3-segment pattern used by nsctp_verify_checksum), zero-length update tolerance, byte-at-a-time fragmentation Closes review §4 P3-1 (docs/reviews/2026-04-26-overall-review.md). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(tests): gate test_crc32 on DATACHANNEL test_crc32.c calls nano_crc32c{,_init,_update,_final}, which live in src/nano_crc32c.c and are only compiled when NANORTC_FEATURE_DATACHANNEL is ON. Without gating, AUDIO_ONLY / MEDIA_ONLY / CORE_ONLY builds and the architecture-constraints DC-OFF check fail to link. Add test_crc32 to the DC_TESTS list so the existing continue() guard skips it in DC-OFF configs, mirroring test_datachannel. The 13 CRC assertions still run in DATA / AUDIO / MEDIA / MEDIA_H265 — RFC vectors are configuration-invariant, so no coverage is lost. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci(esp-idf): add compile-only build workflow for examples Builds esp32_datachannel across esp32 / esp32s3 / esp32p4 and esp32_audio + esp32_video on esp32s3 via espressif/esp-idf-ci-action on push/PR to main. Catches component / Kconfig / lwIP regressions without requiring hardware; on-device Python E2E remains manual. esp32_camera is intentionally excluded — its board_manager autogen + camera-sensor selection is board-specific and will land separately. * fix(esp32_video): pin default flash size to 4MB in sdkconfig.defaults The custom partition table reserves a 0x3a0000 (~3.6 MB) factory app, ending at 0x3b0000. ESP-IDF's default flash size is 2 MB, so a clean build from sdkconfig.defaults fails partition-table generation with: Partitions tables occupies 3.7MB of flash (3866624 bytes) which does not fit in configured flash size 2MB. The locally-tracked sdkconfig was set to 16 MB, which masked this in manual builds; the new ESP-IDF CI workflow starts from defaults and caught the regression. Set CONFIG_ESPTOOLPY_FLASHSIZE_4MB=y as the minimum size that fits the partitions; boards with larger flash override via menuconfig. Verified: gen_esp32part.py with --flash-size 4MB succeeds; with 2MB reproduces the CI error exactly. * ci(esp-idf): fetch submodules in checkout for esp32_video sample data esp32_video's main/CMakeLists.txt packs frames from examples/sample_data/h264SampleFrames and opusSampleFrames into video.blob / audio.blob at build time. examples/sample_data is a git submodule (0x1abin/media_sample_data); without `submodules: recursive`, actions/checkout@v4 leaves the path empty and the build fails: Error: examples/sample_data/h264SampleFrames is not a directory Error: examples/sample_data/opusSampleFrames is not a directory The other matrix targets don't reference sample_data, but enabling submodules at the workflow level keeps the step uniform; the submodule is ~15 MB so the cost is negligible. * fix(rtc): clear cached fingerprints on ice_restart dtls_destroy() drops the DTLS context but leaves rtc->sdp.local_fingerprint and rtc->dtls.local_fingerprint populated. rtc_cache_fingerprint() is a write-once cache (early-returns when sdp.local_fingerprint is non-empty), so the next create_offer / accept_offer after an ICE restart would re-advertise the *previous* cert's SDP fingerprint while a fresh dtls_init() generates a new cert and fingerprint. The peer then rejects DTLS handshake with a fingerprint mismatch. Memset both fingerprints to '\0' immediately after dtls_destroy() so rtc_cache_fingerprint() repopulates them from the next dtls_init's cert. Test: extend test_api_ice_restart_clears_dtls (T20) — pre-populate both fingerprint buffers before calling nanortc_ice_restart and assert they are cleared on return. Reported by Copilot review on PR #59. * docs(rtc): correct rtc_begin_dtls_handshake state-failure comment The previous comment claimed that keeping state at ICE_CONNECTED after a dtls_start() failure prevents "a half-initialised DTLS context being driven by inbound bytes". That is misleading: the inbound DTLS gate in handle_input is `state >= NANORTC_STATE_ICE_CONNECTED`, so DTLS records are still accepted on the retry path. The state ordering only governs the FSM; the actual receive-path policy is unchanged. Reword to describe what the reorder actually achieves (FSM stays at ICE_CONNECTED so callers can retry DTLS startup) and explicitly call out that the inbound gate is unchanged, so future readers don't infer a stronger guarantee than exists. Reported by Copilot review on PR #59. * test(rtc): add e2e regression tests for ice_restart hardening Brackets the unit-level T20/T21 coverage in tests/test_trickle_ice.c with a full ICE -> DTLS round trip on both sides. Existing unit tests probe nanortc_ice_restart() against a hand-built nanortc_t and assert which fields got reset; nothing today drives the end-to-end path of two peers reaching DTLS_CONNECTED, restarting, and re-handshaking with fresh keying material. - e2e_resync_ice_creds() helper: counterpart to e2e_setup_ice_creds() that re-syncs ufrag/pwd and re-arms remote_candidates after both peers have called nanortc_ice_restart(). ice_restart() preserves local_candidates (RFC 8445 §9.1.1.1) but blanks the peer side. - test_e2e_ice_restart_dtls_rehandshake: drives both peers to DTLS_CONNECTED, captures fingerprints, restarts, re-syncs, pumps to reconnect, asserts the new cert hash differs from pre-restart. Catches a regression of 8fea1e8 (without dtls_destroy in nanortc_ice_restart, the second handshake skips dtls_init via the !crypto_ctx guard and reuses the original cert). - test_e2e_ice_restart_sdp_fingerprint_refresh: drives create_offer twice across an ice_restart, asserts sdp.local_fingerprint re-syncs to the freshly generated dtls.local_fingerprint. Catches a regression of 9a75412 (without the fingerprint memset, rtc_cache_fingerprint early-returns on the stale cache and the SDP cache holds the old cert hash while DTLS holds the new one). Validation: each new test was confirmed to fail when its target commit is surgically reverted in the working tree, and to pass on baseline develop. ./scripts/ci-check.sh --fast 15/15 green; full host suite 22/22. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(phase8): record post-PR-5 ice_restart hardening as TD-022 The post-PR-5 follow-up commits surfaced by the Copilot review on PR #59 (d59a528 / 8fea1e8 / 9a75412 / af8f566) were not tracked anywhere — TD-021 closed the original ICE hardening pass and phase8 PR-5 was already marked completed. Add a consolidated TD-022 entry naming each commit, the bug it fixes, and the test that pins it (the unit T20/T21 sites plus the two new e2e tests). Add a "Post-merge hardening" subsection at the tail of phase8 PR-5 with a commit -> failing-test mapping table so future readers can connect the breadcrumb trail without crawling git log. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(h265): wire receive-path demux + first H.265 video e2e PR-2 stage B closes the receive half of the H.265 wiring so two nanortc instances can roundtrip H.265 end-to-end: - Per-track depacketizer becomes a codec-selected union (track.video.depkt.{h264,h265}) in src/nano_media.h; the H.265 arm is conditional on NANORTC_FEATURE_H265 so OFF builds keep the prior sizeof(nanortc_track_t). - track_init() dispatches on m->codec to call h265_depkt_init() for H.265 video tracks, else falls back to h264_depkt_init(). - nano_rtc_media_handle_rtp_or_rtcp() RTP demux dispatches to h265_depkt_push() + h265_is_keyframe() when the track is H.265; the single NANORTC_EV_MEDIA_DATA emit site is shared because the event payload is codec-neutral. Tests: - test_h265_track_init_dispatches_to_h265_depkt drives the same FU three-fragment vector through track_init + the union accessor as the standalone module test, proving dispatch path == direct path. - test_e2e_h265_loopback is the project's first true two-instance video e2e: full SDP/ICE/DTLS, one IDR Annex-B AU sent, answerer surfaces byte-identical payload with is_keyframe asserted. Stage A (SDP plumbing + send path) shipped 2026-04-16 during the browser interop hardening session; phase3-5-h265.md now records the staged-landing narrative and checks all PR-2 acceptance criteria. PR-3 (libdatachannel interop + browser + doc finalization) pending. --------- Co-authored-by: Claude <noreply@anthropic.com>
1 parent 23c9c01 commit e41f5e5

10 files changed

Lines changed: 304 additions & 37 deletions

File tree

docs/PLANS.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -11,7 +11,7 @@ NanoRTC is built by AI coding agents. Estimates use **agent sessions** (one focu
1111
| Plan | Phase | Status | Effort | Target |
1212
|------|-------|--------|--------|--------|
1313
| [Phase 2: Audio](exec-plans/active/phase2-audio.md) | 2 | **Active** — Pending human verification of bidirectional audio + ESP32 intercom | 1 session | Bidirectional audio with browser |
14-
| [Phase 3.5: H.265/HEVC](exec-plans/active/phase3-5-h265.md) | 3.5 | **Active** — PR-1 (module + tests + fuzz) in flight; PR-2 (SDP/RTC wiring) + PR-3 (interop + browser) pending | 2–3 sessions | H.265 video codec via RFC 7798, bidirectional send+receive |
14+
| [Phase 3.5: H.265/HEVC](exec-plans/active/phase3-5-h265.md) | 3.5 | **Active** — PR-1 landed 2026-04-13. PR-2 landed in two stages: bulk wiring 2026-04-16 (browser interop hardening), receive-path demux + first video e2e 2026-05-07. PR-3 (libdatachannel interop + browser + doc finalization) pending. | 2–3 sessions | H.265 video codec via RFC 7798, bidirectional send+receive |
1515
| [Phase 5: Network Traversal](exec-plans/active/phase5-network-traversal.md) | 5 | **Active** — Sessions 1-4 complete. All acceptance criteria met. | 3 sessions | NAT traversal for production |
1616
| [Phase 8: Continued Optimization & Stability](exec-plans/active/phase8-continued-optimization.md) | 8 | **Active** — PR-2 (SCTP failure event propagation) landed 2026-04-25, PR-3 (video pkt_ring decoupling) landed 2026-04-23, PR-4 (H.264 zero-copy iterator) landed 2026-04-25, PR-5 (ICE CONTROLLING fix) landed 2026-04-13. PR-1 pending; P2 series on demand. | 4–6 sessions | IoT memory profile, stability propagation, video pkt_ring decoupling, H.264 zero-copy, ICE CONTROLLING per-pair transaction fix |
1717
| [Phase 9: BWE Perception for IoT Camera](exec-plans/active/phase9-bwe-perception.md) | 9 | **Active** — PR-1…PR-5 landed + post-review hardening + example-layer BWE coordinator extracted. Browser interop snapshot pending. | 1–2 sessions | TWCC parser + SDP/RTP wiring, loss-based BWE controller, runtime bounds/threshold API, send_fps + send_bitrate + fraction_lost stats, shared `examples/common/bwe_coordinator` glue, rk3588 `capture_set_bitrate()` consumer |

docs/exec-plans/active/phase3-5-h265.md

Lines changed: 90 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
# Phase 3.5: H.265/HEVC Video Support
22

3-
**Status:** Active — PR-1 (module + tests + fuzz) ready for review; PR-2 and PR-3 pending.
3+
**Status:** Active — PR-1 (module + tests + fuzz) landed 2026-04-13; PR-2 (wiring + e2e) landed in two stages — bulk shipped during the 2026-04-16 browser interop hardening session, receive-path demux + first video e2e shipped 2026-05-07. PR-3 (libdatachannel interop + browser + doc finalization) pending.
44
**Estimated effort:** 2–3 agent sessions total (split across 3 independent PRs).
55
**Authoritative spec:** [RFC 7798](https://www.rfc-editor.org/rfc/rfc7798) — "RTP Payload Format for High Efficiency Video Coding (HEVC)" (December 2016).
66

@@ -148,36 +148,102 @@ H.265 defaults to **PT = 98**, disjoint from H.264's PT = 96, so a future same-m
148148

149149
## PR-2 — SDP + `nano_rtc` wiring + parameter-sets API + e2e
150150

151-
**Status:** Pending.
151+
**Status:** Landed in two stages.
152152

153-
### What to land
153+
### Stage A — bulk wiring (2026-04-16, "browser interop hardening")
154154

155-
- **Public API** (`include/nanortc.h`):
156-
- `NANORTC_CODEC_H265` appended to `nanortc_codec_t` enum.
157-
- New `nanortc_video_set_h265_parameter_sets()` declaration, gated on `NANORTC_FEATURE_H265`.
158-
- **SDP m-line state** (`src/nano_sdp.h`): add `video_h265_rtpmap_pt` sibling to `video_h264_rtpmap_pt`, and `h265_sprop_fmtp[]` + length for the pre-formatted sprop-* fragment.
155+
Driven by a downstream rk3588 camera consumer chasing real Chrome / Safari
156+
viewers at 1080p. The four bugs uncovered there (see "Decision log"
157+
2026-04-16 entry) forced the SDP plumbing to ship before the receive-path
158+
demux even existed. The following items all landed in that session:
159+
160+
- **Public API** (`include/nanortc.h`): `NANORTC_CODEC_H265` enum value;
161+
`nanortc_video_set_h265_parameter_sets()` declaration gated on
162+
`NANORTC_FEATURE_H265`.
163+
- **SDP m-line state** (`src/nano_sdp.h`): `video_h265_rtpmap_pt`,
164+
`h265_sprop_fmtp[NANORTC_H265_SPROP_FMTP_SIZE]`, `h265_sprop_fmtp_len`,
165+
plus `h265_profile_id` / `h265_tier_flag` / `h265_level_id` extracted
166+
from the VPS profile_tier_level (so Safari decoders see the actual
167+
level instead of the hardcoded default).
159168
- **SDP parser** (`src/nano_sdp.c`):
160-
- `parse_rtpmap` branch for `H265/90000`.
161-
- fmtp parser branch accepting `profile-id`, `tier-flag`, `level-id`, rejecting `tx-mode ≠ SRST` and `sprop-max-don-diff > 0`.
162-
- **SDP builder** (`src/nano_sdp.c` `sdp_append_video_mline`): dispatch on `ml->codec` to emit either H.264 or H.265 rtpmap/fmtp/rtcp-fb block.
163-
- **PT assignment** (`src/nano_rtc.c` `nanortc_add_track`): H.264 → 96, H.265 → `NANORTC_VIDEO_H265_DEFAULT_PT` (98).
164-
- **Negotiation** (`src/nano_rtc.c` `rtc_apply_negotiated_media`): select H.264 or H.265 rtpmap PT based on local track's codec.
165-
- **Send path** (`src/nano_rtc.c` `rtc_send_video` / `nanortc_send_video`): scan Annex-B access unit to build `h265_nal_ref_t` stack array (≤ `NANORTC_MAX_NALS_PER_AU`), call `h265_packetize_au()` when codec is H.265, else `h264_packetize()` per-NAL.
166-
- **Receive path** (`src/nano_rtc.c` RTP dispatch): call `h265_depkt_push()` or `h264_depkt_push()` based on `m->codec`.
167-
- **`nanortc_video_set_h265_parameter_sets()` implementation**: validate mid + codec, base64-encode VPS/SPS/PPS via `nano_base64_encode`, format `sprop-vps=..;sprop-sps=..;sprop-pps=..` into the mline scratch buffer.
168-
- **Per-track union** (`src/nano_media.h`): convert `track.video.h264_depkt` to an inner union of h264/h265 depacketizers. Every access site in `nano_rtc.c` updated.
169-
- **Tests** (`tests/test_sdp.c`): 4 new SDP tests — basic H.265 offer parse, reject non-SRST tx-mode, reject non-zero sprop-max-don-diff, generate answer with sprop-*.
170-
- **E2E** (`tests/test_e2e.c` or new `tests/test_e2e_h265.c`): loopback between two `nanortc_t` instances negotiating H.265, sending an Annex-B frame, receiving via `NANORTC_EV_MEDIA_DATA`, comparing bytes.
169+
- `parse_rtpmap` H.265 branch — sets `ml->video_h265_rtpmap_pt` and
170+
immediately adopts the remote PT when the local track's codec is
171+
H.265 (handles Safari offers that omit fmtp).
172+
- `parse_fmtp` H.265 branch — accepts `profile-id` / `tier-flag` /
173+
`level-id`, rejects `tx-mode ≠ SRST` and `sprop-max-don-diff > 0`.
174+
- "Local-H.265 must not be hijacked by H.264 fmtp" gate at line 403:
175+
`bool local_is_h265 = (ml->codec == NANORTC_CODEC_H265)` short-circuits
176+
H.264 PT selection when the local track has explicitly chosen H.265.
177+
- **SDP builder** (`src/nano_sdp.c` `sdp_append_video_mline`): codec-branched
178+
emission for H.265 rtpmap / fmtp / rtcp-fb; sprop-* fragment is included
179+
only when `nanortc_video_set_h265_parameter_sets()` populated it.
180+
- **PT assignment** (`src/nano_rtc_media.c` `nanortc_add_track`): H.264 → 96
181+
(`NANORTC_VIDEO_DEFAULT_PT`), H.265 → 98 (`NANORTC_VIDEO_H265_DEFAULT_PT`).
182+
- **Send path** (`src/nano_rtc_media.c`): `rtc_send_video_h265()` plus the
183+
codec dispatch at `nanortc_send_video()` line 557 (`m->codec ==
184+
NANORTC_CODEC_H265` → H.265 path; else falls into the existing H.264
185+
zero-copy fragment iterator). Greedy AU packer + callback emits one
186+
Single / AP / FU per RTP packet via `pkt_ring_alloc_slot` /
187+
`pkt_ring_commit_slot`.
188+
- **`nanortc_video_set_h265_parameter_sets()` implementation**: base64-encodes
189+
VPS/SPS/PPS via `nano_base64_encode`, writes
190+
`sprop-vps=…;sprop-sps=…;sprop-pps=…` into the m-line scratch buffer,
191+
parses the VPS profile_tier_level to populate `ml->h265_{profile_id,
192+
tier_flag, level_id}` (skipping H.265 §7.4.1.1 emulation-prevention
193+
bytes).
194+
- **SDP tests** (`tests/test_sdp.c`): four hardening regressions —
195+
`test_sdp_h265_rtpmap_without_fmtp_picks_remote_pt`,
196+
`test_sdp_h265_local_track_not_hijacked_by_h264_fmtp`,
197+
`test_sdp_parse_preserves_local_h265_state`,
198+
`test_sdp_parse_reject_h265_msmt`.
199+
200+
What stage A *did not* ship: the receive-path depacketizer dispatch and
201+
the per-track depkt union — the camera consumer was send-only, so the
202+
unwired receive path went unnoticed for three weeks until this PR.
203+
204+
### Stage B — receive-path demux + first video e2e (2026-05-07)
205+
206+
The remaining wiring needed for true bidirectional H.265:
207+
208+
- **Per-track depkt union** (`src/nano_media.h`): the named field
209+
`track.video.h264_depkt` becomes a named inner union
210+
`track.video.depkt.h264` / `track.video.depkt.h265`. The H.265 arm is
211+
conditionally compiled under `#if NANORTC_FEATURE_H265`, so
212+
`H265=OFF` builds keep the original `sizeof(nanortc_track_t)`
213+
byte-for-byte.
214+
- **track_init dispatch** (`src/nano_media.c`): `kind == VIDEO` block
215+
branches on `m->codec` to call `h265_depkt_init()` for H.265 tracks,
216+
else falls back to `h264_depkt_init()`. Unknown video codecs land on
217+
the H.264 path (current behavior; the receive demux drops anything
218+
whose PT doesn't match a registered track anyway).
219+
- **Receive demux dispatch** (`src/nano_rtc_media.c:998-1037`): the
220+
hardcoded H.264 block becomes codec-dispatched. H.265 calls
221+
`h265_depkt_push()` + `h265_is_keyframe()`, H.264 keeps its existing
222+
path. The single `NANORTC_EV_MEDIA_DATA` emission site is shared —
223+
the event payload is codec-neutral.
224+
- **Unit test** (`tests/test_h265.c`):
225+
`test_h265_track_init_dispatches_to_h265_depkt` uses the public
226+
`track_init` + the inner union accessor, then drives the same
227+
hand-crafted FU 3-fragment vector as `test_h265_depkt_fu_hand_crafted`
228+
to prove the dispatch path matches the stand-alone module path.
229+
- **First video e2e** (`tests/test_e2e.c`): `test_e2e_h265_loopback` is
230+
the project's first true two-instance video roundtrip. Negotiates an
231+
H.265 SENDONLY/RECVONLY pair through full SDP/ICE/DTLS, sends one
232+
IDR Annex-B access unit via `nanortc_send_video()`, and asserts the
233+
answerer surfaces it through `NANORTC_EV_MEDIA_DATA` with
234+
byte-identical payload + `is_keyframe == true`. One-way relay (not
235+
`e2e_pump`) — `e2e_pump` would drain the answerer's event queue
236+
before the test could see it.
171237

172238
### Acceptance criteria
173239

174-
- [ ] `nanortc_add_video_track(rtc, dir, NANORTC_CODEC_H265)` returns a valid MID
175-
- [ ] `nanortc_video_set_h265_parameter_sets()` emits correctly base64-encoded sprop-* in the generated SDP
176-
- [ ] Two-nanortc loopback: `set_parameter_sets``create_offer``accept_answer``send_video(Annex-B AU)``EV_MEDIA_DATA` with byte-identical payload
177-
- [ ] H.264 existing test suite still passes unchanged
178-
- [ ] 6-profile × 2-crypto × `NANORTC_FEATURE_H265={ON,OFF}` build matrix passes
240+
- [x] `nanortc_add_video_track(rtc, dir, NANORTC_CODEC_H265)` returns a valid MID — covered by stage A SDP tests.
241+
- [x] `nanortc_video_set_h265_parameter_sets()` emits correctly base64-encoded sprop-* in the generated SDP — stage A tests + the e2e loopback.
242+
- [x] Two-nanortc loopback: `set_parameter_sets``create_offer``accept_answer``send_video(Annex-B AU)``EV_MEDIA_DATA` with byte-identical payload`test_e2e_h265_loopback` (stage B).
243+
- [x] H.264 existing test suite still passes unchanged`test_h264` 32/32 green after the union rename; verified on host (`MEDIA` profile).
244+
- [x] 7-profile × 2-crypto × `NANORTC_FEATURE_H265={ON,OFF}` build matrix passes — see `./scripts/ci-check.sh` for the canonical combinations (DATA, AUDIO, MEDIA, MEDIA_H265, AUDIO_ONLY, MEDIA_ONLY, CORE_ONLY).
179245

180-
**Quality grade at end of PR-2:** `nano_h265.c` = **B** (all wiring in place, end-to-end loopback verified, fuzz execs ramped up).
246+
**Quality grade at end of PR-2:** `nano_h265.c` = **B** (all wiring in place, end-to-end loopback verified, fuzz execs ramped up). Promotes to **A** after PR-3 ships libdatachannel H.265 interop + browser verification + ≥ 50M `fuzz_h265` execs.
181247

182248
---
183249

docs/exec-plans/active/phase8-continued-optimization.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -220,7 +220,7 @@ Landed 2026-04-26 → 2026-04-27. A Copilot review on PR #59 surfaced three late
220220
|---|---|---|
221221
| `d59a528` | Defer `state = NANORTC_STATE_DTLS_HANDSHAKING` until `dtls_start()` returns OK so a failure leaves the FSM at `ICE_CONNECTED` and is retryable. | `tests/test_trickle_ice.c::test_api_ice_restart_clears_dtls` (FSM-level). |
222222
| `8fea1e8` | Call `dtls_destroy(&rtc->dtls)` first thing in `nanortc_ice_restart()` so the next handshake re-runs `dtls_init()` instead of skipping it via the `if (!crypto_ctx)` guard. | `tests/test_e2e.c::test_e2e_ice_restart_dtls_rehandshake` — without the destroy, the post-restart fingerprint matches the pre-restart fingerprint (cert reused). |
223-
| `9a75412` | `memset` both `sdp.local_fingerprint` and `dtls.local_fingerprint` after the destroy so `rtc_cache_fingerprint()` (a write-once cache) re-populates from the freshly generated cert. | `tests/test_e2e.c::test_e2e_ice_restart_sdp_fingerprint_refresh` — without the clear, the SDP cache holds the stale hash while DTLS holds the new one, so any peer that enforces `a=fingerprint` rejects the new handshake. |
223+
| `9a75412` | `memset` both `sdp.local_fingerprint` and `dtls.local_fingerprint` after the destroy so `nano_rtc_cache_fingerprint()` (a write-once cache) re-populates from the freshly generated cert. | `tests/test_e2e.c::test_e2e_ice_restart_sdp_fingerprint_refresh` — without the clear, the SDP cache holds the stale hash while DTLS holds the new one, so any peer that enforces `a=fingerprint` rejects the new handshake. |
224224
| `af8f566` | Doc-only: corrected the rationale comment so the FSM-vs-receive-path distinction is explicit. | n/a (doc). |
225225

226226
The two new E2E tests bracket the unit-level T20/T21 coverage with a full ICE → DTLS round trip on both sides, restart, re-sync, reconnect, and assert new keying material — closing the gap that the unit tests probe state in isolation but cannot show that two peers actually re-handshake. Validation: each new e2e test was confirmed to fail when its target commit is surgically reverted in the working tree (and to pass on baseline `develop`).

0 commit comments

Comments
 (0)