Skip to content

release: review-driven DTLS/ICE-restart fixes + CRC vector tests#59

Merged
0x1abin merged 30 commits intomainfrom
develop
Apr 28, 2026
Merged

release: review-driven DTLS/ICE-restart fixes + CRC vector tests#59
0x1abin merged 30 commits intomainfrom
develop

Conversation

@0x1abin
Copy link
Copy Markdown
Owner

@0x1abin 0x1abin commented Apr 26, 2026

Summary

  • DTLS lifecycle hardening: nanortc_ice_restart now tears down the DTLS context, and rtc_begin_dtls_handshake only advances state after dtls_start succeeds (review §4 P1-4 / P2-3).
  • New direct CRC algorithm-vector tests (RFC 1952 / 3309 / 3720), gated on DATACHANNEL to keep CORE_ONLY/AUDIO_ONLY/MEDIA_ONLY linking clean (review §4 P3-1).
  • CI expansion: fuzz job now includes fuzz_h265 + fuzz_turn; new interop job runs Phase 5.2 TURN-relay regression suite against coturn.

Test plan

  • CI: full matrix (DATA / AUDIO / MEDIA / MEDIA_H265 / AUDIO_ONLY / MEDIA_ONLY / CORE_ONLY) green
  • CI: fuzz job (stun / sctp / dtls / h264 / h265 / turn) green
  • CI: new interop-turn-relay job green (5 ctest cases)
  • CI: ASan + coverage jobs green
  • Local: ./scripts/ci-check.sh --fast clean before merge

🤖 Generated with Claude Code

claude and others added 22 commits April 22, 2026 18:24
…ail cursor

Decouple the video NACK retransmit ring from NANORTC_OUT_QUEUE_SIZE so IoT
targets can shrink the NACK window without starving the output dispatch
queue. Default equals NANORTC_OUT_QUEUE_SIZE — fully behavior-preserving.

- nanortc_config.h: new macro with ifndef guard, ESP-IDF Kconfig hook,
  power-of-two + min-4 validation. Caller-drain invariant documented.
- nanortc.h: resize pkt_ring[] and pkt_ring_meta[] to the new macro, add
  pkt_ring_tail field (memset-zeroed by nanortc_init).

Write-path and NACK-scan updates land in the follow-up commit.

Part of Phase 8 PR-3 (docs/exec-plans/active/phase8-continued-optimization.md).
… + NACK paths

- nano_rtc.c video_send_fragment_cb: derive slot from pkt_ring_tail (mask
  NANORTC_VIDEO_PKT_RING_SIZE - 1) instead of out_tail, advance the cursor
  after stamping meta. out_queue slot selection inside rtc_enqueue_transmit
  is unchanged.
- nano_rtc.c NACK retransmit scan: iterate over NANORTC_VIDEO_PKT_RING_SIZE
  so scan bounds track pkt_ring actual size (not the unrelated output queue).
- Kconfig: expose CONFIG_NANORTC_VIDEO_PKT_RING_SIZE, defaulting to the
  existing OUT_QUEUE_SIZE value for backward-compat.

Verified: default build (PKT_RING=OUT_QUEUE=32) and override build
(-DNANORTC_VIDEO_PKT_RING_SIZE=16 with OUT_QUEUE=32) both build clean and
pass 21/21 host tests.

Part of Phase 8 PR-3 (docs/exec-plans/active/phase8-continued-optimization.md).
…IZE knob

Adds three focused unit tests in test_media.c (under #if NANORTC_FEATURE_VIDEO)
that mirror the nano_rtc.c write-path / NACK-scan idioms against a zeroed
nanortc_t:
  - in-window lookup: recently-stamped seqs are findable.
  - out-of-window miss: after 2× ring wrap, old seqs are gone without crash.
  - independence from out_tail: pkt_ring_tail advances alone; out_tail stays
    at whatever the caller set; slot 0 holds the newest wrap's seq.

Tests pass on both default build (PKT_RING=OUT_QUEUE=32) and IoT override
(-DNANORTC_VIDEO_PKT_RING_SIZE=16 with OUT_QUEUE=32), which exercises the
wraparound path impossible to hit at the default.

Docs: add a "Video NACK ring — tunable independently of the output queue"
section to memory-profiles.md covering the macro, example override,
safety invariant (Sans-I/O caller-drain contract), and expected ~19 KB
saving at PKT_RING_SIZE=16. Tweak the Biggest Contributors and Kconfig
tables to point at the new macro.

Part of Phase 8 PR-3 (docs/exec-plans/active/phase8-continued-optimization.md).
Copilot review on PR #54 flagged that the "override to 8 saves ~19 KB"
claim in the Kconfig help text is wrong for ESP-IDF defaults — with
OUT_QUEUE_SIZE=16 (video) the 16→8 drop at MEDIA_BUF_SIZE=1232 only
saves ~9.6 KB. The ~19 KB figure matches the host-default 32→16 case.

Rewrite the Kconfig help text around the formula
  (OUT_QUEUE_SIZE - PKT_RING_SIZE) × MEDIA_BUF_SIZE
so it stays accurate across host and ESP-IDF configurations, with a
Kconfig-correct 9.6 KB worked example.

memory-profiles.md already targets host overrides, but extend its
inline comment to spell out both host (32→16 ≈19 KB) and Kconfig
(16→8 ≈9.6 KB) numbers so readers in either context pick the right one.

Addresses #54 (comment)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address PR #54 review concern: when NANORTC_VIDEO_PKT_RING_SIZE is set
below the per-frame fragment count, video_send_fragment_cb silently
wraps pkt_ring while out_queue[].transmit.data still references the
stale slots — the receiver gets newer fragment bytes carrying the older
fragment's RTP seq, manifesting as glitched IDRs. Sans-I/O drain doesn't
save you because nanortc_send_video() emits every FU-A fragment of one
access unit before returning to the caller.

Add a runtime guard: when (out_tail - out_head) >= PKT_RING_SIZE, the
ring would wrap into a slot whose pointer is still pending. Bump
nanortc_t.stats_pkt_ring_overrun (atomic, matches existing stats_*
fields) and emit a single static-string NANORTC_LOGW so under-sizing
shows up in integration smoke tests rather than as garbled keyframes
on the wire. Counter is exported on the struct for app glue to inspect.

Cannot prevent the corruption from this layer (the slot is already
about to be written), but turning silent-bad-video into a debuggable
signal matches the LOG-injection-for-AI-debug use case the rest of
the library follows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirror video_send_fragment_cb's write path (pkt_ring slot + meta stamp +
out_queue pointer enqueue) in test_media.c so the test exercises the
actual pointer-aliasing surface, not just the NACK-scan ring semantics.

Two new tests:
  - test_pkt_ring_overrun_counter_fires_when_undersized: pushes
    PKT_RING_SIZE+3 fragments without draining out_queue, asserts the
    counter trips exactly 3 times — guards the contract enforced in
    nano_rtc.c.
  - test_pkt_ring_aliasing_corruption_demonstrated_without_guard:
    documents *why* the guard exists. Without draining, out_queue[0]'s
    .data pointer still addresses pkt_ring[0], which has since been
    overwritten by the wrapped fragment. Asserts the byte at .data[0]
    is the latest writer's marker, not the original. This is the
    silent-corruption pattern the runtime guard surfaces.

Both pass under default config (PKT_RING=OUT_QUEUE=32, no overrun
because out_queue fills first) and under the IoT override
(-DNANORTC_VIDEO_PKT_RING_SIZE=8 -DNANORTC_OUT_QUEUE_SIZE=32, where the
overrun path actually fires).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ent bound

Replace the "drain each tick" framing in Kconfig + memory-profiles.md
with the precise hard constraint:

  PKT_RING_SIZE >= ceil(max_frame_bytes / NANORTC_VIDEO_MTU) + 1

The previous wording implied the Sans-I/O drain contract was sufficient,
but nanortc_send_video() emits every FU-A fragment of one access unit
before returning to the caller — there is no opportunity to drain
mid-frame. Sizing PKT_RING_SIZE below the per-frame fragment count
silently corrupts every IDR (older fragment buffers are overwritten
while their pointers are still queued in out_queue).

Add a worked-numbers table for 480p/720p/1080p H.264 at MTU 1200, and
soften the "500 ms NACK history at 30 fps" claim — that figure only
held for single-NAL frames; multi-NAL streams cover much less wall-
clock time per slot. Cross-reference the new stats_pkt_ring_overrun
counter so integrators can wire it into smoke tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Close out PR-3 in the Phase 8 exec plan. Key corrections vs the original
draft:

- The draft step 3 recommended "PKT_RING=16 at 30 fps = ~500 ms NACK
  window" as the IoT sizing rule. That claim only held for single-NAL
  frames — on 720p+ streams with multi-NAL IDRs it silently corrupted
  every keyframe because the ring would wrap while out_queue[].data
  pointers still referenced the overwritten slots. Replaced with the
  real hard constraint:
    PKT_RING_SIZE >= ceil(max_frame_bytes / NANORTC_VIDEO_MTU) + 1
  plus a worked-numbers table (480p ≈ 16, 720p ≈ 32, 1080p ≈ 64),
  landing in memory-profiles.md.
- Added a runtime overrun guard + stats_pkt_ring_overrun counter so
  under-sizing surfaces in integration smoke tests rather than as
  glitched IDRs on the wire.
- Grew the test count from the planned single test_nack_ring to five
  tests in test_media.c, including an aliasing-corruption witness
  that fails without the guard.

Also update docs/PLANS.md line 16 from "Planning" to track PR-3 and
PR-5 as landed so the phase's next-action surface is accurate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Decouple video NACK ring from output queue size
Resolve nano_rtc.c conflict by keeping the explanatory comments around
the pkt_ring overrun guard. The code logic is identical to origin/main
(both branches independently landed the same overrun-detection logic);
the comments document the aliasing-corruption rationale and NACK seq
invariant that motivate the guard.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address Copilot review on PR #57:

- nanortc_config.h: replace the misleading "drain each tick" framing on
  NANORTC_VIDEO_PKT_RING_SIZE with the actual hard sizing rule
  (ceil(max_frame_bytes / VIDEO_MTU) + 1). nanortc_send_video() emits
  every FU-A fragment of an access unit before returning, so callers
  cannot drain mid-frame; sizing must be based on worst-case
  fragments-per-frame, and savings depend on OUT_QUEUE_SIZE rather than
  a fixed ~19 KB number.

- Kconfig: rephrase "Set the runtime overrun counter" to "Monitor
  stats_pkt_ring_overrun" — the counter is incremented internally and
  is only readable by integrators, not user-settable.

- tests/test_media.c: rename
  test_pkt_ring_aliasing_corruption_demonstrated_without_guard to
  test_pkt_ring_aliasing_corrupts_pending_pointers_when_undersized.
  The previous name implied the helper bypassed the guard, but the
  test uses pkt_ring_simulate_send_with_guard() and the guard does
  fire — what the test actually demonstrates is that the runtime
  guard *cannot prevent* corruption once PKT_RING_SIZE is undersized;
  it only surfaces the problem via stats_pkt_ring_overrun.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The pkt_ring overrun-guard + slot-allocation pattern, plus the matching
NACK metadata stamp and pkt_ring_tail advance, were duplicated between
the H.264 zero-copy loop in rtc_send_video (added by PR #56 / fbb5b4f)
and the H.265 callback video_send_fragment_cb (added by PR #54 /
c1cb699). The verbose comment block explaining the aliasing-corruption
invariant lived only at the H.265 site, leaving the H.264 site
under-documented.

Extract:

  static uint8_t *pkt_ring_alloc_slot(nanortc_t *, uint16_t *out_pslot)
      — runs the (out_tail - out_head) >= PKT_RING_SIZE check, bumps
        stats_pkt_ring_overrun + emits NANORTC_LOGW on overrun, returns
        rtc->pkt_ring[pslot].
  static void pkt_ring_commit_slot(nanortc_t *, uint16_t pslot,
                                   uint16_t wire_seq, uint16_t srtp_len)
      — stamps pkt_ring_meta and increments pkt_ring_tail.

Both helpers live at the top of the #if NANORTC_FEATURE_VIDEO block,
ahead of the H.265 nesting, so both writers see the same definition.
The aliasing-corruption rationale now lives once at pkt_ring_alloc_slot
where both callers can find it.

Behavior identical: same guard logic, same overrun counter increment,
same slot index derivation, same meta layout. Verified across the
default + ASan + ICE_SRFLX-off + H.265 feature combos (21 / 21 host
tests, 23 / 23 with H.265 enabled, 15 / 15 ci-check.sh --fast).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- docs/PLANS.md line 16: Phase 8 status now lists PR-4 as landed
  2026-04-25 (PR #56 / fbb5b4f), matches PR-3 / PR-5 entries.
- docs/exec-plans/active/phase8-continued-optimization.md:
  - Tag PR-4 as [COMPLETED 2026-04-25] in both the summary table and
    the section header.
  - Replace the speculative "Approach" bullets with the landed approach
    (Option C — fragment iterator), measured savings, test additions,
    and verification record.
  - PR-3 step 5: note the runtime overrun guard was helper-extracted
    into pkt_ring_alloc_slot in PR #57 and is now shared between the
    H.264 and H.265 writers.
  - PR-3 step 6: rename
    aliasing_corruption_demonstrated_without_guard to
    aliasing_corrupts_pending_pointers_when_undersized to match the
    test rename in ba32abf.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
nsctp_handle_timeout() transitions SCTP to CLOSED after exceeding
NANORTC_SCTP_MAX_RETRANSMITS but never notified the upper layer —
applications only learned about the loss on the next
nanortc_datachannel_send() failure, a silent stall from the user's
point of view. Phase 8 PR-2.

Wire the signal through:

- nano_sctp.h: new bool closed_due_to_failure on nano_sctp_t, set
  exactly once when retransmit exhaustion trips the CLOSED transition.
  Init-time CLOSED (line 480) and peer-ABORT-induced CLOSED (line 982)
  deliberately do NOT set this — they have separate signaling channels;
  rationale recorded inline.

- nano_sctp.c: set the flag alongside the existing state=CLOSED in the
  retransmit-exhaustion branch.

- nano_rtc.c: in rtc_process_timers, after the SCTP timer block, read
  and clear the flag. If the RTC was previously NANORTC_STATE_CONNECTED,
  emit NANORTC_EV_DISCONNECTED and transition to NANORTC_STATE_CLOSED.
  Mirrors the existing ICE-failure / consent-expiry paths. Critical
  detail: emit block lives outside the if (sctp.state == ESTABLISHED)
  guard — once the timeout flips state to CLOSED, that guard goes false
  on the next tick and the signal would otherwise be lost.

- tests/test_sctp.c:
  * test_sctp_timeout_sets_closed_flag (under #if NANORTC_FEATURE_DC_RELIABLE)
    walks the full RTO ramp (1000 → 2000 → 4000 → 8000 → 10000 capped)
    through MAX_RETRANSMITS cycles of nsctp_handle_timeout + nsctp_poll_output,
    asserts CLOSED + flag on the threshold-trip iteration.
  * test_sctp_abort_does_not_set_failure_flag — scope guard ensuring
    peer ABORT does not regress into the failure flag.

Verified across DC_RELIABLE=ON (test_sctp 56/56), DC_RELIABLE=OFF
(test_sctp 55/55, Test 1 correctly skipped), and scripts/ci-check.sh
--fast (15/15 incl. ASan + ICE_SRFLX=OFF + format).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- docs/PLANS.md line 16: Phase 8 status now lists PR-2 (SCTP failure
  event propagation) as landed 2026-04-25, alongside the existing
  PR-3 / PR-4 / PR-5 entries; only PR-1 (IoT memory profile) remains
  pending in the recommended-order set.
- docs/exec-plans/active/phase8-continued-optimization.md:
  - Tag PR-2 as [COMPLETED 2026-04-25] in the section header.
  - Replace the speculative "Approach" bullets with the landed
    implementation: closed_due_to_failure flag, RTC emit block placed
    outside the ESTABLISHED guard (the non-obvious detail), the two
    new regression tests, and verification record (test_sctp 56/56
    with DC_RELIABLE=ON, 55/55 with DC_RELIABLE=OFF, ci-check.sh
    --fast 15/15).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both harnesses are already registered in tests/fuzz/CMakeLists.txt
(fuzz_turn in FUZZ_CORE_TARGETS, fuzz_h265 under FUZZ_H265_TARGETS,
gated by NANORTC_FEATURE_H265). The CI fuzz job already configures
H265=ON, so adding the two harnesses to the run loop is the only
remaining step.

Closes review §4 P1-1 (docs/reviews/2026-04-26-overall-review.md).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Spins up the existing tests/interop/turn-server/docker-compose.yml
coturn (via scripts/start-test-turn.sh) and runs ctest -L turn-relay
against test_interop_turn_relay_nanortc.c (5 cases). Phase 5.2 F6-F10
regressions are now caught in CI; previously they only had hand
verification on a downstream camera SDK.

Updates docs/engineering/turn-rfc-compliance.md to reflect the new
coverage status (the 'no automated test' note was stale once the
744-LOC harness landed).

Closes review §4 P1-2 + P1-5 (docs/reviews/2026-04-26-overall-review.md).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rtc_begin_dtls_handshake() previously set rtc->state to
DTLS_HANDSHAKING before calling dtls_start(). On dtls_start failure
the function returned early but state was already advanced —
leaving a half-initialised DTLS context that subsequent
handle_input() bytes would drive down the DTLS path. Reorder so
state is only advanced after dtls_start() succeeds; failure leaves
state at ICE_CONNECTED so the caller can retry cleanly.

Failure-path mock test deferred until a mock crypto provider lands
(no current way to make dtls_init / dtls_start fail in tests).

Closes review §4 P1-4 (docs/reviews/2026-04-26-overall-review.md).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Without explicit dtls_destroy() in nanortc_ice_restart(), the
"if (!rtc->dtls.crypto_ctx)" guard in rtc_begin_dtls_handshake
would skip dtls_init on the next handshake and reuse the previous
DTLS context — possible key-material reuse / orphan handshake
timers. dtls_destroy is a no-op when crypto_ctx is NULL, so calling
it unconditionally is safe even before any DTLS init.

Adds two regression tests:
- test_api_ice_restart_clears_dtls: dtls_init -> ice_restart asserts
  crypto_ctx is NULL and dtls.state is CLOSED post-restart.
- test_api_ice_restart_no_dtls_safe: ice_restart without prior
  dtls_init does not crash.

Closes review §4 P2-3 (docs/reviews/2026-04-26-overall-review.md).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The CRC modules (nano_crc32.c / nano_crc32c.c) carried 100% line
coverage indirectly via fuzz_stun (FINGERPRINT verify) and
fuzz_sctp (chunk checksum), but had no direct algorithm-vector
test. An accidental polynomial / initial-value / final-XOR change
could pass indirect fuzz round-trip while still being
algorithmically wrong.

Adds 13 cases pinning canonical RFC vectors:
- CRC-32 ISO HDLC (RFC 1952): empty, "123456789" (0xCBF43926),
  "a" (0xE8B7BE43), "abc" (0x352441C2)
- CRC-32c Castagnoli (RFC 3309 / RFC 3720 §B.4): empty,
  "123456789" (0xE3069283), 32x0x00 (0x8A9136AA), 32x0xFF
  (0x62A8AB43), incrementing/decrementing 32-byte buffers
- Incremental API: segmented == one-shot (the 3-segment pattern
  used by nsctp_verify_checksum), zero-length update tolerance,
  byte-at-a-time fragmentation

Closes review §4 P3-1 (docs/reviews/2026-04-26-overall-review.md).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
test_crc32.c calls nano_crc32c{,_init,_update,_final}, which live in
src/nano_crc32c.c and are only compiled when NANORTC_FEATURE_DATACHANNEL
is ON. Without gating, AUDIO_ONLY / MEDIA_ONLY / CORE_ONLY builds and
the architecture-constraints DC-OFF check fail to link.

Add test_crc32 to the DC_TESTS list so the existing continue() guard
skips it in DC-OFF configs, mirroring test_datachannel. The 13 CRC
assertions still run in DATA / AUDIO / MEDIA / MEDIA_H265 — RFC vectors
are configuration-invariant, so no coverage is lost.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 26, 2026 13:35
Builds esp32_datachannel across esp32 / esp32s3 / esp32p4 and
esp32_audio + esp32_video on esp32s3 via espressif/esp-idf-ci-action
on push/PR to main. Catches component / Kconfig / lwIP regressions
without requiring hardware; on-device Python E2E remains manual.

esp32_camera is intentionally excluded — its board_manager autogen +
camera-sensor selection is board-specific and will land separately.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens the DTLS/ICE-restart lifecycle in nanortc and expands validation/CI coverage (CRC vectors, fuzz harnesses, and TURN-relay interop) to better catch regressions in core networking and crypto-adjacent paths.

Changes:

  • DTLS lifecycle: destroy DTLS context on nanortc_ice_restart(); only advance RTC state to DTLS_HANDSHAKING after dtls_start() succeeds.
  • Tests: add direct CRC-32/CRC-32c RFC-vector unit tests; add ICE-restart tests asserting DTLS teardown behavior.
  • CI/docs: extend fuzz job to include fuzz_h265 and fuzz_turn; add a dedicated coturn-backed interop job and update TURN compliance documentation accordingly.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/nano_rtc.c Adjusts DTLS handshake state progression and destroys DTLS state during ICE restart.
tests/test_trickle_ice.c Adds regression tests around ICE restart clearing DTLS state and safety when DTLS is uninitialized.
tests/test_crc32.c Introduces standalone CRC-32/CRC-32c RFC-vector tests plus incremental API checks.
tests/CMakeLists.txt Adds test_crc32 under the DataChannel-gated test set.
docs/engineering/turn-rfc-compliance.md Updates TURN relay coverage narrative to reflect new nanortc-as-client relay tests and CI job.
.github/workflows/ci.yml Expands fuzz harness set and adds a new interop-turn-relay job running coturn-backed labeled tests.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/nano_rtc.c
Comment thread src/nano_rtc.c Outdated
0x1abin added 4 commits April 26, 2026 21:45
The custom partition table reserves a 0x3a0000 (~3.6 MB) factory app,
ending at 0x3b0000. ESP-IDF's default flash size is 2 MB, so a clean
build from sdkconfig.defaults fails partition-table generation with:

  Partitions tables occupies 3.7MB of flash (3866624 bytes) which
  does not fit in configured flash size 2MB.

The locally-tracked sdkconfig was set to 16 MB, which masked this in
manual builds; the new ESP-IDF CI workflow starts from defaults and
caught the regression.

Set CONFIG_ESPTOOLPY_FLASHSIZE_4MB=y as the minimum size that fits
the partitions; boards with larger flash override via menuconfig.

Verified: gen_esp32part.py with --flash-size 4MB succeeds; with 2MB
reproduces the CI error exactly.
esp32_video's main/CMakeLists.txt packs frames from
examples/sample_data/h264SampleFrames and opusSampleFrames into
video.blob / audio.blob at build time. examples/sample_data is a git
submodule (0x1abin/media_sample_data); without `submodules: recursive`,
actions/checkout@v4 leaves the path empty and the build fails:

  Error: examples/sample_data/h264SampleFrames is not a directory
  Error: examples/sample_data/opusSampleFrames is not a directory

The other matrix targets don't reference sample_data, but enabling
submodules at the workflow level keeps the step uniform; the submodule
is ~15 MB so the cost is negligible.
dtls_destroy() drops the DTLS context but leaves rtc->sdp.local_fingerprint
and rtc->dtls.local_fingerprint populated. rtc_cache_fingerprint() is a
write-once cache (early-returns when sdp.local_fingerprint is non-empty),
so the next create_offer / accept_offer after an ICE restart would
re-advertise the *previous* cert's SDP fingerprint while a fresh
dtls_init() generates a new cert and fingerprint. The peer then rejects
DTLS handshake with a fingerprint mismatch.

Memset both fingerprints to '\0' immediately after dtls_destroy() so
rtc_cache_fingerprint() repopulates them from the next dtls_init's cert.

Test: extend test_api_ice_restart_clears_dtls (T20) — pre-populate
both fingerprint buffers before calling nanortc_ice_restart and assert
they are cleared on return.

Reported by Copilot review on PR #59.
The previous comment claimed that keeping state at ICE_CONNECTED after
a dtls_start() failure prevents "a half-initialised DTLS context being
driven by inbound bytes". That is misleading: the inbound DTLS gate in
handle_input is `state >= NANORTC_STATE_ICE_CONNECTED`, so DTLS records
are still accepted on the retry path. The state ordering only governs
the FSM; the actual receive-path policy is unchanged.

Reword to describe what the reorder actually achieves (FSM stays at
ICE_CONNECTED so callers can retry DTLS startup) and explicitly call
out that the inbound gate is unchanged, so future readers don't infer
a stronger guarantee than exists.

Reported by Copilot review on PR #59.
0x1abin and others added 3 commits April 26, 2026 22:15
# Conflicts:
#	src/nano_rtc.c
#	tests/test_trickle_ice.c
Brackets the unit-level T20/T21 coverage in tests/test_trickle_ice.c
with a full ICE -> DTLS round trip on both sides. Existing unit tests
probe nanortc_ice_restart() against a hand-built nanortc_t and assert
which fields got reset; nothing today drives the end-to-end path of
two peers reaching DTLS_CONNECTED, restarting, and re-handshaking
with fresh keying material.

- e2e_resync_ice_creds() helper: counterpart to e2e_setup_ice_creds()
  that re-syncs ufrag/pwd and re-arms remote_candidates after both
  peers have called nanortc_ice_restart(). ice_restart() preserves
  local_candidates (RFC 8445 §9.1.1.1) but blanks the peer side.

- test_e2e_ice_restart_dtls_rehandshake: drives both peers to
  DTLS_CONNECTED, captures fingerprints, restarts, re-syncs, pumps
  to reconnect, asserts the new cert hash differs from pre-restart.
  Catches a regression of 8fea1e8 (without dtls_destroy in
  nanortc_ice_restart, the second handshake skips dtls_init via
  the !crypto_ctx guard and reuses the original cert).

- test_e2e_ice_restart_sdp_fingerprint_refresh: drives create_offer
  twice across an ice_restart, asserts sdp.local_fingerprint re-syncs
  to the freshly generated dtls.local_fingerprint. Catches a regression
  of 9a75412 (without the fingerprint memset, rtc_cache_fingerprint
  early-returns on the stale cache and the SDP cache holds the old
  cert hash while DTLS holds the new one).

Validation: each new test was confirmed to fail when its target
commit is surgically reverted in the working tree, and to pass on
baseline develop. ./scripts/ci-check.sh --fast 15/15 green; full
host suite 22/22.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The post-PR-5 follow-up commits surfaced by the Copilot review on
PR #59 (d59a528 / 8fea1e8 / 9a75412 / af8f566) were not tracked
anywhere — TD-021 closed the original ICE hardening pass and
phase8 PR-5 was already marked completed.

Add a consolidated TD-022 entry naming each commit, the bug it
fixes, and the test that pins it (the unit T20/T21 sites plus the
two new e2e tests). Add a "Post-merge hardening" subsection at the
tail of phase8 PR-5 with a commit -> failing-test mapping table so
future readers can connect the breadcrumb trail without crawling
git log.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@0x1abin 0x1abin merged commit 84c590d into main Apr 28, 2026
25 checks passed
0x1abin added a commit that referenced this pull request May 7, 2026
…fs (#62)

* feat(video): introduce NANORTC_VIDEO_PKT_RING_SIZE macro + pkt_ring_tail cursor

Decouple the video NACK retransmit ring from NANORTC_OUT_QUEUE_SIZE so IoT
targets can shrink the NACK window without starving the output dispatch
queue. Default equals NANORTC_OUT_QUEUE_SIZE — fully behavior-preserving.

- nanortc_config.h: new macro with ifndef guard, ESP-IDF Kconfig hook,
  power-of-two + min-4 validation. Caller-drain invariant documented.
- nanortc.h: resize pkt_ring[] and pkt_ring_meta[] to the new macro, add
  pkt_ring_tail field (memset-zeroed by nanortc_init).

Write-path and NACK-scan updates land in the follow-up commit.

Part of Phase 8 PR-3 (docs/exec-plans/active/phase8-continued-optimization.md).

* feat(video): wire pkt_ring_tail cursor + honor PKT_RING_SIZE in write + NACK paths

- nano_rtc.c video_send_fragment_cb: derive slot from pkt_ring_tail (mask
  NANORTC_VIDEO_PKT_RING_SIZE - 1) instead of out_tail, advance the cursor
  after stamping meta. out_queue slot selection inside rtc_enqueue_transmit
  is unchanged.
- nano_rtc.c NACK retransmit scan: iterate over NANORTC_VIDEO_PKT_RING_SIZE
  so scan bounds track pkt_ring actual size (not the unrelated output queue).
- Kconfig: expose CONFIG_NANORTC_VIDEO_PKT_RING_SIZE, defaulting to the
  existing OUT_QUEUE_SIZE value for backward-compat.

Verified: default build (PKT_RING=OUT_QUEUE=32) and override build
(-DNANORTC_VIDEO_PKT_RING_SIZE=16 with OUT_QUEUE=32) both build clean and
pass 21/21 host tests.

Part of Phase 8 PR-3 (docs/exec-plans/active/phase8-continued-optimization.md).

* test(video): regression for pkt_ring decoupling + document PKT_RING_SIZE knob

Adds three focused unit tests in test_media.c (under #if NANORTC_FEATURE_VIDEO)
that mirror the nano_rtc.c write-path / NACK-scan idioms against a zeroed
nanortc_t:
  - in-window lookup: recently-stamped seqs are findable.
  - out-of-window miss: after 2× ring wrap, old seqs are gone without crash.
  - independence from out_tail: pkt_ring_tail advances alone; out_tail stays
    at whatever the caller set; slot 0 holds the newest wrap's seq.

Tests pass on both default build (PKT_RING=OUT_QUEUE=32) and IoT override
(-DNANORTC_VIDEO_PKT_RING_SIZE=16 with OUT_QUEUE=32), which exercises the
wraparound path impossible to hit at the default.

Docs: add a "Video NACK ring — tunable independently of the output queue"
section to memory-profiles.md covering the macro, example override,
safety invariant (Sans-I/O caller-drain contract), and expected ~19 KB
saving at PKT_RING_SIZE=16. Tweak the Biggest Contributors and Kconfig
tables to point at the new macro.

Part of Phase 8 PR-3 (docs/exec-plans/active/phase8-continued-optimization.md).

* docs(video): fix inaccurate ~19 KB claim in PKT_RING_SIZE guidance

Copilot review on PR #54 flagged that the "override to 8 saves ~19 KB"
claim in the Kconfig help text is wrong for ESP-IDF defaults — with
OUT_QUEUE_SIZE=16 (video) the 16→8 drop at MEDIA_BUF_SIZE=1232 only
saves ~9.6 KB. The ~19 KB figure matches the host-default 32→16 case.

Rewrite the Kconfig help text around the formula
  (OUT_QUEUE_SIZE - PKT_RING_SIZE) × MEDIA_BUF_SIZE
so it stays accurate across host and ESP-IDF configurations, with a
Kconfig-correct 9.6 KB worked example.

memory-profiles.md already targets host overrides, but extend its
inline comment to spell out both host (32→16 ≈19 KB) and Kconfig
(16→8 ≈9.6 KB) numbers so readers in either context pick the right one.

Addresses #54 (comment)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(video): detect pkt_ring overrun + bump stats counter

Address PR #54 review concern: when NANORTC_VIDEO_PKT_RING_SIZE is set
below the per-frame fragment count, video_send_fragment_cb silently
wraps pkt_ring while out_queue[].transmit.data still references the
stale slots — the receiver gets newer fragment bytes carrying the older
fragment's RTP seq, manifesting as glitched IDRs. Sans-I/O drain doesn't
save you because nanortc_send_video() emits every FU-A fragment of one
access unit before returning to the caller.

Add a runtime guard: when (out_tail - out_head) >= PKT_RING_SIZE, the
ring would wrap into a slot whose pointer is still pending. Bump
nanortc_t.stats_pkt_ring_overrun (atomic, matches existing stats_*
fields) and emit a single static-string NANORTC_LOGW so under-sizing
shows up in integration smoke tests rather than as garbled keyframes
on the wire. Counter is exported on the struct for app glue to inspect.

Cannot prevent the corruption from this layer (the slot is already
about to be written), but turning silent-bad-video into a debuggable
signal matches the LOG-injection-for-AI-debug use case the rest of
the library follows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(video): aliasing regression for pkt_ring overrun detection

Mirror video_send_fragment_cb's write path (pkt_ring slot + meta stamp +
out_queue pointer enqueue) in test_media.c so the test exercises the
actual pointer-aliasing surface, not just the NACK-scan ring semantics.

Two new tests:
  - test_pkt_ring_overrun_counter_fires_when_undersized: pushes
    PKT_RING_SIZE+3 fragments without draining out_queue, asserts the
    counter trips exactly 3 times — guards the contract enforced in
    nano_rtc.c.
  - test_pkt_ring_aliasing_corruption_demonstrated_without_guard:
    documents *why* the guard exists. Without draining, out_queue[0]'s
    .data pointer still addresses pkt_ring[0], which has since been
    overwritten by the wrapped fragment. Asserts the byte at .data[0]
    is the latest writer's marker, not the original. This is the
    silent-corruption pattern the runtime guard surfaces.

Both pass under default config (PKT_RING=OUT_QUEUE=32, no overrun
because out_queue fills first) and under the IoT override
(-DNANORTC_VIDEO_PKT_RING_SIZE=8 -DNANORTC_OUT_QUEUE_SIZE=32, where the
overrun path actually fires).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(video): tighten PKT_RING_SIZE sizing guidance to per-frame fragment bound

Replace the "drain each tick" framing in Kconfig + memory-profiles.md
with the precise hard constraint:

  PKT_RING_SIZE >= ceil(max_frame_bytes / NANORTC_VIDEO_MTU) + 1

The previous wording implied the Sans-I/O drain contract was sufficient,
but nanortc_send_video() emits every FU-A fragment of one access unit
before returning to the caller — there is no opportunity to drain
mid-frame. Sizing PKT_RING_SIZE below the per-frame fragment count
silently corrupts every IDR (older fragment buffers are overwritten
while their pointers are still queued in out_queue).

Add a worked-numbers table for 480p/720p/1080p H.264 at MTU 1200, and
soften the "500 ms NACK history at 30 fps" claim — that figure only
held for single-NAL frames; multi-NAL streams cover much less wall-
clock time per slot. Cross-reference the new stats_pkt_ring_overrun
counter so integrators can wire it into smoke tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(phase8): mark PR-3 completed, record review-driven design change

Close out PR-3 in the Phase 8 exec plan. Key corrections vs the original
draft:

- The draft step 3 recommended "PKT_RING=16 at 30 fps = ~500 ms NACK
  window" as the IoT sizing rule. That claim only held for single-NAL
  frames — on 720p+ streams with multi-NAL IDRs it silently corrupted
  every keyframe because the ring would wrap while out_queue[].data
  pointers still referenced the overwritten slots. Replaced with the
  real hard constraint:
    PKT_RING_SIZE >= ceil(max_frame_bytes / NANORTC_VIDEO_MTU) + 1
  plus a worked-numbers table (480p ≈ 16, 720p ≈ 32, 1080p ≈ 64),
  landing in memory-profiles.md.
- Added a runtime overrun guard + stats_pkt_ring_overrun counter so
  under-sizing surfaces in integration smoke tests rather than as
  glitched IDRs on the wire.
- Grew the test count from the planned single test_nack_ring to five
  tests in test_media.c, including an aliasing-corruption witness
  that fails without the guard.

Also update docs/PLANS.md line 16 from "Planning" to track PR-3 and
PR-5 as landed so the phase's next-action surface is accurate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(video): clarify pkt_ring sizing rule + overrun counter wording

Address Copilot review on PR #57:

- nanortc_config.h: replace the misleading "drain each tick" framing on
  NANORTC_VIDEO_PKT_RING_SIZE with the actual hard sizing rule
  (ceil(max_frame_bytes / VIDEO_MTU) + 1). nanortc_send_video() emits
  every FU-A fragment of an access unit before returning, so callers
  cannot drain mid-frame; sizing must be based on worst-case
  fragments-per-frame, and savings depend on OUT_QUEUE_SIZE rather than
  a fixed ~19 KB number.

- Kconfig: rephrase "Set the runtime overrun counter" to "Monitor
  stats_pkt_ring_overrun" — the counter is incremented internally and
  is only readable by integrators, not user-settable.

- tests/test_media.c: rename
  test_pkt_ring_aliasing_corruption_demonstrated_without_guard to
  test_pkt_ring_aliasing_corrupts_pending_pointers_when_undersized.
  The previous name implied the helper bypassed the guard, but the
  test uses pkt_ring_simulate_send_with_guard() and the guard does
  fire — what the test actually demonstrates is that the runtime
  guard *cannot prevent* corruption once PKT_RING_SIZE is undersized;
  it only surfaces the problem via stats_pkt_ring_overrun.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(video): extract pkt_ring_alloc_slot + commit_slot helpers

The pkt_ring overrun-guard + slot-allocation pattern, plus the matching
NACK metadata stamp and pkt_ring_tail advance, were duplicated between
the H.264 zero-copy loop in rtc_send_video (added by PR #56 / fbb5b4f)
and the H.265 callback video_send_fragment_cb (added by PR #54 /
c1cb699). The verbose comment block explaining the aliasing-corruption
invariant lived only at the H.265 site, leaving the H.264 site
under-documented.

Extract:

  static uint8_t *pkt_ring_alloc_slot(nanortc_t *, uint16_t *out_pslot)
      — runs the (out_tail - out_head) >= PKT_RING_SIZE check, bumps
        stats_pkt_ring_overrun + emits NANORTC_LOGW on overrun, returns
        rtc->pkt_ring[pslot].
  static void pkt_ring_commit_slot(nanortc_t *, uint16_t pslot,
                                   uint16_t wire_seq, uint16_t srtp_len)
      — stamps pkt_ring_meta and increments pkt_ring_tail.

Both helpers live at the top of the #if NANORTC_FEATURE_VIDEO block,
ahead of the H.265 nesting, so both writers see the same definition.
The aliasing-corruption rationale now lives once at pkt_ring_alloc_slot
where both callers can find it.

Behavior identical: same guard logic, same overrun counter increment,
same slot index derivation, same meta layout. Verified across the
default + ASan + ICE_SRFLX-off + H.265 feature combos (21 / 21 host
tests, 23 / 23 with H.265 enabled, 15 / 15 ci-check.sh --fast).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(phase8): mark PR-4 completed and refresh PR-3 record

- docs/PLANS.md line 16: Phase 8 status now lists PR-4 as landed
  2026-04-25 (PR #56 / fbb5b4f), matches PR-3 / PR-5 entries.
- docs/exec-plans/active/phase8-continued-optimization.md:
  - Tag PR-4 as [COMPLETED 2026-04-25] in both the summary table and
    the section header.
  - Replace the speculative "Approach" bullets with the landed approach
    (Option C — fragment iterator), measured savings, test additions,
    and verification record.
  - PR-3 step 5: note the runtime overrun guard was helper-extracted
    into pkt_ring_alloc_slot in PR #57 and is now shared between the
    H.264 and H.265 writers.
  - PR-3 step 6: rename
    aliasing_corruption_demonstrated_without_guard to
    aliasing_corrupts_pending_pointers_when_undersized to match the
    test rename in ba32abf.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(sctp): emit DISCONNECTED on retransmit exhaustion

nsctp_handle_timeout() transitions SCTP to CLOSED after exceeding
NANORTC_SCTP_MAX_RETRANSMITS but never notified the upper layer —
applications only learned about the loss on the next
nanortc_datachannel_send() failure, a silent stall from the user's
point of view. Phase 8 PR-2.

Wire the signal through:

- nano_sctp.h: new bool closed_due_to_failure on nano_sctp_t, set
  exactly once when retransmit exhaustion trips the CLOSED transition.
  Init-time CLOSED (line 480) and peer-ABORT-induced CLOSED (line 982)
  deliberately do NOT set this — they have separate signaling channels;
  rationale recorded inline.

- nano_sctp.c: set the flag alongside the existing state=CLOSED in the
  retransmit-exhaustion branch.

- nano_rtc.c: in rtc_process_timers, after the SCTP timer block, read
  and clear the flag. If the RTC was previously NANORTC_STATE_CONNECTED,
  emit NANORTC_EV_DISCONNECTED and transition to NANORTC_STATE_CLOSED.
  Mirrors the existing ICE-failure / consent-expiry paths. Critical
  detail: emit block lives outside the if (sctp.state == ESTABLISHED)
  guard — once the timeout flips state to CLOSED, that guard goes false
  on the next tick and the signal would otherwise be lost.

- tests/test_sctp.c:
  * test_sctp_timeout_sets_closed_flag (under #if NANORTC_FEATURE_DC_RELIABLE)
    walks the full RTO ramp (1000 → 2000 → 4000 → 8000 → 10000 capped)
    through MAX_RETRANSMITS cycles of nsctp_handle_timeout + nsctp_poll_output,
    asserts CLOSED + flag on the threshold-trip iteration.
  * test_sctp_abort_does_not_set_failure_flag — scope guard ensuring
    peer ABORT does not regress into the failure flag.

Verified across DC_RELIABLE=ON (test_sctp 56/56), DC_RELIABLE=OFF
(test_sctp 55/55, Test 1 correctly skipped), and scripts/ci-check.sh
--fast (15/15 incl. ASan + ICE_SRFLX=OFF + format).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(phase8): mark PR-2 completed

- docs/PLANS.md line 16: Phase 8 status now lists PR-2 (SCTP failure
  event propagation) as landed 2026-04-25, alongside the existing
  PR-3 / PR-4 / PR-5 entries; only PR-1 (IoT memory profile) remains
  pending in the recommended-order set.
- docs/exec-plans/active/phase8-continued-optimization.md:
  - Tag PR-2 as [COMPLETED 2026-04-25] in the section header.
  - Replace the speculative "Approach" bullets with the landed
    implementation: closed_due_to_failure flag, RTC emit block placed
    outside the ESTABLISHED guard (the non-obvious detail), the two
    new regression tests, and verification record (test_sctp 56/56
    with DC_RELIABLE=ON, 55/55 with DC_RELIABLE=OFF, ci-check.sh
    --fast 15/15).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ci): expand fuzz job to fuzz_h265 + fuzz_turn

Both harnesses are already registered in tests/fuzz/CMakeLists.txt
(fuzz_turn in FUZZ_CORE_TARGETS, fuzz_h265 under FUZZ_H265_TARGETS,
gated by NANORTC_FEATURE_H265). The CI fuzz job already configures
H265=ON, so adding the two harnesses to the run loop is the only
remaining step.

Closes review §4 P1-1 (docs/reviews/2026-04-26-overall-review.md).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci(interop): add nanortc-as-TURN-client relay job

Spins up the existing tests/interop/turn-server/docker-compose.yml
coturn (via scripts/start-test-turn.sh) and runs ctest -L turn-relay
against test_interop_turn_relay_nanortc.c (5 cases). Phase 5.2 F6-F10
regressions are now caught in CI; previously they only had hand
verification on a downstream camera SDK.

Updates docs/engineering/turn-rfc-compliance.md to reflect the new
coverage status (the 'no automated test' note was stale once the
744-LOC harness landed).

Closes review §4 P1-2 + P1-5 (docs/reviews/2026-04-26-overall-review.md).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(rtc): defer DTLS_HANDSHAKING state until dtls_start succeeds

rtc_begin_dtls_handshake() previously set rtc->state to
DTLS_HANDSHAKING before calling dtls_start(). On dtls_start failure
the function returned early but state was already advanced —
leaving a half-initialised DTLS context that subsequent
handle_input() bytes would drive down the DTLS path. Reorder so
state is only advanced after dtls_start() succeeds; failure leaves
state at ICE_CONNECTED so the caller can retry cleanly.

Failure-path mock test deferred until a mock crypto provider lands
(no current way to make dtls_init / dtls_start fail in tests).

Closes review §4 P1-4 (docs/reviews/2026-04-26-overall-review.md).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(rtc): tear down DTLS context on ice_restart

Without explicit dtls_destroy() in nanortc_ice_restart(), the
"if (!rtc->dtls.crypto_ctx)" guard in rtc_begin_dtls_handshake
would skip dtls_init on the next handshake and reuse the previous
DTLS context — possible key-material reuse / orphan handshake
timers. dtls_destroy is a no-op when crypto_ctx is NULL, so calling
it unconditionally is safe even before any DTLS init.

Adds two regression tests:
- test_api_ice_restart_clears_dtls: dtls_init -> ice_restart asserts
  crypto_ctx is NULL and dtls.state is CLOSED post-restart.
- test_api_ice_restart_no_dtls_safe: ice_restart without prior
  dtls_init does not crash.

Closes review §4 P2-3 (docs/reviews/2026-04-26-overall-review.md).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(crc): add RFC 1952 / 3309 / 3720 vector tests

The CRC modules (nano_crc32.c / nano_crc32c.c) carried 100% line
coverage indirectly via fuzz_stun (FINGERPRINT verify) and
fuzz_sctp (chunk checksum), but had no direct algorithm-vector
test. An accidental polynomial / initial-value / final-XOR change
could pass indirect fuzz round-trip while still being
algorithmically wrong.

Adds 13 cases pinning canonical RFC vectors:
- CRC-32 ISO HDLC (RFC 1952): empty, "123456789" (0xCBF43926),
  "a" (0xE8B7BE43), "abc" (0x352441C2)
- CRC-32c Castagnoli (RFC 3309 / RFC 3720 §B.4): empty,
  "123456789" (0xE3069283), 32x0x00 (0x8A9136AA), 32x0xFF
  (0x62A8AB43), incrementing/decrementing 32-byte buffers
- Incremental API: segmented == one-shot (the 3-segment pattern
  used by nsctp_verify_checksum), zero-length update tolerance,
  byte-at-a-time fragmentation

Closes review §4 P3-1 (docs/reviews/2026-04-26-overall-review.md).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(tests): gate test_crc32 on DATACHANNEL

test_crc32.c calls nano_crc32c{,_init,_update,_final}, which live in
src/nano_crc32c.c and are only compiled when NANORTC_FEATURE_DATACHANNEL
is ON. Without gating, AUDIO_ONLY / MEDIA_ONLY / CORE_ONLY builds and
the architecture-constraints DC-OFF check fail to link.

Add test_crc32 to the DC_TESTS list so the existing continue() guard
skips it in DC-OFF configs, mirroring test_datachannel. The 13 CRC
assertions still run in DATA / AUDIO / MEDIA / MEDIA_H265 — RFC vectors
are configuration-invariant, so no coverage is lost.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci(esp-idf): add compile-only build workflow for examples

Builds esp32_datachannel across esp32 / esp32s3 / esp32p4 and
esp32_audio + esp32_video on esp32s3 via espressif/esp-idf-ci-action
on push/PR to main. Catches component / Kconfig / lwIP regressions
without requiring hardware; on-device Python E2E remains manual.

esp32_camera is intentionally excluded — its board_manager autogen +
camera-sensor selection is board-specific and will land separately.

* fix(esp32_video): pin default flash size to 4MB in sdkconfig.defaults

The custom partition table reserves a 0x3a0000 (~3.6 MB) factory app,
ending at 0x3b0000. ESP-IDF's default flash size is 2 MB, so a clean
build from sdkconfig.defaults fails partition-table generation with:

  Partitions tables occupies 3.7MB of flash (3866624 bytes) which
  does not fit in configured flash size 2MB.

The locally-tracked sdkconfig was set to 16 MB, which masked this in
manual builds; the new ESP-IDF CI workflow starts from defaults and
caught the regression.

Set CONFIG_ESPTOOLPY_FLASHSIZE_4MB=y as the minimum size that fits
the partitions; boards with larger flash override via menuconfig.

Verified: gen_esp32part.py with --flash-size 4MB succeeds; with 2MB
reproduces the CI error exactly.

* ci(esp-idf): fetch submodules in checkout for esp32_video sample data

esp32_video's main/CMakeLists.txt packs frames from
examples/sample_data/h264SampleFrames and opusSampleFrames into
video.blob / audio.blob at build time. examples/sample_data is a git
submodule (0x1abin/media_sample_data); without `submodules: recursive`,
actions/checkout@v4 leaves the path empty and the build fails:

  Error: examples/sample_data/h264SampleFrames is not a directory
  Error: examples/sample_data/opusSampleFrames is not a directory

The other matrix targets don't reference sample_data, but enabling
submodules at the workflow level keeps the step uniform; the submodule
is ~15 MB so the cost is negligible.

* fix(rtc): clear cached fingerprints on ice_restart

dtls_destroy() drops the DTLS context but leaves rtc->sdp.local_fingerprint
and rtc->dtls.local_fingerprint populated. rtc_cache_fingerprint() is a
write-once cache (early-returns when sdp.local_fingerprint is non-empty),
so the next create_offer / accept_offer after an ICE restart would
re-advertise the *previous* cert's SDP fingerprint while a fresh
dtls_init() generates a new cert and fingerprint. The peer then rejects
DTLS handshake with a fingerprint mismatch.

Memset both fingerprints to '\0' immediately after dtls_destroy() so
rtc_cache_fingerprint() repopulates them from the next dtls_init's cert.

Test: extend test_api_ice_restart_clears_dtls (T20) — pre-populate
both fingerprint buffers before calling nanortc_ice_restart and assert
they are cleared on return.

Reported by Copilot review on PR #59.

* docs(rtc): correct rtc_begin_dtls_handshake state-failure comment

The previous comment claimed that keeping state at ICE_CONNECTED after
a dtls_start() failure prevents "a half-initialised DTLS context being
driven by inbound bytes". That is misleading: the inbound DTLS gate in
handle_input is `state >= NANORTC_STATE_ICE_CONNECTED`, so DTLS records
are still accepted on the retry path. The state ordering only governs
the FSM; the actual receive-path policy is unchanged.

Reword to describe what the reorder actually achieves (FSM stays at
ICE_CONNECTED so callers can retry DTLS startup) and explicitly call
out that the inbound gate is unchanged, so future readers don't infer
a stronger guarantee than exists.

Reported by Copilot review on PR #59.

* test(rtc): add e2e regression tests for ice_restart hardening

Brackets the unit-level T20/T21 coverage in tests/test_trickle_ice.c
with a full ICE -> DTLS round trip on both sides. Existing unit tests
probe nanortc_ice_restart() against a hand-built nanortc_t and assert
which fields got reset; nothing today drives the end-to-end path of
two peers reaching DTLS_CONNECTED, restarting, and re-handshaking
with fresh keying material.

- e2e_resync_ice_creds() helper: counterpart to e2e_setup_ice_creds()
  that re-syncs ufrag/pwd and re-arms remote_candidates after both
  peers have called nanortc_ice_restart(). ice_restart() preserves
  local_candidates (RFC 8445 §9.1.1.1) but blanks the peer side.

- test_e2e_ice_restart_dtls_rehandshake: drives both peers to
  DTLS_CONNECTED, captures fingerprints, restarts, re-syncs, pumps
  to reconnect, asserts the new cert hash differs from pre-restart.
  Catches a regression of 8fea1e8 (without dtls_destroy in
  nanortc_ice_restart, the second handshake skips dtls_init via
  the !crypto_ctx guard and reuses the original cert).

- test_e2e_ice_restart_sdp_fingerprint_refresh: drives create_offer
  twice across an ice_restart, asserts sdp.local_fingerprint re-syncs
  to the freshly generated dtls.local_fingerprint. Catches a regression
  of 9a75412 (without the fingerprint memset, rtc_cache_fingerprint
  early-returns on the stale cache and the SDP cache holds the old
  cert hash while DTLS holds the new one).

Validation: each new test was confirmed to fail when its target
commit is surgically reverted in the working tree, and to pass on
baseline develop. ./scripts/ci-check.sh --fast 15/15 green; full
host suite 22/22.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(phase8): record post-PR-5 ice_restart hardening as TD-022

The post-PR-5 follow-up commits surfaced by the Copilot review on
PR #59 (d59a528 / 8fea1e8 / 9a75412 / af8f566) were not tracked
anywhere — TD-021 closed the original ICE hardening pass and
phase8 PR-5 was already marked completed.

Add a consolidated TD-022 entry naming each commit, the bug it
fixes, and the test that pins it (the unit T20/T21 sites plus the
two new e2e tests). Add a "Post-merge hardening" subsection at the
tail of phase8 PR-5 with a commit -> failing-test mapping table so
future readers can connect the breadcrumb trail without crawling
git log.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(h265): wire receive-path demux + first H.265 video e2e

PR-2 stage B closes the receive half of the H.265 wiring so two
nanortc instances can roundtrip H.265 end-to-end:

- Per-track depacketizer becomes a codec-selected union
  (track.video.depkt.{h264,h265}) in src/nano_media.h; the H.265
  arm is conditional on NANORTC_FEATURE_H265 so OFF builds keep the
  prior sizeof(nanortc_track_t).
- track_init() dispatches on m->codec to call h265_depkt_init() for
  H.265 video tracks, else falls back to h264_depkt_init().
- nano_rtc_media_handle_rtp_or_rtcp() RTP demux dispatches to
  h265_depkt_push() + h265_is_keyframe() when the track is H.265;
  the single NANORTC_EV_MEDIA_DATA emit site is shared because the
  event payload is codec-neutral.

Tests:
- test_h265_track_init_dispatches_to_h265_depkt drives the same FU
  three-fragment vector through track_init + the union accessor as
  the standalone module test, proving dispatch path == direct path.
- test_e2e_h265_loopback is the project's first true two-instance
  video e2e: full SDP/ICE/DTLS, one IDR Annex-B AU sent, answerer
  surfaces byte-identical payload with is_keyframe asserted.

Stage A (SDP plumbing + send path) shipped 2026-04-16 during the
browser interop hardening session; phase3-5-h265.md now records the
staged-landing narrative and checks all PR-2 acceptance criteria.
PR-3 (libdatachannel interop + browser + doc finalization) pending.

---------

Co-authored-by: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants