Background
`scripts/build_backgrounds_chrombpnet.py` writes `chrombpnet_effect_cdfs_interim.npz` and `chrombpnet_baseline_cdfs_interim.npz` via a plain `np.savez_compressed(interim_path, ...)`, which overwrites whatever was already at that path. As a result, the documented two-pass build flow:
```bash
python scripts/build_backgrounds_chrombpnet.py --part variants --assay ATAC_DNASE ...
python scripts/build_backgrounds_chrombpnet.py --part variants --assay CHIP ...
```
…silently loses the 42 ATAC/DNase tracks: the second invocation replaces the interim with 744 CHIP tracks. The subsequent `--part merge` only sees the 744 and produces a 744-track final NPZ instead of the expected 786.
Reproduction
This happened during the 2026-04-30 chrombpnet CDF rebuild (PR #70). Following the handoff at `audits/2026-04-29_chrombpnet_cdf_rebuild/HANDOFF.md` verbatim produced a 744-track final NPZ; the post-merge spot-check caught the 744 ≠ 786 mismatch. Recovered by re-running Phase 1 with the existing 744-CHIP NPZ in place and finishing with `--part merge-incremental` (~50 min cost).
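The overwrite itself can be shown in isolation, without running the build script. The snippet below is a minimal sketch, not the script's code: the `track_ids` array name and the `atac_*` / `chip_*` identifiers are made up for illustration, and the real interim files hold CDF data in a layout this issue does not show.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for one of the interim NPZ paths.
    interim_path = os.path.join(tmp, "interim.npz")

    # Pass 1: 42 ATAC/DNase tracks (hypothetical ids).
    atac_ids = np.array([f"atac_{i}" for i in range(42)])
    np.savez_compressed(interim_path, track_ids=atac_ids)

    # Pass 2: 744 CHIP tracks written to the same path.
    chip_ids = np.array([f"chip_{i}" for i in range(744)])
    np.savez_compressed(interim_path, track_ids=chip_ids)

    # Only the second write survives; nothing warns about the first.
    with np.load(interim_path) as interim:
        surviving = len(interim["track_ids"])

print(surviving)
```

The second `np.savez_compressed` call raises nothing and returns normally, which is why the loss goes unnoticed until the track counts are compared.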
Suggested fix
Two reasonable options:
- Append-on-write within an interim: read the existing interim if present, merge its track rows with the new run's rows, save back. Mirrors how `merge-incremental` already handles the final NPZ via `PerTrackNormalizer.append_tracks`.
- Refuse to overwrite without `--force`: detect an existing interim and error out with a clear message ("found interim from a prior run with different tracks; pass --force to overwrite, or run `--part merge` to consume it first").
Option 1 makes the documented two-pass flow Just Work; option 2 is more conservative but at least avoids silent data loss.
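Both options can be sketched in one small helper. This is an illustration under assumptions rather than a patch: `save_interim`, its `mode` argument, and the `track_ids` / `cdfs` array names are hypothetical, and the real interim layout may differ. It also does not use `PerTrackNormalizer.append_tracks`, whose signature is not shown here; rows are merged with plain NumPy instead.

```python
from pathlib import Path

import numpy as np


def save_interim(interim_path, track_ids, cdfs, mode="append"):
    """Write an interim NPZ without silently dropping earlier tracks.

    mode="append": merge with an existing interim (option 1).
    mode="refuse": raise if an interim already exists (option 2).
    mode="force":  overwrite unconditionally (the current behaviour).
    Returns the number of tracks written.
    """
    if mode not in ("append", "refuse", "force"):
        raise ValueError(f"unknown mode: {mode!r}")

    interim_path = Path(interim_path)
    track_ids = np.asarray(track_ids)
    cdfs = np.asarray(cdfs)

    if interim_path.exists() and mode != "force":
        if mode == "refuse":
            raise FileExistsError(
                f"found interim from a prior run at {interim_path}; "
                "pass --force to overwrite, or run --part merge "
                "to consume it first"
            )
        # Load the previous run fully before the file is rewritten.
        with np.load(interim_path) as old:
            old_ids, old_cdfs = old["track_ids"], old["cdfs"]
        # Keep old rows unless the new run recomputed the same track.
        keep = ~np.isin(old_ids, track_ids)
        track_ids = np.concatenate([old_ids[keep], track_ids])
        cdfs = np.concatenate([old_cdfs[keep], cdfs], axis=0)

    np.savez_compressed(interim_path, track_ids=track_ids, cdfs=cdfs)
    return len(track_ids)
```

In `append` mode a rerun of the same assay replaces its own rows rather than duplicating them, and a mismatch in CDF row width between runs surfaces as a `ValueError` from `np.concatenate` instead of a corrupt file.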
Why it matters
Anyone following the handoff's documented invocations as-is loses data without realising it. The error mode is silent — the script writes a smaller file successfully and the merge "succeeds" with the wrong track count. Only a track-count check at the end of merge surfaces it.
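Independently of which fix is chosen, the merge step could fail loudly on a wrong track count instead of relying on a manual spot-check. A minimal sketch, assuming the merged track identifiers are available as a sequence; `check_track_count` and the example ids are hypothetical names, not part of the script:

```python
def check_track_count(track_ids, expected_total):
    """Raise if the merged track list is short, long, or has duplicates."""
    ids = list(track_ids)
    if len(set(ids)) != len(ids):
        raise ValueError("merged NPZ contains duplicate track ids")
    if len(ids) != expected_total:
        raise ValueError(
            f"merged NPZ has {len(ids)} tracks, expected {expected_total}"
        )
    return len(ids)


# Counts from this incident: 42 ATAC/DNase + 744 CHIP = 786 expected.
atac = [f"atac_{i}" for i in range(42)]
chip = [f"chip_{i}" for i in range(744)]

check_track_count(atac + chip, 786)  # passes
try:
    check_track_count(chip, 786)  # the overwritten-interim case
except ValueError as err:
    print(err)
```

With a check like this at the end of `--part merge`, the 744-track result would have stopped the run with an explicit error rather than producing a final NPZ that looks valid.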
Related