fix(cdf-rebuild): protect interim NPZ + honour CUDA_VISIBLE_DEVICES; streamline README top by lucapinello · Pull Request #76 · pinellolab/chorus

lucapinello · 2026-04-30T12:47:02Z

fix(cdf-rebuild): protect interim NPZ + honour `CUDA_VISIBLE_DEVICES`; streamline README top

Three changes from the 0.4.0 follow-up triage. Single PR because the HANDOFF.md edits depend on both script fixes.

Closes

scripts/build_backgrounds_chrombpnet.py: --part variants/baselines --assay X overwrites the interim NPZ #71 / build_backgrounds_chrombpnet: --part variants/--assay X overwrites the existing interim NPZ instead of appending #73 — build_backgrounds_chrombpnet.py: --part variants/baselines --assay X overwrites the existing interim NPZ instead of appending (silent ~10 h GPU loss caught only by post-merge spot-check).
scripts/build_backgrounds_chrombpnet.py: --gpu N silently overrides outer CUDA_VISIBLE_DEVICES #72 / build_backgrounds_chrombpnet: --gpu N overrides outer CUDA_VISIBLE_DEVICES, breaking parallel-launch pattern #74 — build_backgrounds_chrombpnet.py: --gpu N silently overrides outer CUDA_VISIBLE_DEVICES, breaking the parallel-launch pattern.

#73/#74 are exact duplicates of #71/#72 — opened ~5 min apart by concurrent agent triage. Closing both pairs.

Bug A (P1) — refuse-to-overwrite the interim NPZ

User's pick was the conservative fix: refuse to overwrite without --force, naming the conflicting track-id sets.

New --force argparse flag.
New _check_interim_compatibility(interim_path, new_track_ids, force, label) helper, called before each np.savez_compressed(interim_path, ...). Returns silently when the file doesn't exist (first run) or when the existing track set equals the new set (idempotent re-run); otherwise raises SystemExit with a diff (len(only_existing), len(only_new), three example track-ids per side) and a pointer at --part merge / --part merge-incremental / --force.
Matches existing raise SystemExit("...") validation-error convention in the file (lines 245, 247).

Append-on-write (#71's option (a)) was deferred — better ergonomics but a v0.5+ improvement.

Bug B (P2) — honour pre-set `CUDA_VISIBLE_DEVICES`

User's pick was the trivial pattern. 3 LOC:

- os.environ["CUDA_VISIBLE_DEVICES"] = str(args.gpu)
+ if "CUDA_VISIBLE_DEVICES" not in os.environ:
+     os.environ["CUDA_VISIBLE_DEVICES"] = str(args.gpu)

Plus a one-line --help clarification on --gpu: "No-op if CUDA_VISIBLE_DEVICES is already set in the calling shell."

When the env var is set, the outer wins — matches every cluster scheduler's precedence convention.

README streamlining

Reader feedback was that the first two sections of "🚀 Get running in one lunch break" were too dense for a first-timer:

A ~150-word Prerequisite paragraph under Step 1 (Miniforge, ~28 GB disk, both AlphaGenome backends, ChromBPNet slim mirror, lazy downloads, --all-chrombpnet, ENCODE fallback)
A 5-line Step 2 blockquote duplicating the backend-comparison detail that's already in the deeper #### Two AlphaGenome backends section

Restructure:

Before	After
Step 1 → install commands → 150-word Prerequisite paragraph	Three-bullet "Before you start" mini-section between intro and Step 1 (Miniforge link, ~28 GB, platforms); Step 1 = just install commands
Step 2 blockquote duplicating backend detail	Step 2 = just `chorus setup` + the existing token / 2-min options
(no canonical home for disk-usage detail)	New `#### Disk usage breakdown` subsection at the top of `Installation — detailed` with a per-bucket table

Anchor links (#disk-usage-breakdown, #two-alphagenome-backends, #where-the-oracle-weights-come-from) all resolve. The detail isn't lost — just moved to the deeper section that's already documented as the home for "edge cases: upgrading, per-oracle setup, token plumbing, manual genome management, and backgrounds."

The first screen now reads as: title → tagline → four-step intro → three-bullet "Before you start" → install commands. No prose-walls, no premature backend complexity.

HANDOFF.md (`audits/2026-04-29_chrombpnet_cdf_rebuild/`)

Two short notes so future maintainers don't re-trip over the same bugs:

A callout between Phase 1 and Phase 2 reminding to run --part merge (or merge-incremental, or pass --force) before re-running with a different --assay.
A one-line note in the parallel-launch section explaining the CUDA_VISIBLE_DEVICES/--gpu precedence.

Test plan

--help parses cleanly on the modified script. Both new flags visible (--force, updated --gpu help-text).
Fast suite green: pytest tests/ -q -m "not integration and not slow" → 368 passed, 1 skipped, 5 deselected (same count as main — no regression).
README anchor links resolve: #disk-usage-breakdown → new line 199; #two-alphagenome-backends → existing line 173; #where-the-oracle-weights-come-from → existing line 310.
No content lost: Other ChromBPNet cell types download lazily and 5–8× faster than JAX CPU distinctive strings appear exactly once each (in their new homes).
End-to-end --force smoke: would need conda-env infra in CI to actually invoke the script. Helper is a pure function over the existing track-id reservoir code — verified by inspection. Follow-up tracked at the file-level next time someone runs a partial CDF rebuild on an A100 box.

🤖 Generated with Claude Code

…streamline README top Three changes from the 0.4.0 follow-up triage. Single PR because the HANDOFF.md tweak depends on both script fixes. ## Bug A (P1, #71/#73): refuse-to-overwrite the interim NPZ `scripts/build_backgrounds_chrombpnet.py` was writing the interim NPZ unconditionally with `np.savez_compressed(interim_path, ...)`. The documented two-pass flow (`--assay ATAC_DNASE` then `--assay CHIP`) silently overwrote the first pass's 42-track interim with the second pass's 744 CHIP tracks, producing a 744-track final NPZ instead of the expected 786 — caught by post-merge spot-check during PR #70 (~50 min GPU recovery). Conservative fix per the user's pick: refuse to overwrite without `--force`, naming the conflicting track-id sets in the SystemExit message. The new `_check_interim_compatibility()` helper runs before each interim write site and: - returns silently if the path doesn't exist (first run); - returns silently if the existing track set equals the new set (idempotent re-run with no data loss); - raises SystemExit with a diff naming `len(only_existing)`, `len(only_new)`, and 3 example track-ids from each side, plus pointing at `--part merge` / `merge-incremental` / `--force`; - returns silently with `--force`. ## Bug B (P2, #72/#74): honour pre-set CUDA_VISIBLE_DEVICES `load_models_and_setup()` was clobbering `os.environ["CUDA_VISIBLE_DEVICES"]` with `--gpu N` unconditionally, so the documented parallel-launch pattern (outer `CUDA_VISIBLE_DEVICES=N` per terminal pinning the physical GPU) didn't work — both terminals landed on physical GPU 0, fighting for memory until one OOMed. Trivial pattern per the user's pick: only set the env var when nothing was already set: if "CUDA_VISIBLE_DEVICES" not in os.environ: os.environ["CUDA_VISIBLE_DEVICES"] = str(args.gpu) Plus a one-line `--help` clarification on `--gpu`: "No-op if CUDA_VISIBLE_DEVICES is already set in the calling shell." ## README streamlining User feedback: the first two sections of "🚀 Get running in one lunch break" were too dense for a first-timer. Specifically the ~150-word Prerequisite paragraph after Step 1 (Miniforge, ~28 GB, both AlphaGenome backends, ChromBPNet streaming, lazy downloads, --all-chrombpnet, ENCODE fallback) and the 5-line Step 2 blockquote duplicating backend detail. Restructure (top section now is just): four-step intro, "Before you start" mini-section with three bullets (Miniforge link, ~28 GB, platforms), Steps 1-4 each terse and copy-pasteable. The detail moves to existing deeper sections — the Step 2 blockquote was already duplicated in the deeper "Two AlphaGenome backends" subsection (same content, more detail), so just drops; the Prerequisite paragraph's disk-usage detail moves into a new `#### Disk usage breakdown` subsection at the top of `Installation — detailed`. Anchor links (`#disk-usage-breakdown`, `#two-alphagenome-backends`, `#where-the-oracle-weights-come-from`) all resolve. ## HANDOFF.md Adds two short notes so future maintainers don't trip over the new behaviour: a callout between Phase 1 and Phase 2 reminding to run `--part merge` (or pass `--force`) before re-running with a different `--assay`; and a one-line note in the parallel-launch section explaining the `CUDA_VISIBLE_DEVICES` precedence. ## Tests `pytest -m "not integration and not slow"` → 368 passed, 1 skipped, 5 deselected — same count as main, no regression. Bug A's SystemExit-vs-overwrite branches are exercised by the helper's pure function; a smoke test that builds two tiny interims and asserts the diff message is left as a follow-up (would need conda-env infra in CI to actually run the script — out of scope for this PR). Closes #71, #72, #73, #74. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

User feedback as part of the streamlining pass. The callout was an escape hatch into a one-oracle-only install for impatient readers, but it sits between Step 2 (the canonical full setup) and Step 3 (the runnable snippet that uses Enformer anyway), and a first-time reader following the linear flow doesn't need it interrupting the narrative. Anyone who specifically wants the lightweight starter will find `chorus setup --oracle <name>` documented in `Installation — detailed → Setting up oracle environments one-by-one` (already present, unchanged). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

User feedback on the streamlined top: Step 2 — title was a heavy spec ("Download all 6 oracles + hg38 + backgrounds"); replaced with "Get every oracle, weight, and reference — batteries included". Body trimmed of the multi-GB-tarball / CDF detail that new users don't care about. Step 3 — title was utilitarian ("Predict — wild-type + SNP effect in one block"); replaced with "Your first prediction — score a SNP at the β-globin locus". Added two intro sentences above the code block explaining what the snippet does and why this prediction shape (one wild-type signal + N counter-factual variants) is the canonical chorus pattern. Step 4 — biggest reorder. Old version led with "ships an MCP server with 22 tools, here's the full list" before any natural- language example. New version flips that: lead with one bash command + three concrete prompts the user can paste into Claude Code, then mention the 22-tool catalogue at the end as the deeper read. Title rewritten to convey "complex analyses without coding" ("Skip the code — drive chorus from Claude in plain English"). What-to-read-next reordered so the first two bullets are the discovery/exploration paths (Notebooks, Worked application examples — both prompt-driven) instead of the API recipes. API slipped to fourth. No code or anchor changes; all internal links still resolve. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…tebooks + MCP intros Second-pass review. The audit was: where does the reader's interest flag and why? Five spots fixed: What chorus is — was three redundant bullet lists. The cover tagline already names the six oracles, and the "Pick an oracle" table right below has the per-oracle stats. The "Key features" sub-bullet list duplicated the lunch-break tour AND the later Key features section. Replaced with two short paragraphs that say what the cover doesn't: percentile-grounded outputs (`+0.45 log2FC → 0.962 effect %ile`), per- oracle conda isolation, chorus-controlled HF mirrors, 22-tool MCP. Worked application examples — title was a heading, now leads. New title: "Worked application examples — seven things you can do today." Intro now opens with the magic ("every example was generated end-to- end by Claude Code talking to chorus's MCP server, no code written by hand") instead of burying it. Notebooks — was "Three notebooks are provided, from introductory to advanced". Replaced with "Three sittings, zero to confident" + a sentence describing the user's actual progression. Per-notebook descriptions rewritten in second-person ("what you'll build") with plain-English summaries — the "I get it now" notebook, the "graduate- level" notebook. Pick an oracle — moved the wall-of-prose paragraph about CUDA / Apple Metal / tensorflow-metal / PyTorch MPS / JAX-Metal-falls-back-to-CPU out of "Pick an oracle". Replaced with a one-line "GPU detection is automatic" pointer to a new Platform & GPU support table inside Installation — detailed, where someone actually deciding what to install can find it. MCP server — was "Chorus includes an MCP server that lets AI assistants like Claude directly load oracles, predict variant effects, and analyze gene expression — all through natural language conversation." Lukewarm. Replaced with a one-liner that ties back to Step 4 of the lunch-break tour (which the reader just finished, and which is what hyped them in the first place) and previews what's in the rest of the section. Net diff: -7 LOC. Doc is shorter, less redundant, and punches harder where it matters. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

lucapinello and others added 4 commits April 30, 2026 08:46

lucapinello marked this pull request as ready for review April 30, 2026 14:52

lucapinello merged commit 45c8d91 into main Apr 30, 2026
1 check passed

lucapinello deleted the fix/cdf-rebuild-tooling-readme-streamline branch April 30, 2026 14:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(cdf-rebuild): protect interim NPZ + honour CUDA_VISIBLE_DEVICES; streamline README top#76

fix(cdf-rebuild): protect interim NPZ + honour CUDA_VISIBLE_DEVICES; streamline README top#76
lucapinello merged 4 commits intomainfrom
fix/cdf-rebuild-tooling-readme-streamline

lucapinello commented Apr 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

lucapinello commented Apr 30, 2026

fix(cdf-rebuild): protect interim NPZ + honour CUDA_VISIBLE_DEVICES; streamline README top

Closes

Bug A (P1) — refuse-to-overwrite the interim NPZ

Bug B (P2) — honour pre-set CUDA_VISIBLE_DEVICES

README streamlining

HANDOFF.md (audits/2026-04-29_chrombpnet_cdf_rebuild/)

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

fix(cdf-rebuild): protect interim NPZ + honour `CUDA_VISIBLE_DEVICES`; streamline README top

Bug B (P2) — honour pre-set `CUDA_VISIBLE_DEVICES`

HANDOFF.md (`audits/2026-04-29_chrombpnet_cdf_rebuild/`)