fix(cdf-rebuild): protect interim NPZ + honour CUDA_VISIBLE_DEVICES; streamline README top#76
Merged
lucapinello merged 4 commits intomainfrom Apr 30, 2026
Conversation
…streamline README top Three changes from the 0.4.0 follow-up triage. Single PR because the HANDOFF.md tweak depends on both script fixes. ## Bug A (P1, #71/#73): refuse-to-overwrite the interim NPZ `scripts/build_backgrounds_chrombpnet.py` was writing the interim NPZ unconditionally with `np.savez_compressed(interim_path, ...)`. The documented two-pass flow (`--assay ATAC_DNASE` then `--assay CHIP`) silently overwrote the first pass's 42-track interim with the second pass's 744 CHIP tracks, producing a 744-track final NPZ instead of the expected 786 — caught by post-merge spot-check during PR #70 (~50 min GPU recovery). Conservative fix per the user's pick: refuse to overwrite without `--force`, naming the conflicting track-id sets in the SystemExit message. The new `_check_interim_compatibility()` helper runs before each interim write site and: - returns silently if the path doesn't exist (first run); - returns silently if the existing track set equals the new set (idempotent re-run with no data loss); - raises SystemExit with a diff naming `len(only_existing)`, `len(only_new)`, and 3 example track-ids from each side, plus pointing at `--part merge` / `merge-incremental` / `--force`; - returns silently with `--force`. ## Bug B (P2, #72/#74): honour pre-set CUDA_VISIBLE_DEVICES `load_models_and_setup()` was clobbering `os.environ["CUDA_VISIBLE_DEVICES"]` with `--gpu N` unconditionally, so the documented parallel-launch pattern (outer `CUDA_VISIBLE_DEVICES=N` per terminal pinning the physical GPU) didn't work — both terminals landed on physical GPU 0, fighting for memory until one OOMed. Trivial pattern per the user's pick: only set the env var when nothing was already set: if "CUDA_VISIBLE_DEVICES" not in os.environ: os.environ["CUDA_VISIBLE_DEVICES"] = str(args.gpu) Plus a one-line `--help` clarification on `--gpu`: "No-op if CUDA_VISIBLE_DEVICES is already set in the calling shell." ## README streamlining User feedback: the first two sections of "🚀 Get running in one lunch break" were too dense for a first-timer. Specifically the ~150-word Prerequisite paragraph after Step 1 (Miniforge, ~28 GB, both AlphaGenome backends, ChromBPNet streaming, lazy downloads, --all-chrombpnet, ENCODE fallback) and the 5-line Step 2 blockquote duplicating backend detail. Restructure (top section now is just): four-step intro, "Before you start" mini-section with three bullets (Miniforge link, ~28 GB, platforms), Steps 1-4 each terse and copy-pasteable. The detail moves to existing deeper sections — the Step 2 blockquote was already duplicated in the deeper "Two AlphaGenome backends" subsection (same content, more detail), so just drops; the Prerequisite paragraph's disk-usage detail moves into a new `#### Disk usage breakdown` subsection at the top of `Installation — detailed`. Anchor links (`#disk-usage-breakdown`, `#two-alphagenome-backends`, `#where-the-oracle-weights-come-from`) all resolve. ## HANDOFF.md Adds two short notes so future maintainers don't trip over the new behaviour: a callout between Phase 1 and Phase 2 reminding to run `--part merge` (or pass `--force`) before re-running with a different `--assay`; and a one-line note in the parallel-launch section explaining the `CUDA_VISIBLE_DEVICES` precedence. ## Tests `pytest -m "not integration and not slow"` → 368 passed, 1 skipped, 5 deselected — same count as main, no regression. Bug A's SystemExit-vs-overwrite branches are exercised by the helper's pure function; a smoke test that builds two tiny interims and asserts the diff message is left as a follow-up (would need conda-env infra in CI to actually run the script — out of scope for this PR). Closes #71, #72, #73, #74. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User feedback as part of the streamlining pass. The callout was an escape hatch into a one-oracle-only install for impatient readers, but it sits between Step 2 (the canonical full setup) and Step 3 (the runnable snippet that uses Enformer anyway), and a first-time reader following the linear flow doesn't need it interrupting the narrative. Anyone who specifically wants the lightweight starter will find `chorus setup --oracle <name>` documented in `Installation — detailed → Setting up oracle environments one-by-one` (already present, unchanged). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
User feedback on the streamlined top:
Step 2 — title was a heavy spec ("Download all 6 oracles + hg38 +
backgrounds"); replaced with "Get every oracle, weight, and
reference — batteries included". Body trimmed of the
multi-GB-tarball / CDF detail that new users don't care about.
Step 3 — title was utilitarian ("Predict — wild-type + SNP effect
in one block"); replaced with "Your first prediction — score a SNP
at the β-globin locus". Added two intro sentences above the code
block explaining what the snippet does and why this prediction
shape (one wild-type signal + N counter-factual variants) is the
canonical chorus pattern.
Step 4 — biggest reorder. Old version led with "ships an MCP
server with 22 tools, here's the full list" before any natural-
language example. New version flips that: lead with one bash
command + three concrete prompts the user can paste into Claude
Code, then mention the 22-tool catalogue at the end as the deeper
read. Title rewritten to convey "complex analyses without coding"
("Skip the code — drive chorus from Claude in plain English").
What-to-read-next reordered so the first two bullets are the
discovery/exploration paths (Notebooks, Worked application
examples — both prompt-driven) instead of the API recipes. API
slipped to fourth.
No code or anchor changes; all internal links still resolve.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tebooks + MCP intros
Second-pass review. The audit was: where does the reader's interest
flag and why? Five spots fixed:
What chorus is — was three redundant bullet lists. The cover tagline
already names the six oracles, and the "Pick an oracle" table right
below has the per-oracle stats. The "Key features" sub-bullet list
duplicated the lunch-break tour AND the later Key features section.
Replaced with two short paragraphs that say what the cover doesn't:
percentile-grounded outputs (`+0.45 log2FC → 0.962 effect %ile`), per-
oracle conda isolation, chorus-controlled HF mirrors, 22-tool MCP.
Worked application examples — title was a heading, now leads. New
title: "Worked application examples — seven things you can do today."
Intro now opens with the magic ("every example was generated end-to-
end by Claude Code talking to chorus's MCP server, no code written by
hand") instead of burying it.
Notebooks — was "Three notebooks are provided, from introductory to
advanced". Replaced with "Three sittings, zero to confident" + a
sentence describing the user's actual progression. Per-notebook
descriptions rewritten in second-person ("what you'll build") with
plain-English summaries — the "I get it now" notebook, the "graduate-
level" notebook.
Pick an oracle — moved the wall-of-prose paragraph about CUDA / Apple
Metal / tensorflow-metal / PyTorch MPS / JAX-Metal-falls-back-to-CPU
out of "Pick an oracle". Replaced with a one-line "GPU detection is
automatic" pointer to a new Platform & GPU support table inside
Installation — detailed, where someone actually deciding what to
install can find it.
MCP server — was "Chorus includes an MCP server that lets AI
assistants like Claude directly load oracles, predict variant effects,
and analyze gene expression — all through natural language
conversation." Lukewarm. Replaced with a one-liner that ties back to
Step 4 of the lunch-break tour (which the reader just finished, and
which is what hyped them in the first place) and previews what's in
the rest of the section.
Net diff: -7 LOC. Doc is shorter, less redundant, and punches harder
where it matters.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced Apr 30, 2026
Closed
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
fix(cdf-rebuild): protect interim NPZ + honour
CUDA_VISIBLE_DEVICES; streamline README topThree changes from the 0.4.0 follow-up triage. Single PR because the HANDOFF.md edits depend on both script fixes.
Closes
build_backgrounds_chrombpnet.py:--part variants/baselines --assay Xoverwrites the existing interim NPZ instead of appending (silent ~10 h GPU loss caught only by post-merge spot-check).build_backgrounds_chrombpnet.py:--gpu Nsilently overrides outerCUDA_VISIBLE_DEVICES, breaking the parallel-launch pattern.#73/#74are exact duplicates of#71/#72— opened ~5 min apart by concurrent agent triage. Closing both pairs.Bug A (P1) — refuse-to-overwrite the interim NPZ
User's pick was the conservative fix: refuse to overwrite without
--force, naming the conflicting track-id sets.--forceargparse flag._check_interim_compatibility(interim_path, new_track_ids, force, label)helper, called before eachnp.savez_compressed(interim_path, ...). Returns silently when the file doesn't exist (first run) or when the existing track set equals the new set (idempotent re-run); otherwise raisesSystemExitwith a diff (len(only_existing),len(only_new), three example track-ids per side) and a pointer at--part merge/--part merge-incremental/--force.raise SystemExit("...")validation-error convention in the file (lines 245, 247).Append-on-write (#71's option (a)) was deferred — better ergonomics but a v0.5+ improvement.
Bug B (P2) — honour pre-set
CUDA_VISIBLE_DEVICESUser's pick was the trivial pattern. 3 LOC:
Plus a one-line
--helpclarification on--gpu: "No-op ifCUDA_VISIBLE_DEVICESis already set in the calling shell."When the env var is set, the outer wins — matches every cluster scheduler's precedence convention.
README streamlining
Reader feedback was that the first two sections of "🚀 Get running in one lunch break" were too dense for a first-timer:
--all-chrombpnet, ENCODE fallback)#### Two AlphaGenome backendssectionRestructure:
chorus setup+ the existing token / 2-min options#### Disk usage breakdownsubsection at the top ofInstallation — detailedwith a per-bucket tableAnchor links (
#disk-usage-breakdown,#two-alphagenome-backends,#where-the-oracle-weights-come-from) all resolve. The detail isn't lost — just moved to the deeper section that's already documented as the home for "edge cases: upgrading, per-oracle setup, token plumbing, manual genome management, and backgrounds."The first screen now reads as: title → tagline → four-step intro → three-bullet "Before you start" → install commands. No prose-walls, no premature backend complexity.
HANDOFF.md (
audits/2026-04-29_chrombpnet_cdf_rebuild/)Two short notes so future maintainers don't re-trip over the same bugs:
--part merge(ormerge-incremental, or pass--force) before re-running with a different--assay.CUDA_VISIBLE_DEVICES/--gpuprecedence.Test plan
--helpparses cleanly on the modified script. Both new flags visible (--force, updated--gpuhelp-text).pytest tests/ -q -m "not integration and not slow"→ 368 passed, 1 skipped, 5 deselected (same count asmain— no regression).#disk-usage-breakdown→ new line 199;#two-alphagenome-backends→ existing line 173;#where-the-oracle-weights-come-from→ existing line 310.Other ChromBPNet cell types download lazilyand5–8× faster than JAX CPUdistinctive strings appear exactly once each (in their new homes).--forcesmoke: would need conda-env infra in CI to actually invoke the script. Helper is a pure function over the existing track-id reservoir code — verified by inspection. Follow-up tracked at the file-level next time someone runs a partial CDF rebuild on an A100 box.🤖 Generated with Claude Code