fix/feat: CHIP normalization, setup timeout, cleanup command, genome race fix #78

Open
lucapinello wants to merge 4 commits into main from fix/post-v040-followups

Conversation

@lucapinello
Contributor

Summary

Four items from post-v0.4.0 user feedback, found/validated by a full scorched-earth teardown+reinstall cycle (cleanup → rebuild all 7 oracles → 375 tests pass).

  • CHIP strand mismatch — ChromBPNet CHIP predictions emit CHIP:K562:REST:+/:- but the background CDF stores CHIP:K562:REST. All CHIP normalization lookups silently fell back to raw unscaled values in IGV reports and percentile output. Fixed in PerTrackNormalizer with a strand-strip fallback in _lookup, _lookup_batch, and perbin_floor_rescale_batch. 8 new unit tests.
  • alphagenome_pt CDF alias — both backends produce identical predictions so alphagenome_pt now transparently reuses the alphagenome CDF. No separate NPZ upload needed; bypassed automatically if a dedicated file appears later.
  • chorus setup --setup-timeout SECONDS — caps both the mamba env create subprocess and the weight download phase. Addresses slow/unstable connections on remote lab servers. Default: unlimited (no behaviour change).
  • chorus cleanup command — removes conda envs, downloaded weights, background CDFs, and genomes in one command. Supports --oracle {name|all}, --backgrounds, --genomes, --all, --dry-run.
  • Genome concurrent decompression race: chorus setup --oracle all (or two parallel setups) could hit a race where one process decompressed and deleted hg38.fa.gz while another was about to open it → FileNotFoundError. Fixed with existence checks and missing_ok=True.
  • Stale ChromBPNet health probe: _probe_chrombpnet checked for CHORUS_DOWNLOADS_DIR/chrombpnet/DNASE_K562 (the old ENCODE tarball path). Since v0.3 the default path uses the HF slim mirror into ~/.cache/huggingface/, so chorus health always reported chrombpnet as "Not installed" after a clean setup. Switched to _probe_library_cached.
  • README — Upgrading section updated to use chorus cleanup --all; new Uninstalling subsection; --setup-timeout usage note.

Test plan

  • pytest tests/test_normalization_chip_strand.py -v — 7 passed
  • pytest tests/ -m "not integration" -q — 375 passed, 1 skipped (from clean install)
  • chorus cleanup --all --dry-run — correct output, no deletions
  • chorus cleanup --all — removed all 7 envs, weights, backgrounds, genome
  • chorus setup --oracle all — rebuilt 7/7 oracles, all Healthy
  • chorus health — 7/7 ✓ Healthy including chrombpnet
  • Timeout kill path verified: threading.Timer fires at exactly N seconds, raises descriptive RuntimeError

🤖 Generated with Claude Code

lp698 and others added 4 commits May 6, 2026 11:21
…F alias

ChromBPNet CHIP predictions emit track IDs like `CHIP:K562:REST:+`/`:-`
but the background CDF stores `CHIP:K562:REST` (no strand). All CHIP
normalization lookups silently returned None, falling back to raw
unscaled values in IGV reports and percentile output.

Fix: strip `:+`/`:-` suffix as a fallback in `_lookup`, `_lookup_batch`,
and `perbin_floor_rescale_batch` in `PerTrackNormalizer`. Both strand
predictions correctly share the single background distribution row.
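
The fallback described above can be sketched roughly as follows. This is an illustration only, not the code in `PerTrackNormalizer`: the helper names `strip_strand` and `lookup_with_strand_fallback` and the dict-based index are assumptions made for the example.

```python
from typing import Mapping, Optional, TypeVar

T = TypeVar("T")

STRAND_SUFFIXES = (":+", ":-")


def strip_strand(track_id: str) -> str:
    """Return track_id without a trailing ':+' or ':-' strand suffix."""
    for suffix in STRAND_SUFFIXES:
        if track_id.endswith(suffix):
            return track_id[: -len(suffix)]
    return track_id


def lookup_with_strand_fallback(index: Mapping[str, T], track_id: str) -> Optional[T]:
    """Try the exact track ID first, then fall back to the strandless form."""
    if track_id in index:
        return index[track_id]
    # Returns None when neither the exact nor the strandless key is present.
    return index.get(strip_strand(track_id))


# Hypothetical background index: track ID -> row number in the CDF matrix.
cdf_rows = {"CHIP:K562:REST": 0}
plus_row = lookup_with_strand_fallback(cdf_rows, "CHIP:K562:REST:+")
minus_row = lookup_with_strand_fallback(cdf_rows, "CHIP:K562:REST:-")
```

Trying the exact key first means that a background file which does carry per-strand rows would still take precedence over the strandless fallback.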

Also alias `alphagenome_pt` → `alphagenome` in `_ensure_loaded` and
`get_pertrack_normalizer`: since both backends produce identical
predictions (same model + weights), no separate CDF file is needed.
The alias is bypassed automatically if a dedicated
`alphagenome_pt_pertrack.npz` appears in the cache later.
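
A minimal sketch of how such an alias with an automatic bypass might be resolved. The `CDF_ALIASES` table, the `resolve_cdf_path` helper, and the `alphagenome_pertrack.npz` file name (inferred from the `alphagenome_pt_pertrack.npz` pattern above) are assumptions for illustration, not the actual `_ensure_loaded` logic.

```python
from pathlib import Path

# Hypothetical alias table: oracle name -> oracle whose CDF file it reuses.
CDF_ALIASES = {"alphagenome_pt": "alphagenome"}


def resolve_cdf_path(oracle: str, cache_dir: Path) -> Path:
    """Pick the background CDF file for an oracle.

    A dedicated file always wins; the alias is consulted only when the
    dedicated file is absent.
    """
    dedicated = cache_dir / f"{oracle}_pertrack.npz"
    if dedicated.exists():
        return dedicated
    target = CDF_ALIASES.get(oracle, oracle)
    return cache_dir / f"{target}_pertrack.npz"
```

Checking for the dedicated file before consulting the alias is what makes the bypass automatic: no configuration change is needed if that file is published later.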

Adds 8 unit tests covering strand-suffixed lookups and the strandless
key fallthrough.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

`chorus setup --oracle X` previously had no timeout for either the
`mamba env create` subprocess or the weight download phase. On slow or
unstable connections (e.g. remote lab servers) a stalled download would
hang indefinitely.

Changes:
- `chorus/cli/main.py`: add `--setup-timeout SECONDS` flag (default:
  unlimited). Passes through to both `create_environment()` and
  `prefetch_for_oracle()`.
- `chorus/core/environment/manager.py`: `create_environment()` gains a
  `timeout` parameter. A `threading.Timer` kills the `mamba env create`
  subprocess after N seconds and raises a descriptive `RuntimeError`.
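
For readers unfamiliar with the pattern, here is a rough sketch of a `threading.Timer` that kills a subprocess after a deadline. The function name `run_with_timeout` and the error wording are hypothetical; the real `create_environment()` may be structured differently.

```python
import subprocess
import sys
import threading
from typing import Optional, Sequence


def run_with_timeout(cmd: Sequence[str], timeout: Optional[float] = None) -> int:
    """Run cmd and return its exit code; kill it if it runs longer than timeout seconds.

    timeout=None means unlimited, mirroring the flag's default.
    """
    proc = subprocess.Popen(cmd)
    timed_out = threading.Event()

    def _kill() -> None:
        # Only kill (and flag a timeout) if the process is still running.
        if proc.poll() is None:
            timed_out.set()
            proc.kill()

    timer = threading.Timer(timeout, _kill) if timeout is not None else None
    if timer is not None:
        timer.start()
    try:
        returncode = proc.wait()
    finally:
        if timer is not None:
            timer.cancel()  # no-op if the timer has already fired

    if timed_out.is_set():
        raise RuntimeError(
            f"{cmd[0]!r} exceeded the setup timeout of {timeout} s and was killed"
        )
    return returncode


# A fast command finishes well before the deadline.
exit_code = run_with_timeout([sys.executable, "-c", "pass"], timeout=30)
```

`subprocess.run(..., timeout=N)` would be a shorter alternative for the simple case; a timer is mainly useful when the caller also needs to stream or post-process the subprocess output while waiting.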

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Uninstalling chorus previously required manually removing 7 conda envs,
downloaded weights, background CDFs, and reference genomes. `chorus cleanup`
handles this in one command.

Usage:
  chorus cleanup --oracle {name|all}   # env + weights
  chorus cleanup --backgrounds         # ~/.chorus/backgrounds/*.npz
  chorus cleanup --genomes             # downloaded reference genomes
  chorus cleanup --all                 # everything above
  chorus cleanup --all --dry-run       # preview without deleting

- Missing paths silently skipped (idempotent)
- Dry-run prints [DRY RUN] prefix on every action
- Summary line at end: "Removed N environment(s), M weight dir(s), K file(s)"
- README: Upgrading section updated to use `chorus cleanup --all`;
  new Uninstalling subsection added; `--setup-timeout` usage note added
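
The idempotent and dry-run behaviour listed above could look roughly like this. It is a sketch under assumptions: `remove_paths` is a hypothetical helper, and the message text other than the `[DRY RUN]` prefix is invented for the example.

```python
import shutil
from pathlib import Path
from typing import Iterable


def remove_paths(paths: Iterable[Path], dry_run: bool = False) -> int:
    """Remove files and directories, skipping missing ones.

    Returns the number of paths removed (or that would be removed in a dry run).
    """
    count = 0
    for path in paths:
        if not path.exists():
            continue  # missing paths are skipped silently, so reruns are safe
        if dry_run:
            print(f"[DRY RUN] would remove {path}")
        else:
            if path.is_dir() and not path.is_symlink():
                shutil.rmtree(path)
            else:
                path.unlink(missing_ok=True)
            print(f"Removed {path}")
        count += 1
    return count
```

Because missing paths are skipped rather than treated as errors, running the same cleanup twice in a row is harmless; the second run simply reports zero removals.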

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…robe

Two bugs found during scorched-earth teardown/reinstall test:

1. `chorus/utils/genome.py`: `download_with_resume` releases its lock
   after writing `hg38.fa.gz` but before decompression. A concurrent
   `chorus setup` process could decompress+delete the `.gz` between
   those steps, leaving the first process with a FileNotFoundError.
   Fix: check if `fasta_path` already exists before decompressing; use
   `unlink(missing_ok=True)` to tolerate concurrent deletions.
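
   A sketch of a decompression step that tolerates this kind of race. It is
   not the code in `chorus/utils/genome.py`: the function name
   `decompress_if_needed` is hypothetical, and the temp-file plus atomic rename
   is an extra safeguard added here for illustration, not something this
   commit claims to do.

```python
import gzip
import os
import shutil
from pathlib import Path


def decompress_if_needed(gz_path: Path, fasta_path: Path) -> Path:
    """Decompress gz_path to fasta_path unless another process already did."""
    if fasta_path.exists():
        # Someone else finished first; just tidy up the archive if it is still there.
        gz_path.unlink(missing_ok=True)
        return fasta_path

    # Write to a per-process temp file so readers never see a partial FASTA.
    tmp_path = fasta_path.with_name(f"{fasta_path.name}.{os.getpid()}.tmp")
    try:
        with gzip.open(gz_path, "rb") as src, open(tmp_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        os.replace(tmp_path, fasta_path)  # atomic rename on the same filesystem
    except FileNotFoundError:
        # The archive vanished mid-flight: fine if the FASTA is now in place.
        tmp_path.unlink(missing_ok=True)
        if not fasta_path.exists():
            raise

    gz_path.unlink(missing_ok=True)  # tolerate a concurrent deletion
    return fasta_path
```

   `Path.unlink(missing_ok=True)` requires Python 3.8 or newer.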

2. `chorus/core/weights_probe.py`: `_probe_chrombpnet` checked for
   `CHORUS_DOWNLOADS_DIR/chrombpnet/DNASE_K562` — the old ENCODE tarball
   path. Since v0.3 the default path (fold=0, chrombpnet_nobias) downloads
   via the HF slim mirror into `~/.cache/huggingface/`, so the probe
   always reported "Not installed" even after a successful setup.
   Fix: switch to `_probe_library_cached` (trust the setup marker),
   matching enformer and borzoi.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2 participants