fix/feat: CHIP normalization, setup timeout, cleanup command, genome race fix #78
Open
lucapinello wants to merge 4 commits into main
…F alias

ChromBPNet CHIP predictions emit track IDs like `CHIP:K562:REST:+`/`:-` but the background CDF stores `CHIP:K562:REST` (no strand). All CHIP normalization lookups silently returned None, falling back to raw unscaled values in IGV reports and percentile output.

Fix: strip the `:+`/`:-` suffix as a fallback in `_lookup`, `_lookup_batch`, and `perbin_floor_rescale_batch` in `PerTrackNormalizer`. Both strand predictions correctly share the single background distribution row.

Also alias `alphagenome_pt` → `alphagenome` in `_ensure_loaded` and `get_pertrack_normalizer`: since both backends produce identical predictions (same model + weights), no separate CDF file is needed. The alias is bypassed automatically if a dedicated `alphagenome_pt_pertrack.npz` appears in the cache later.

Adds 8 unit tests covering strand-suffixed lookups and the strandless key fallthrough.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
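The fallback described in this commit can be illustrated with a minimal sketch. It is not the `PerTrackNormalizer` code: `strip_strand`, `lookup_row`, `resolve_cdf_name`, and the `CDF_ALIASES` table are hypothetical names, and the background index is assumed to be a plain dict from track ID to row number.

```python
from typing import Dict, Optional, Set

STRAND_SUFFIXES = (":+", ":-")

# Hypothetical alias table: backend name -> backend whose CDF it reuses.
CDF_ALIASES = {"alphagenome_pt": "alphagenome"}


def strip_strand(track_id: str) -> str:
    """Return track_id without a trailing ':+' or ':-' strand suffix."""
    for suffix in STRAND_SUFFIXES:
        if track_id.endswith(suffix):
            return track_id[: -len(suffix)]
    return track_id


def lookup_row(index: Dict[str, int], track_id: str) -> Optional[int]:
    """Try the exact key first, then retry with the strand suffix removed."""
    row = index.get(track_id)
    if row is None:  # 'is None' so that a valid row 0 is not treated as a miss
        row = index.get(strip_strand(track_id))
    return row


def resolve_cdf_name(oracle: str, available: Set[str]) -> str:
    """Prefer a dedicated CDF if one exists; otherwise fall back to the alias."""
    if oracle in available:
        return oracle
    return CDF_ALIASES.get(oracle, oracle)
```

With an index of `{"CHIP:K562:REST": 3}`, both `CHIP:K562:REST:+` and `CHIP:K562:REST:-` resolve to row 3, while an unknown track still returns `None`. Trying the exact key first keeps any strand-specific rows usable if they are ever added.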
`chorus setup --oracle X` previously had no timeout for either the `mamba env create` subprocess or the weight download phase. On slow or unstable connections (e.g. remote lab servers) a stalled download would hang indefinitely.

Changes:
- `chorus/cli/main.py`: add `--setup-timeout SECONDS` flag (default: unlimited). Passes through to both `create_environment()` and `prefetch_for_oracle()`.
- `chorus/core/environment/manager.py`: `create_environment()` gains a `timeout` parameter. A `threading.Timer` kills the `mamba env create` subprocess after N seconds and raises a descriptive `RuntimeError`.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
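A minimal sketch of the timer-based kill described above, using only the standard library. The function names `run_with_timeout` and `sleeper_cmd` are hypothetical, and the child process is a stand-in for `mamba env create`; the actual `create_environment()` signature may differ.

```python
import subprocess
import sys
import threading
from typing import List, Optional


def sleeper_cmd(seconds: float) -> List[str]:
    """Stand-in for a long-running `mamba env create` call (illustration only)."""
    return [sys.executable, "-c", f"import time; time.sleep({seconds})"]


def run_with_timeout(cmd: List[str], timeout: Optional[float] = None) -> int:
    """Run cmd; kill it and raise RuntimeError if it runs longer than timeout seconds."""
    proc = subprocess.Popen(cmd)
    timed_out = threading.Event()

    def _kill() -> None:
        # Only flag a timeout if the process is still running when the timer fires.
        if proc.poll() is None:
            timed_out.set()
            proc.kill()

    timer = threading.Timer(timeout, _kill) if timeout is not None else None
    if timer is not None:
        timer.start()
    try:
        returncode = proc.wait()
    finally:
        if timer is not None:
            timer.cancel()  # no-op if the timer has already fired
    if timed_out.is_set():
        raise RuntimeError(
            f"Command {cmd[0]!r} exceeded the {timeout}s setup timeout and was killed"
        )
    return returncode
```

Calling `run_with_timeout(sleeper_cmd(30), timeout=0.5)` raises `RuntimeError` after roughly half a second, while `timeout=None` preserves the old unlimited behaviour.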
Uninstalling chorus previously required manually removing 7 conda envs,
downloaded weights, background CDFs, and reference genomes. `chorus cleanup`
handles this in one command.
Usage:
chorus cleanup --oracle {name|all} # env + weights
chorus cleanup --backgrounds # ~/.chorus/backgrounds/*.npz
chorus cleanup --genomes # downloaded reference genomes
chorus cleanup --all # everything above
chorus cleanup --all --dry-run # preview without deleting
- Missing paths silently skipped (idempotent)
- Dry-run prints [DRY RUN] prefix on every action
- Summary line at end: "Removed N environment(s), M weight dir(s), K file(s)"
- README: Upgrading section updated to use `chorus cleanup --all`;
new Uninstalling subsection added; `--setup-timeout` usage note added
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
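The idempotent and dry-run behaviour listed above might look roughly like the following sketch. `remove_paths` is a hypothetical helper, not the function in `chorus/cli/main.py`, and conda environment removal is left out.

```python
import shutil
from pathlib import Path
from typing import Iterable


def remove_paths(paths: Iterable[Path], dry_run: bool = False) -> int:
    """Delete files and directories, skipping missing ones.

    Returns how many paths were removed (or would be removed in a dry run).
    """
    prefix = "[DRY RUN] " if dry_run else ""
    removed = 0
    for path in paths:
        if not path.exists():
            continue  # missing paths are skipped silently, so reruns are safe
        print(f"{prefix}Removing {path}")
        if not dry_run:
            if path.is_dir() and not path.is_symlink():
                shutil.rmtree(path)
            else:
                path.unlink()
        removed += 1
    return removed
```

Because the count is computed the same way in both modes, a dry run reports the same totals that the real run would print in its summary line.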
…robe

Two bugs found during scorched-earth teardown/reinstall test:

1. `chorus/utils/genome.py`: `download_with_resume` releases its lock after writing `hg38.fa.gz` but before decompression. A concurrent `chorus setup` process could decompress+delete the `.gz` between those steps, leaving the first process with a FileNotFoundError. Fix: check if `fasta_path` already exists before decompressing; use `unlink(missing_ok=True)` to tolerate concurrent deletions.
2. `chorus/core/weights_probe.py`: `_probe_chrombpnet` checked for `CHORUS_DOWNLOADS_DIR/chrombpnet/DNASE_K562`, the old ENCODE tarball path. Since v0.3 the default path (fold=0, chrombpnet_nobias) downloads via the HF slim mirror into `~/.cache/huggingface/`, so the probe always reported "Not installed" even after a successful setup. Fix: switch to `_probe_library_cached` (trust the setup marker), matching enformer and borzoi.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
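The genome race fix in item 1 can be sketched as follows. `decompress_if_needed` is a hypothetical name rather than the function in `chorus/utils/genome.py`, and writing to a per-process temporary file before renaming is an extra assumption that goes beyond the fix as described. `Path.unlink(missing_ok=True)` requires Python 3.8 or later.

```python
import gzip
import os
import shutil
from pathlib import Path


def decompress_if_needed(gz_path: Path, fasta_path: Path) -> Path:
    """Decompress gz_path to fasta_path unless another process already did."""
    if fasta_path.exists():
        # A concurrent setup finished first; just tidy up any leftover archive.
        gz_path.unlink(missing_ok=True)
        return fasta_path

    # Assumption: write to a per-process temp file so a half-written FASTA
    # is never visible under the final name.
    tmp_path = fasta_path.with_name(f"{fasta_path.name}.{os.getpid()}.part")
    try:
        with gzip.open(gz_path, "rb") as src, open(tmp_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
    except FileNotFoundError:
        # The archive vanished under us; fine if the other process produced the FASTA.
        tmp_path.unlink(missing_ok=True)
        if fasta_path.exists():
            return fasta_path
        raise
    tmp_path.replace(fasta_path)  # atomic rename on the same filesystem
    gz_path.unlink(missing_ok=True)  # tolerate a concurrent deletion
    return fasta_path
```

Calling the function twice in a row is harmless: the second call sees the existing FASTA and returns it without touching the (already deleted) archive.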
dmitrypenzar1996
approved these changes
May 6, 2026
Summary
Four items from post-v0.4.0 user feedback, found/validated by a full scorched-earth teardown+reinstall cycle (cleanup → rebuild all 7 oracles → 375 tests pass).
- ChromBPNet CHIP predictions emit track IDs like `CHIP:K562:REST:+`/`:-` but the background CDF stores `CHIP:K562:REST`. All CHIP normalization lookups silently fell back to raw unscaled values in IGV reports and percentile output. Fixed in `PerTrackNormalizer` with a strand-strip fallback in `_lookup`, `_lookup_batch`, and `perbin_floor_rescale_batch`. 8 new unit tests.
- `alphagenome_pt` CDF alias: both backends produce identical predictions so `alphagenome_pt` now transparently reuses the `alphagenome` CDF. No separate NPZ upload needed; bypassed automatically if a dedicated file appears later.
- `chorus setup --setup-timeout SECONDS`: caps both the `mamba env create` subprocess and the weight download phase. Addresses slow/unstable connections on remote lab servers. Default: unlimited (no behaviour change).
- `chorus cleanup` command: removes conda envs, downloaded weights, background CDFs, and genomes in one command. Supports `--oracle {name|all}`, `--backgrounds`, `--genomes`, `--all`, `--dry-run`.
- `chorus setup --oracle all` (or two parallel setups) could hit a race where one process decompressed and deleted `hg38.fa.gz` while another was about to open it → `FileNotFoundError`. Fixed with existence checks and `missing_ok=True`.
- `_probe_chrombpnet` checked for `CHORUS_DOWNLOADS_DIR/chrombpnet/DNASE_K562` (old ENCODE tarball path). Since v0.3 the default path uses the HF slim mirror into `~/.cache/huggingface/`, so `chorus health` always reported chrombpnet as "Not installed" after a clean setup. Switched to `_probe_library_cached`.
- README: Upgrading section updated to use `chorus cleanup --all`; new Uninstalling subsection; `--setup-timeout` usage note.

Test plan
- `pytest tests/test_normalization_chip_strand.py -v`: 7 passed
- `pytest tests/ -m "not integration" -q`: 375 passed, 1 skipped (from clean install)
- `chorus cleanup --all --dry-run`: correct output, no deletions
- `chorus cleanup --all`: removed all 7 envs, weights, backgrounds, genome
- `chorus setup --oracle all`: rebuilt 7/7 oracles, all Healthy
- `chorus health`: 7/7 ✓ Healthy including chrombpnet
- `threading.Timer` fires at exactly N seconds, raises descriptive `RuntimeError`

🤖 Generated with Claude Code