v27 scorched-earth: track-ID validator P0 + MCP --help + walkthrough #48
Merged
lucapinello merged 1 commit into main on Apr 25, 2026
…anchor
Greenfield install verified end-to-end after deleting all 7 envs +
24.8 GB of caches. README TLDR Steps 1-4 work; quickstart notebook
executes; MCP E2E test (4 min) passes; 6/6 oracles ✓ Healthy.
P0 — track-ID validator rejected FANTOM CAGE identifiers:
`_validate_assay_ids` in Enformer + Borzoi only treated `ENCFF*`
strings as identifier candidates; everything else fell through to
description-substring lookup. FANTOM CAGE IDs like `CNhs11250` are
valid identifiers (resolved by `get_track_by_identifier`) but don't
start with `ENCFF`, so they were classified as descriptions.
`get_tracks_by_description("CNhs11250")` returns empty → guard raised
`InvalidAssayError`. The shipped quickstart notebook uses
`['ENCFF413AHU', 'CNhs11250']` and broke for every new user on
cell In[8].
Fix: try `get_track_by_identifier` first unconditionally; fall back
to description lookup only if identifier lookup returns None. Same
fix in borzoi.py (identical bug, identical code path). Regression
test added in tests/test_prediction_methods.py pinning the FANTOM
CAGE behaviour explicitly.
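The identifier-first lookup order described above can be sketched as follows. This is a minimal illustration, not the code in the Enformer or Borzoi oracles: `StubTrackMetadata`, its constructor, the placeholder descriptions, and the exact method signatures are assumptions made so the example runs on its own. Only the names `get_track_by_identifier`, `get_tracks_by_description`, and `InvalidAssayError` come from the commit message.

```python
class InvalidAssayError(ValueError):
    """Stand-in for the error the guard raises (the real class lives in chorus)."""


class StubTrackMetadata:
    """Hypothetical in-memory metadata table, for illustration only."""

    def __init__(self, tracks):
        # tracks maps identifier -> free-text description
        self._tracks = dict(tracks)

    def get_track_by_identifier(self, identifier):
        # Return a record for a known identifier, or None when it is unknown.
        if identifier in self._tracks:
            return {"identifier": identifier, "description": self._tracks[identifier]}
        return None

    def get_tracks_by_description(self, text):
        # Case-insensitive substring match on the description field.
        needle = text.lower()
        return [
            {"identifier": ident, "description": desc}
            for ident, desc in self._tracks.items()
            if needle in desc.lower()
        ]


def validate_assay_ids(metadata, assay_ids):
    """Try identifier lookup first; use description lookup only as a fallback."""
    unknown = []
    for assay_id in assay_ids:
        if metadata.get_track_by_identifier(assay_id) is not None:
            continue  # valid identifier, whatever its prefix
        if metadata.get_tracks_by_description(assay_id):
            continue  # not an identifier, but it matches a description
        unknown.append(assay_id)
    if unknown:
        raise InvalidAssayError(f"Unknown assay IDs: {unknown}")


# Placeholder descriptions; the real track descriptions are not shown in this PR.
metadata = StubTrackMetadata({
    "ENCFF413AHU": "placeholder DNase track",
    "CNhs11250": "placeholder CAGE track",
})

# The quickstart's pair is accepted because neither ID depends on a prefix check.
validate_assay_ids(metadata, ["ENCFF413AHU", "CNhs11250"])
```

With this ordering, the `ENCFF` prefix plays no role in classification, so any future identifier family is handled the same way as long as `get_track_by_identifier` resolves it.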
P1 — `chorus-mcp --help` listed 20 tools, FastMCP registers 22:
Hand-maintained list missed `discover_variant` and
`fine_map_causal_variant`. Reorganised into 4 logical groups
(Discovery / Lifecycle / Predict / Analyze) with explicit "(22)" tag.
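One way to catch this kind of drift automatically is to compare the hand-maintained help groups against the registered tool names. The sketch below is an assumption about how such a check could look, not something this PR adds: `find_help_drift`, the `registered` list, the `help_groups` dict, and `example_tool` are hypothetical, and how the registered names are obtained from the server is left out.

```python
def find_help_drift(registered_tools, help_groups):
    """Return (missing, stale) tool names.

    missing: registered on the server but absent from the help text.
    stale:   listed in the help text but no longer registered.
    """
    documented = {name for names in help_groups.values() for name in names}
    registered = set(registered_tools)
    missing = sorted(registered - documented)
    stale = sorted(documented - registered)
    return missing, stale


# Hypothetical data: "example_tool" is a placeholder; the other two names are
# the tools the commit message says were missing from --help.
registered = ["example_tool", "discover_variant", "fine_map_causal_variant"]
help_groups = {"Discovery": ["example_tool"]}

missing, stale = find_help_drift(registered, help_groups)
print(missing)  # ['discover_variant', 'fine_map_causal_variant']
print(stale)    # []
```

A test that asserts both lists are empty would fail as soon as a tool is registered without a matching help entry, which complements the explicit "(22)" tag.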
P1 — dead anchor in docs/MCP_WALKTHROUGH.md:
`../README.md#mcp-server-ai-assistant-integration` → nothing.
README slug is `#mcp-server`. Updated.
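Dead anchors like this can be found by deriving the slug for each README heading and checking link targets against that set. The sketch below approximates GitHub's heading-to-anchor rule (lowercase, drop punctuation, spaces to hyphens); it ignores duplicate-heading suffixes and headings inside code fences. The README text in the example is invented for illustration, as is the helper naming.

```python
import re


def github_slug(heading):
    """Approximate the anchor GitHub generates for a Markdown heading."""
    slug = heading.strip().lower()
    slug = re.sub(r"[^\w\- ]", "", slug)  # drop punctuation such as parentheses
    return slug.replace(" ", "-")


def anchors_in(markdown_text):
    """Collect approximate anchors for every ATX heading in the text."""
    anchors = set()
    for line in markdown_text.splitlines():
        match = re.match(r"^#{1,6}\s+(.*)$", line)
        if match:
            anchors.add(github_slug(match.group(1)))
    return anchors


# Invented README content: the real heading text is assumed, not quoted.
readme = "# Chorus\n\n## MCP Server\n"
available = anchors_in(readme)

print("mcp-server" in available)                           # True
print("mcp-server-ai-assistant-integration" in available)  # False
```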
Audit report: audits/2026-04-24_v27_scorched_earth.md.
Tests: 340 passed, 1 skipped on fast suite. Quickstart notebook
executes clean post-fix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
lucapinello added a commit that referenced this pull request on Apr 26, 2026
After audit/2026-04-26-bpnet-cdfs-complete merged into main, replayed
the full README quickstart from a fresh-install state (deleted 7 envs +
~53 GB of caches/downloads/genomes). Verified:
- Default `chorus setup --oracle chrombpnet` is back to the v27 fast
path (K562 + HepG2 DNase only, ~16 min, 3.5 GB) — no longer the
silent 30 GB / 3-hour download.
- `--all-chrombpnet` opt-in flag is properly advertised in
`chorus setup --help`.
- 786-track NPZ auto-downloads from HF (commit c1e5fc1) on
`chorus setup --oracle chrombpnet`.
- All 6 oracles ✓ Healthy in 67 min total.
- README Step 3 snippet: WT mean 0.468, 3 alts.
- All 3 shipped notebooks execute clean (single_oracle_quickstart,
advanced_multi_oracle_analysis, comprehensive_oracle_showcase).
- 4/4 integration tests pass (MCP E2E, SEI+LegNet CDF download,
ChromBPNet fresh single-model download).
- Fast pytest suite: 340 passed, 1 skipped.
HTML walkthrough render audit (playwright on 18 shipped HTMLs):
- 18/18 loaded with 0 JS errors / 0 page errors
- 0/18 missing the glossary block
- 17/18 with valid IGV browser block (the one without is
batch_sort1_locus_scoring — by design, batch reports show a
multi-variant table, not per-variant tracks)
- All formula badges (log2FC / lnFC / Δ) and percentile columns
present where applicable.
Default disk footprint after install: ~25 GB (matches new README
claim). The 60 GB figure only applies to the `--all-chrombpnet` opt-in.
This is the second consecutive scorched-earth audit (v28 was the
first) to return clean. Six bug-fix PRs (#48, #49, #50, #51, #52,
#53) plus the audit-fix branch all hold up end-to-end.
No findings. Code is shipping-ready.
Audit + screenshots: audits/2026-04-26_v29_scorched_earth/
Co-authored-by: lp698 <lp698@dimm2fv07n65x.partners.org>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Fresh-from-zero install audit (v27): deleted all 7 envs + 24.8 GB of caches, then reinstalled following the README TLDR exactly as a new user would. Found one P0 plus two P1 docs items, all fixed in this PR.
P0: track-ID validator rejected FANTOM CAGE identifiers
The v26 guard (#44) only treated `ENCFF*` strings as identifier candidates; everything else fell through to description lookup. `CNhs11250` (FANTOM CAGE) is a valid identifier but doesn't start with `ENCFF`, so it was classified as a description, didn't match anything, and got rejected.
The shipped `single_oracle_quickstart.ipynb` uses `['ENCFF413AHU', 'CNhs11250']` and broke for every new user on cell In[8].
Fix: try `get_track_by_identifier` first unconditionally; only fall back to description lookup if identifier lookup returns None. Same fix in `chorus/oracles/borzoi.py` (identical bug, identical code path). Regression test pinned in `tests/test_prediction_methods.py`.
P1: `chorus-mcp --help` listed 20 tools, server registers 22
Hand-maintained list missed `discover_variant` and `fine_map_causal_variant`. Reorganised into 4 logical groups with explicit `(22)` tag so future drift is easy to spot.
P1: dead anchor in `docs/MCP_WALKTHROUGH.md`
`../README.md#mcp-server-ai-assistant-integration` → nothing. README slug is `#mcp-server`. Updated.
Why this matters
Without the scorched-earth run, the v26-introduced P0 in `_validate_assay_ids` would have shipped to every new user. Previous "audits" (v22-v26) all kept some warm state (downloads cache, env, genome), so this code path was never exercised on a real first call.
Verification
- `mamba env create -f environment.yml` (Step 1)
- `chorus setup --oracle all` (Step 2)
- `chorus list` / `chorus health`
- `single_oracle_quickstart.ipynb` (post-fix)
- MCP E2E test (`test_mcp_e2e_list_oracles_and_analyze_variant -m integration`)
- `discover_variant_effects` (SORT1 rs12740374)
Deferred to a follow-up PR
Doc-drift items captured in the audit report but not fixed here:
- `chorus genome download hg38` steps + outdated track-count tables
- `audits/AUDIT_CHECKLIST.md` ambiguous track-count rows
🤖 Generated with Claude Code