v27 scorched-earth: track-ID validator P0 + MCP --help + walkthrough by lucapinello · Pull Request #48 · pinellolab/chorus

lucapinello · 2026-04-25T01:12:24Z

Summary

Fresh-from-zero install audit (v27): deleted all 7 envs + 24.8 GB caches, reinstalled following README TLDR exactly as a new user would. Found one P0 plus two P1 docs items, all fixed in this PR.

P0 — track-ID validator rejected FANTOM CAGE identifiers

The v26 guard (#44) only treated ENCFF* strings as identifier candidates; everything else fell through to description lookup. CNhs11250 (FANTOM CAGE) is a valid identifier but doesn't start with ENCFF, so it was classified as a description, didn't match anything, and got rejected.

The shipped single_oracle_quickstart.ipynb uses ['ENCFF413AHU', 'CNhs11250'] and broke for every new user on cell In[8]:

InvalidAssayError: Enformer does not recognise these track IDs: ['CNhs11250']

Fix: try get_track_by_identifier first unconditionally; only fall back to description lookup if identifier lookup returns None. Same fix in chorus/oracles/borzoi.py (identical bug, identical code path). Regression test pinned in tests/test_prediction_methods.py.
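The identifier-first order described above can be sketched as follows. This is a minimal, self-contained illustration, not the code in `chorus/oracles`. `ToyMetadata`, `Track`, the example descriptions, and the standalone `validate_assay_ids` signature are hypothetical stand-ins. Only the two lookup names, the error class name, and the two track IDs come from this PR.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


class InvalidAssayError(ValueError):
    """Raised when a requested track cannot be resolved (name taken from the PR)."""


@dataclass(frozen=True)
class Track:
    identifier: str
    description: str


class ToyMetadata:
    """Hypothetical stand-in for an oracle's track metadata table."""

    def __init__(self, tracks: Iterable[Track]):
        self._tracks = list(tracks)

    def get_track_by_identifier(self, identifier: str) -> Optional[Track]:
        # Exact match on the identifier; None when nothing matches.
        for track in self._tracks:
            if track.identifier == identifier:
                return track
        return None

    def get_tracks_by_description(self, text: str) -> List[Track]:
        # Case-insensitive substring match on the description.
        needle = text.lower()
        return [t for t in self._tracks if needle in t.description.lower()]


def validate_assay_ids(metadata, assay_ids, oracle_name="Enformer"):
    """Identifier lookup first, for every string; description lookup only as fallback."""
    unknown = []
    for assay_id in assay_ids:
        if metadata.get_track_by_identifier(assay_id) is not None:
            continue  # resolved as an identifier, whatever its prefix
        if metadata.get_tracks_by_description(assay_id):
            continue  # resolved as a description substring
        unknown.append(assay_id)
    if unknown:
        raise InvalidAssayError(
            f"{oracle_name} does not recognise these track IDs: {unknown}"
        )


# Illustrative metadata; the descriptions are made up for the example.
meta = ToyMetadata([
    Track("ENCFF413AHU", "example DNase track"),
    Track("CNhs11250", "example CAGE track"),
])
validate_assay_ids(meta, ["ENCFF413AHU", "CNhs11250"])  # both resolve as identifiers
```

Because no prefix test decides which lookup runs, an identifier scheme the guard has never seen takes the same path as `ENCFF*`, which is the property the regression test is meant to pin.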

P1 — `chorus-mcp --help` listed 20 tools, server registers 22

The hand-maintained list missed `discover_variant` and `fine_map_causal_variant`. It is now reorganised into 4 logical groups with an explicit (22) tag so future drift is easy to spot.
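One way to catch this kind of drift automatically is to compare the names listed in the help text against the names the server registers. The sketch below is hypothetical and not part of this PR. The group names match the ones used in the new help text, but the assignment of tools to groups, the short tool lists, and the `help_drift` helper are assumptions for illustration.

```python
from typing import Dict, Iterable, List


def help_drift(help_groups: Dict[str, List[str]], registered: Iterable[str]) -> Dict[str, List[str]]:
    """Report tools that are registered but not listed, listed but not registered, or listed twice."""
    listed = [name for names in help_groups.values() for name in names]
    listed_set = set(listed)
    registered_set = set(registered)
    return {
        "missing_from_help": sorted(registered_set - listed_set),
        "stale_in_help": sorted(listed_set - registered_set),
        "duplicates": sorted({n for n in listed if listed.count(n) > 1}),
    }


# Hypothetical, shortened help listing that reproduces the pre-fix omission.
old_help = {
    "Discovery": ["list_oracles"],
    "Analyze": ["analyze_variant"],
}
# Hypothetical subset of registered tool names.
registered_tools = [
    "list_oracles",
    "analyze_variant",
    "discover_variant",
    "fine_map_causal_variant",
]

drift = help_drift(old_help, registered_tools)
print(drift["missing_from_help"])
```

A check like this could run in the fast test suite, asserting that all three lists are empty, so the count in the help text no longer has to be verified by hand.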

P1 — dead anchor in `docs/MCP_WALKTHROUGH.md`

The link `../README.md#mcp-server-ai-assistant-integration` resolved to nothing; the README slug is `#mcp-server`. Updated.
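Dead anchors like this can be found mechanically by slugging the target file's headings and checking each link fragment against them. The sketch below is a simplified approximation of GitHub's heading-slug rule: lowercase, drop punctuation, turn spaces into hyphens. It ignores the numeric suffixes GitHub adds to duplicate headings, and it would treat `#` comment lines inside code fences as headings. The README text and the helper names are hypothetical.

```python
import re
from typing import List


def github_slug(heading: str) -> str:
    """Approximate the anchor GitHub generates for a Markdown heading."""
    text = heading.strip().lower()
    text = re.sub(r"[^\w\- ]", "", text)  # keep word characters, hyphens, spaces
    return text.replace(" ", "-")


def find_dead_anchors(doc_text: str, target_text: str, target_name: str = "README.md") -> List[str]:
    """Return link fragments in doc_text that match no heading slug in target_text."""
    slugs = {
        github_slug(m.group(1))
        for m in re.finditer(r"^#{1,6}\s+(.+?)\s*$", target_text, re.MULTILINE)
    }
    link = re.compile(r"\(([^)\s]*" + re.escape(target_name) + r")#([^)\s]+)\)")
    return [m.group(2) for m in link.finditer(doc_text) if m.group(2) not in slugs]


# Hypothetical README excerpt whose heading slugs to "mcp-server".
readme = "# Chorus\n\n## MCP Server\n\nSome text.\n"
walkthrough = "See [the README](../README.md#mcp-server-ai-assistant-integration).\n"

print(find_dead_anchors(walkthrough, readme))
```

Running a check of this kind over `docs/` would flag the link fixed here before it reached users, assuming the README heading slugs the way the approximation predicts.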

Why this matters

Without the scorched-earth run, the v26-introduced P0 in _validate_assay_ids would have shipped to every new user. Previous "audits" (v22-v26) all kept some warm state (downloads cache, env, genome) so this code path was never exercised on a real first call.

Verification

| Check | Result |
|---|---|
| Delete 7 envs + 24.8 GB caches | clean |
| `mamba env create -f environment.yml` (Step 1) | 5 min, clean |
| `chorus setup --oracle all` (Step 2) | 60 min, 6/6 ✓ |
| README Step 3 Python snippet | WT mean 0.468, 3 alts scored |
| `chorus list` / `chorus health` | 6/6 ✓ Healthy, exit 0 |
| `single_oracle_quickstart.ipynb` (post-fix) | exit 0, all cells |
| MCP E2E (`test_mcp_e2e_list_oracles_and_analyze_variant -m integration`) | 1 passed, 280s |
| Multi-oracle smoke (Enformer + ChromBPNet) | both finite |
| Discovery smoke (`discover_variant_effects` SORT1 rs12740374) | OK |
| Fast pytest | 340 passed, 1 skipped |

Deferred to a follow-up PR

Doc-drift items captured in the audit report but not fixed here:

- 3 notebook md cells with redundant `chorus genome download hg38` steps + outdated track-count tables
- `audits/AUDIT_CHECKLIST.md` ambiguous track-count rows

🤖 Generated with Claude Code

…anchor

Greenfield install verified end-to-end after deleting all 7 envs + 24.8 GB of caches. README TLDR Steps 1-4 work; quickstart notebook executes; MCP E2E test (4 min) passes; 6/6 oracles ✓ Healthy.

P0: track-ID validator rejected FANTOM CAGE identifiers

`_validate_assay_ids` in Enformer + Borzoi only treated `ENCFF*` strings as identifier candidates; everything else fell through to description-substring lookup. FANTOM CAGE IDs like `CNhs11250` are valid identifiers (resolved by `get_track_by_identifier`) but don't start with `ENCFF`, so they were classified as descriptions. `get_tracks_by_description("CNhs11250")` returns empty → guard raised `InvalidAssayError`. The shipped quickstart notebook uses `['ENCFF413AHU', 'CNhs11250']` and broke for every new user on cell In[8].

Fix: try `get_track_by_identifier` first unconditionally; fall back to description lookup only if identifier lookup returns None. Same fix in borzoi.py (identical bug, identical code path). Regression test added in tests/test_prediction_methods.py pinning the FANTOM CAGE behaviour explicitly.

P1: `chorus-mcp --help` listed 20 tools, FastMCP registers 22

Hand-maintained list missed `discover_variant` and `fine_map_causal_variant`. Reorganised into 4 logical groups (Discovery / Lifecycle / Predict / Analyze) with explicit "(22)" tag.

P1: dead anchor in docs/MCP_WALKTHROUGH.md

`../README.md#mcp-server-ai-assistant-integration` → nothing. README slug is `#mcp-server`. Updated.

Audit report: audits/2026-04-24_v27_scorched_earth.md.

Tests: 340 passed, 1 skipped on fast suite. Quickstart notebook executes clean post-fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

After audit/2026-04-26-bpnet-cdfs-complete merged into main, replayed the full README quickstart from a fresh-install state (deleted 7 envs + ~53 GB of caches/downloads/genomes).

Verified:

- Default `chorus setup --oracle chrombpnet` is back to the v27 fast path (K562 + HepG2 DNase only, ~16 min, 3.5 GB), no longer the silent 30 GB / 3-hour download.
- `--all-chrombpnet` opt-in flag is properly advertised in `chorus setup --help`.
- 786-track NPZ auto-downloads from HF (commit c1e5fc1) on `chorus setup --oracle chrombpnet`.
- All 6 oracles ✓ Healthy in 67 min total.
- README Step 3 snippet: WT mean 0.468, 3 alts.
- All 3 shipped notebooks execute clean (single_oracle_quickstart, advanced_multi_oracle_analysis, comprehensive_oracle_showcase).
- 4/4 integration tests pass (MCP E2E, SEI+LegNet CDF download, ChromBPNet fresh single-model download).
- Fast pytest: 340 passed, 1 skipped.

HTML walkthrough render audit (playwright on 18 shipped HTMLs):

- 18/18 loaded with 0 JS errors / 0 page errors
- 0/18 missing the glossary block
- 17/18 with valid IGV browser block (the one without is batch_sort1_locus_scoring; by design, batch reports show a multi-variant table, not per-variant tracks)
- All formula badges (log2FC / lnFC / Δ) and percentile columns present where applicable.

Default disk footprint after install: ~25 GB (matches new README claim). The 60 GB figure only applies to the `--all-chrombpnet` opt-in.

This is the second consecutive scorched-earth audit (v28 was the first) to return clean. Six bug-fix PRs (#48, #49, #50, #51, #52, #53) plus the audit-fix branch all hold up end-to-end. No findings. Code is shipping-ready.

Audit + screenshots: audits/2026-04-26_v29_scorched_earth/

Co-authored-by: lp698 <lp698@dimm2fv07n65x.partners.org>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

lucapinello merged commit 2ecd998 into main Apr 25, 2026
1 check passed

lucapinello deleted the fix/2026-04-24-v27-track-id-validator branch April 25, 2026 01:15

lucapinello merged 1 commit into main from fix/2026-04-24-v27-track-id-validator
