| 1 | +# 2026-04-23 — Setup prefetch + health classification + token flow |
| 2 | + |
| 3 | +Author: Claude (session driven by Luca) |
| 4 | +Scope: the consolidated change landing |
| 5 | +`chorus/core/weights_probe.py`, `chorus/cli/_setup_prefetch.py`, |
| 6 | +`chorus/cli/_setup_all.py`, `chorus/cli/_tokens.py`, plus |
| 7 | +modifications to `chorus/cli/main.py`, `chorus/core/environment/runner.py`, |
| 8 | +`chorus/__init__.py`, `chorus/utils/http.py`, `chorus/utils/ld.py`, |
| 9 | +`chorus/oracles/legnet.py`, `chorus/oracles/sei.py`, and `README.md`. |
| 10 | + |
| 11 | +## What was run |
| 12 | + |
| 13 | +Sections of [`AUDIT_CHECKLIST.md`](AUDIT_CHECKLIST.md) that could be |
| 14 | +affected by the change were executed in full; sections that are |
| 15 | +orthogonal (§3 GPU, §4 CDF math, §6 notebooks, §7 HTML reports, |
| 16 | +§8 MCP, §12/§13 reproducibility, §14 genomics edges, §15 offline, |
| 17 | +§17 supply chain, §18 license) were **not** re-run — the change |
| 18 | +doesn't touch those code paths. |
| 19 | + |
| 20 | +## Summary |
| 21 | + |
| 22 | +- 336 passed, 4 deselected (integration), 0 failed in the fast suite. |
| 23 | + `mamba run -n chorus python -m pytest tests/ --ignore=tests/test_smoke_predict.py -m "not integration" -q` → **pass**. |
| 24 | +- `chorus health` on a machine with no setup markers: 6 oracles, 7.2 s |
| 25 | + total, each clearly reports "Not installed — run `chorus setup <oracle>`" |
| 26 | + with the exact missing artifacts. Previously Sei alone hung until the |
| 27 | + 120 s subprocess timeout expired. |
| 28 | +- `chorus health` transition to Healthy verified end-to-end with a |
| 29 | + fabricated complete state (marker + artifacts). |
| 30 | +- `chorus setup --oracle all` without an HF token + non-TTY stdin halts |
| 31 | + with rc=1 **before any env build** and emits the three token hints |
| 32 | + (`HF_TOKEN`, `--hf-token`, `huggingface-cli login`). |
| 33 | +- `create_oracle('fakeOracle')` still raises `ValueError` naming the six |
| 34 | + valid options — the `kwargs.setdefault("use_environment", False)` change |
| 35 | + in `chorus/__init__.py` is a no-op for unknown oracles (the check |
| 36 | + runs first). |
| 37 | +- Interactive `HF_TOKEN` prompt uses `getpass` (hidden); `LDLINK_TOKEN` |
| 38 | + prompt was switched from `input()` to `getpass` during the audit. |
| 39 | + |
| 40 | +## Per-section findings |
| 41 | + |
| 42 | +### §1 Installation & environment |
| 43 | +- [x] **§1.3** `chorus --help` and every subcommand's `--help` render |
| 44 | + non-empty (setup: 20 lines, health: 8, genome: 12, etc.). |
| 45 | +- [x] **§1.6** Idempotency: `chorus setup --oracle enformer --no-weights |
| 46 | + --no-backgrounds --no-genome` on an already-present env returns |
| 47 | + exit 0 and does not rebuild. |
| 48 | +- [x] **§1.9** `~/.chorus/backgrounds/` auto-download: verified by |
| 49 | + running `chorus setup --oracle enformer --no-weights --no-genome` |
| 50 | + which pulled `enformer_pertrack.npz` from HF in 21 s and wrote |
| 51 | + it to the canonical cache. |
| 52 | +- [x] **New** Setup marker convention added: `downloads/<oracle>/.chorus_setup_v1` |
| 53 | + is the proof-of-install signal read by `chorus health` and |
| 54 | + written by `chorus setup` on success. Documented in |
| 55 | + `chorus/core/weights_probe.py` docstring. |
| 56 | +- [x] **New P2, fixed during audit** `--force` now invalidates the |
| 57 | + stale marker up front so a mid-rebuild failure doesn't leave |
| 58 | + the oracle reporting Healthy (see `chorus/cli/main.py` and |
| 59 | + `chorus/cli/_setup_all.py`). |
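The marker convention and the `--force` invalidation described in §1 can be sketched as below. This is a minimal illustration, not the code in `chorus/core/weights_probe.py`: the helper names `classify_oracle` and `invalidate_marker`, the `required` artifact list, and the exact status strings are assumptions. Only the marker path `downloads/<oracle>/.chorus_setup_v1` is taken from this audit.

```python
from pathlib import Path
from typing import List, Tuple

MARKER_NAME = ".chorus_setup_v1"  # marker file name taken from the audit notes


def classify_oracle(downloads: Path, oracle: str, required: List[str]) -> Tuple[str, List[str]]:
    """Return (status, missing) using filesystem checks only.

    Hypothetical helper: `required` lists artifact paths relative to the
    oracle's download directory. No model code is imported, so the probe
    cannot hang on a slow subprocess.
    """
    root = downloads / oracle
    missing = [name for name in required if not (root / name).exists()]
    if not (root / MARKER_NAME).exists():
        missing.insert(0, MARKER_NAME)
    status = "Healthy" if not missing else "Not installed"
    return status, missing


def invalidate_marker(downloads: Path, oracle: str) -> None:
    """Drop the marker before a forced rebuild so a mid-rebuild failure
    cannot leave the oracle reporting Healthy."""
    (downloads / oracle / MARKER_NAME).unlink(missing_ok=True)
```

Under these assumptions, an oracle reports Healthy only when both the marker and every listed artifact exist, which is why the marker has to be removed before any rebuild step starts.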
| 60 | + |
| 61 | +### §2 HuggingFace authentication |
| 62 | +- [x] **§2.1** `HF_TOKEN` env path: verified — `whoami()` succeeds and |
| 63 | + we log the user name without exposing the token. |
| 64 | +- [x] **§2.3** No-token, no-login path raises a single clear message |
| 65 | + that names `HF_TOKEN`, the exact gated repo URL |
| 66 | + (`huggingface.co/google/alphagenome-all-folds`), and the |
| 67 | + `huggingface-cli login` alternative. All three hints present in |
| 68 | + the AlphaGenome error and in the new `chorus setup` halt message. |
| 69 | +- [x] **§2.4** Repo URL consistency: the string |
| 70 | + `huggingface.co/google/alphagenome-all-folds` appears in |
| 71 | + `chorus/oracles/alphagenome.py`, `chorus/oracles/alphagenome_source/templates/load_template.py`, |
| 72 | + `README.md` (three places including the new Tokens section), and |
| 73 | + the new `chorus/cli/_tokens.py`. No drift. |
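The no-token halt verified in this audit (rc=1 before any env build, with all three hints) could look roughly like the sketch below. It is not the contents of `chorus/cli/_tokens.py`: the function names, the hint wording, and the precedence of the flag over the environment variable are assumptions, and the sketch does not look for a token cached by `huggingface-cli login`.

```python
import getpass
import os
import sys
from typing import Mapping, Optional

# Hint text is illustrative; only the three hint names come from the audit.
HINTS = (
    "No HuggingFace token found. Provide one via:\n"
    "  1. the HF_TOKEN environment variable\n"
    "  2. the --hf-token flag\n"
    "  3. `huggingface-cli login`"
)


def resolve_hf_token(cli_token: Optional[str], env: Mapping[str, str], is_tty: bool) -> Optional[str]:
    """Return a token, or None when none is available and prompting is impossible.

    Assumed precedence: explicit flag, then HF_TOKEN, then a hidden prompt.
    """
    token = cli_token or env.get("HF_TOKEN")
    if token:
        return token
    if not is_tty:
        return None  # non-interactive: never block waiting on stdin
    return getpass.getpass("HuggingFace token (input hidden): ") or None


def preflight(cli_token: Optional[str] = None) -> int:
    """Run before any environment build; a non-zero return aborts setup."""
    token = resolve_hf_token(cli_token, os.environ, sys.stdin.isatty())
    if token is None:
        print(HINTS, file=sys.stderr)
        return 1
    return 0
```

Running the check before the first environment build means a missing token costs seconds rather than a full build followed by a failed gated download.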
| 74 | + |
| 75 | +### §5 Python API sanity |
| 76 | +- [x] **§5.1** `create_oracle('<name>', use_environment=False)` works |
| 77 | + for all 6 oracles (verified for legnet under `chorus-legnet` env). |
| 78 | + Invalid name raises `ValueError: Unknown oracle: fakeoracle. |
| 79 | + Available: ['enformer', 'borzoi', 'chrombpnet', 'sei', 'legnet', |
| 80 | + 'alphagenome']`. |
| 81 | +- [x] **New behaviour** `use_environment=False` now correctly |
| 82 | + propagates into the oracle instance via |
| 83 | + `kwargs.setdefault("use_environment", False)`. Previously the |
| 84 | + oracle would default to `use_environment=True` inside the |
| 85 | + "direct" branch, which made the `chorus setup` prefetch script |
| 86 | + re-spawn a subprocess back into the env it was already running |
| 87 | + in. Covered by a one-off smoke test run during the audit. |
| 88 | +- [x] **§5.4** `predict_variant_effect` 1-based coordinate regression |
| 89 | + (`tests/test_prediction_methods.py::test_variant_position_is_1_based`) |
| 90 | + still passes (13/13 in test_prediction_methods.py). |
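The ordering that makes the `setdefault` change a no-op for unknown oracles can be illustrated with a stub. This is a sketch under assumptions, not the code in `chorus/__init__.py`: `_StubOracle` stands in for the real oracle classes, and the lower-casing of the name and the `use_environment=True` default on the factory are inferred from the error text and the description in §5.

```python
AVAILABLE = ['enformer', 'borzoi', 'chrombpnet', 'sei', 'legnet', 'alphagenome']


class _StubOracle:
    """Stand-in for a real oracle class; its own default is use_environment=True."""

    def __init__(self, name, use_environment=True):
        self.name = name
        self.use_environment = use_environment


def create_oracle(name, use_environment=True, **kwargs):
    key = name.lower()  # assumption: names are normalised before lookup
    # Validation runs first, so unknown names never reach the setdefault below.
    if key not in AVAILABLE:
        raise ValueError(f"Unknown oracle: {key}. Available: {AVAILABLE}")
    if use_environment:
        return _StubOracle(key, use_environment=True, **kwargs)
    # Direct branch: without this line the instance would fall back to its
    # own default (True) and try to re-spawn into an environment.
    kwargs.setdefault("use_environment", False)
    return _StubOracle(key, **kwargs)
```

With the `setdefault` line removed, `create_oracle("legnet", use_environment=False)` would return an instance whose `use_environment` is `True`, which is the re-spawn behaviour the audit describes.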
| 91 | + |
| 92 | +### §9 Error messages |
| 93 | +- [x] `create_oracle('fakeOracle')` names the six valid options. |
| 94 | +- [x] AlphaGenome HF token missing → message contains `HF_TOKEN`, |
| 95 | + gated repo URL, and `huggingface-cli login`. |
| 96 | +- [x] Network drop during `download_pertrack_backgrounds` returns 0 |
| 97 | + and logs a warning (`tests/test_error_recovery.py::TestDownloadFailurePaths` |
| 98 | + 2/2 pass). |
| 99 | +- [x] `chorus setup --oracle all` halt message names `HF_TOKEN`, |
| 100 | + `--hf-token`, and `huggingface-cli login` — all three paths. |
| 101 | +- [x] `test_missing_oracle_env_falls_back_gracefully` still passes. |
| 102 | +- [x] `test_download_with_resume_leaves_partial_and_resumes_on_second_call` |
| 103 | + still passes after the tqdm integration. |
| 104 | + |
| 105 | +### §10 Consistency of claims across the repo |
| 106 | +- [x] Drift grep |
| 107 | + (`grep -rn '5,930\|5930\|196 kbp\|examples/applications/' --include='*.md' --include='*.py' .` |
| 108 | + excluding `audits/`) returns nothing. |
| 109 | +- [x] No TODO/FIXME/WIP markers in any of the 12 changed/new files. |
| 110 | +- [x] README "Tokens" section (new) names both tokens consistently |
| 111 | + with `chorus/cli/_tokens.py` resolution order. LDlink |
| 112 | + Troubleshooting block pre-existed and is now backed by the |
| 113 | + `LDLINK_TOKEN` env var + `~/.chorus/config.toml` fallback added |
| 114 | + to `chorus/utils/ld.py`. |
| 115 | + |
| 116 | +### §11 Test suite |
| 117 | +- [x] **Fast suite** `mamba run -n chorus python -m pytest tests/ |
| 118 | + --ignore=tests/test_smoke_predict.py -m "not integration" -q` |
| 119 | + → 336 passed, 4 deselected, 0 failed, 72.7 s (threshold ≥334). |
| 120 | +- [N/A] Integration suite not run (no release-host access in this |
| 121 | + session). Flagged for the release-host auditor. |
| 122 | + |
| 123 | +### §16 Logging hygiene |
| 124 | +- [x] `grep -rn 'hf_[a-zA-Z0-9]\{20,\}' chorus/ examples/ docs/ audits/` |
| 125 | + returns nothing — no real tokens committed. |
| 126 | +- [x] All logger.info / logger.error calls in `chorus/cli/_tokens.py` |
| 127 | + log token **metadata** (source path, `whoami()` user name, |
| 128 | + success/rejection) but never the token value itself. |
| 129 | +- [x] Interactive prompts use `getpass` (hidden input) for both HF and |
| 130 | + LDlink; the LDlink prompt was switched during the audit. |
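The token grep in §16 has a straightforward Python equivalent, which is convenient where `grep` is unavailable or the check should run inside the test suite. The function name `find_token_like_strings` is hypothetical. The pattern is the one from the grep command, written in Python regex syntax.

```python
import re
from pathlib import Path

# Same pattern as the audit's grep: "hf_" followed by 20+ alphanumerics.
TOKEN_RE = re.compile(r"hf_[a-zA-Z0-9]{20,}")


def find_token_like_strings(roots):
    """Return (path, line_number) pairs for lines that look like an HF token.

    Only locations are reported, never the matched text, so the scan's own
    output cannot leak a token.
    """
    hits = []
    for root in roots:
        for path in sorted(Path(root).rglob("*")):
            if not path.is_file():
                continue
            try:
                text = path.read_text(encoding="utf-8", errors="ignore")
            except OSError:
                continue  # unreadable file: skip rather than fail the scan
            for lineno, line in enumerate(text.splitlines(), start=1):
                if TOKEN_RE.search(line):
                    hits.append((str(path), lineno))
    return hits
```

An empty result corresponds to the "returns nothing" outcome recorded above.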
| 131 | + |
| 132 | +## Things deferred (not blocking this change) |
| 133 | +- P1 §1.4 Running `chorus setup --oracle <X>` end-to-end on a fresh |
| 134 | + Linux/CUDA host and on macOS-arm64 — requires release hosts we don't |
| 135 | + have in this session. |
| 136 | +- P1 §11 Integration-marked suite — same. |
| 137 | +- P1 §8 MCP E2E for rs12740374 — no changes to MCP code; skipped. |
| 138 | + |
| 139 | +## Files touched |
| 140 | +``` |
| 141 | + R chorus/__init__.py (+5 lines) |
| 142 | + R chorus/cli/main.py (+141 / -lines reorganized) |
| 143 | + R chorus/core/environment/runner.py (+19 lines, probe wire-up) |
| 144 | + R chorus/oracles/legnet.py (urlretrieve → download_with_resume) |
| 145 | + R chorus/oracles/sei.py (lazy download + packaged-metadata fallback) |
| 146 | + R chorus/utils/http.py (tqdm integration) |
| 147 | + R chorus/utils/ld.py (LDLINK_TOKEN env + config fallback) |
| 148 | + R README.md (+Tokens section) |
| 149 | + + chorus/cli/_setup_all.py (93 lines) |
| 150 | + + chorus/cli/_setup_prefetch.py (173 lines) |
| 151 | + + chorus/cli/_tokens.py (205 lines) |
| 152 | + + chorus/core/weights_probe.py (140 lines) |
| 153 | +``` |
| 154 | + |
| 155 | +## Verdict |
| 156 | +**Green.** Safe to commit the 8 modifications + 4 new files, but |
| 157 | +**do not `git add .` or `git add -A`** — `Untitled.ipynb` is a |
| 158 | +pre-existing untracked stray (Apr 22, not part of this change) and |
| 159 | +must be left out. Recommend a selective `git add` of the 12 files |
| 160 | +listed above. |