v21 fresh-install audit: data caches purged + re-downloaded, no findings #37
Closed
lucapinello wants to merge 1 commit into chorus-applications from
Walked audits/AUDIT_CHECKLIST.md top-to-bottom with every
re-downloadable cache purged before starting:
- ~/.chorus/backgrounds/ (1.5 GB, 6 NPZ files) — nuked
- genomes/hg38.fa + .fai (3.0 GB) — nuked
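Confirming that the purge really happened before any check runs can be scripted. The sketch below assumes only the two locations listed above, with `genomes/` taken relative to the repo root; `cache_leftovers` is a hypothetical helper, not part of chorus.

```python
from pathlib import Path


def cache_leftovers(backgrounds_dir: Path, genomes_dir: Path) -> list[Path]:
    """Return cache files that survived the purge (hypothetical helper).

    An empty list means the audit really starts from a cold cache.
    """
    leftovers: list[Path] = []
    # Background CDFs: no NPZ files should remain.
    if backgrounds_dir.is_dir():
        leftovers.extend(sorted(backgrounds_dir.glob("*.npz")))
    # Reference genome: both hg38.fa and hg38.fa.fai must be gone.
    if genomes_dir.is_dir():
        leftovers.extend(sorted(genomes_dir.glob("hg38.*")))
    return leftovers
```

For example, `cache_leftovers(Path.home() / ".chorus" / "backgrounds", Path("genomes"))` should return `[]` right after the purge.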
Then re-downloaded via the documented flows:
- chorus genome download hg38 → 3.1 GB fresh, ~9 min
- get_normalizer(x) for each of 6 oracles → 1.5 GB via HuggingFace
(huggingface.co/datasets/lucapinello/chorus-backgrounds), 44 s total
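The genome step ends by regenerating `hg38.fa.fai` with `samtools faidx`. That index is one tab-separated row per sequence: name, length, byte offset of the first base, bases per line, and bytes per line (including the newline). A minimal pure-Python sketch of how those columns are derived, assuming an uncompressed FASTA whose lines within a record all have the same width except possibly the last (samtools has the same requirement). It is illustrative only; the audit itself used `samtools faidx`, and `build_fai`/`write_fai` are hypothetical names.

```python
from pathlib import Path


def build_fai(fasta: Path) -> list[tuple[str, int, int, int, int]]:
    """Compute .fai rows: (name, length, offset, linebases, linewidth)."""
    rows = []
    name = None
    length = offset = linebases = linewidth = 0
    pos = 0  # current byte position in the file
    with fasta.open("rb") as fh:
        for raw in fh:
            if raw.startswith(b">"):
                if name is not None:
                    rows.append((name, length, offset, linebases, linewidth))
                # The sequence name is the header text up to the first whitespace.
                name = raw[1:].split()[0].decode()
                length = linebases = linewidth = 0
                offset = pos + len(raw)  # first base starts right after the header
            else:
                bases = len(raw.rstrip(b"\r\n"))
                if linebases == 0:
                    # The first sequence line fixes this record's line geometry.
                    linebases, linewidth = bases, len(raw)
                length += bases
            pos += len(raw)
    if name is not None:
        rows.append((name, length, offset, linebases, linewidth))
    return rows


def write_fai(fasta: Path) -> Path:
    """Write <fasta>.fai next to the FASTA, mirroring samtools' naming."""
    fai = fasta.with_name(fasta.name + ".fai")
    fai.write_text("".join("\t".join(map(str, row)) + "\n" for row in build_fai(fasta)))
    return fai
```

Storing the offset and line geometry is what lets a reader seek straight to any coordinate without scanning the 3 GB file.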
## Results — no new findings
§1 CLI: PASS (chorus --help surfaces all 6 subcommands)
§3 device: all 6 envs detect Metal/MPS/JAX-METAL on macOS arm64
§4 CDFs: 6/6 monotonic, p50<=p95<=p99, signed% correct (0/20/0/100/100/13)
§5 API: sequence_length matches spec for all 6; error messages clear
§6 notebook fresh exec: single_oracle_quickstart.ipynb 49 cells,
0 errors, 0 warnings
§7 selenium: 18/18 HTMLs render with fresh Chrome profile,
0 JS errors each
§10 consistency: 0 drift (grep for 5,930 / 7,612 / 196 kbp /
examples/applications/ all empty)
§11 pytest: 335 passed / 1 skipped (8m 28s)
§14.4 chrom validation: fix still holds
§15 offline: 0 runtime CDN fetches in any HTML
§16 logging hygiene: no committed HF tokens or AWS keys
§18 LICENSE + docs/THIRD_PARTY.md + bundled IGV.js header intact
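Several of the items above are mechanical checks. The sketch below shows one way to script four of them, with loudly stated assumptions: for §4 it assumes each background is available as a `(levels, values)` quantile table, which may not match the real NPZ layout; §10 reuses the audit's four grep strings; §15 is a per-line heuristic that flags `<script>`/`<img>`/`<iframe>` `src` and `<link>` `href` pointing at absolute or protocol-relative URLs; §16 uses common heuristic shapes for Hugging Face tokens (`hf_` prefix) and AWS access key IDs (`AKIA`/`ASIA` prefix). All names are hypothetical; the audit's own probes are the logs listed under Artefacts.

```python
import re
from pathlib import Path

import numpy as np

# §10: strings that should no longer appear (same alternation as the audit's grep).
STALE = re.compile(r"5,930|7,612|196 kbp|examples/applications/")

# §15 heuristic: tags that fetch a resource at page load from an absolute or
# protocol-relative URL. Plain <a href> links are ignored on purpose.
CDN_FETCH = re.compile(
    r"<(?:script|img|iframe)\b[^>]*\bsrc\s*=\s*[\"'](?:https?:)?//"
    r"|<link\b[^>]*\bhref\s*=\s*[\"'](?:https?:)?//",
    re.IGNORECASE,
)

# §16 heuristics: Hugging Face user tokens and AWS access key IDs.
SECRETS = re.compile(r"\bhf_[A-Za-z0-9]{30,}\b|\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def cdf_is_sane(levels, values) -> bool:
    """§4 sketch: finite, monotonic quantile table with p50 <= p95 <= p99.

    Assumes levels are increasing probabilities in [0, 1].
    """
    levels = np.asarray(levels, dtype=float)
    values = np.asarray(values, dtype=float)
    if levels.ndim != 1 or levels.shape != values.shape:
        return False
    if not np.all(np.isfinite(values)):
        return False
    if np.any(np.diff(levels) <= 0) or np.any(np.diff(values) < 0):
        return False
    p50, p95, p99 = np.interp([0.50, 0.95, 0.99], levels, values)
    return bool(p50 <= p95 <= p99)


def matching_lines(text: str, pattern: re.Pattern) -> list[int]:
    """1-based line numbers in `text` where `pattern` matches."""
    return [n for n, line in enumerate(text.splitlines(), start=1) if pattern.search(line)]


def scan_tree(root: Path, pattern: re.Pattern, suffixes: set[str]) -> list[tuple[Path, int]]:
    """Run `pattern` over every file under `root` whose suffix is in `suffixes`."""
    hits = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            hits.extend((path, n) for n in matching_lines(text, pattern))
    return hits
```

A pass on each of §10, §15 and §16 then means `scan_tree(...)` returns an empty list; multi-line tags would slip past the per-line §15 heuristic, so a real audit may prefer an HTML parser.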
## Scope explicitly deferred to release-host audit
- §1 conda env recreate (80 GB / 2-4 h — destructive to ongoing work)
- §2 HF_TOKEN-missing end-to-end (would break other AlphaGenome work)
- §6 multi-oracle + advanced notebooks (need all 6 oracles loaded)
- §8 MCP E2E over stdio (~4 min AlphaGenome predict)
- §13 real-oracle determinism (~30 min, needs loaded models)
- §14 remaining edge cases (indels, chrM, telomere near-edge)
- §17 pip-audit (already advisory in CI per v20)
## Artefacts in audits/2026-04-21_v21_fresh_install/
- report.md — summary
- 16 selenium screenshots (1600×4500 px; 18 reports, but 2 basename collisions leave 16 files)
- 11 log files: pre-nuke/post-nuke state, CLI help, genome download
trace, CDF fresh pull + sanity, per-env device probe, Python API
probe, consistency greps, selenium output, fresh notebook, pytest
output
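The 16-vs-18 count follows from naming each screenshot after its report's basename: every report that shares a name with another one costs one file, so 18 − 2 = 16. A minimal sketch of detecting such clashes before capture, plus one possible collision-free naming scheme; both functions are hypothetical, the `__`-joined naming is an assumption rather than what the audit script does, and it assumes relative report paths.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def basename_collisions(report_paths: list[str]) -> dict[str, list[str]]:
    """Group report paths whose screenshot name (file stem) would clash."""
    groups: dict[str, list[str]] = defaultdict(list)
    for path in report_paths:
        groups[PurePosixPath(path).stem].append(path)
    return {stem: paths for stem, paths in groups.items() if len(paths) > 1}


def unique_screenshot_name(report_path: str) -> str:
    """Hypothetical collision-free name: join relative path parts with '__'."""
    return "__".join(PurePosixPath(report_path).with_suffix("").parts) + ".png"
```

For `["a/report.html", "b/report.html", "c/other.html"]`, stem-based naming writes 2 files, while `unique_screenshot_name` yields `a__report.png`, `b__report.png` and `c__other.png`.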
Headline: every mechanisable checklist item passes on a purged-cache
run. Previous audits (v15-v20) + fixes in PRs #32, #34, #36 have
driven this repo to a ship-clean state on items that can be tested
from macOS arm64. Release-host items are explicitly scoped.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Superseded: artefacts cherry-picked into chorus-applications via 5300daa. The other agent's reconciliation commit imported the 18 selenium screenshots + 10 probe logs + fresh notebook + report.md from this branch, while skipping the 2 file reverts on this branch that would have undone the LegNet 230→200 fix in comprehensive_oracle_showcase.ipynb and the test_integration.py chrombpnet skip guard. The full reconciliation note lives at audits/2026-04-21_v21_fresh_install/report.md. Both v21 audits (this macOS arm64 data-cache-purge run + the Linux/CUDA fresh-env run at audits/2026-04-21_v21_fresh_install_linux_cuda.md) are preserved as complementary evidence. Closing to keep the open-PR list clean. No content lost.
## Summary

Walked `audits/AUDIT_CHECKLIST.md` top-to-bottom with every re-downloadable cache purged before starting. No new findings: every mechanisable checklist item passes on a fully fresh-data run.

## What was nuked + re-downloaded
- `~/.chorus/backgrounds/` (6 NPZ CDFs)
- `genomes/hg38.fa` + `.fai`: re-downloaded via `chorus genome download hg38` → UCSC + decompress + `samtools faidx`

## Results: all pass
- `chorus --help` surfaces all 6 subcommands
- CDFs: p50 ≤ p95 ≤ p99, signed% correct (0/20/0/100/100/13)
- API: `sequence_length` matches spec for all 6; errors clear
- Fresh execution of `single_oracle_quickstart.ipynb`: 49 cells, 0 errors, 0 warnings
- Consistency: `grep '5,930|7,612|196 kbp|examples/applications/'` empty
- `LICENSE` + `docs/THIRD_PARTY.md` + bundled IGV.js header intact

## Scope deliberately deferred to release-host audit
## Artefacts in `audits/2026-04-21_v21_fresh_install/`

- `report.md`: summary + verbatim timings
- `screenshots/*.png` (16 files; 18 reports with 2 basename collisions)
- `logs/00-10_*.txt` + `09_quickstart_fresh.ipynb`: pre-nuke state, post-nuke empty confirmation, CLI help, genome download trace, CDF fresh pull, device probe, API sanity, consistency greps, selenium output, fresh notebook, pytest log

## Headline
Previous audits (v15–v20) + fixes in PRs #32, #34, #36 have driven this repo to a ship-clean state on every mechanisable item. What remains open is release-host work (full env build, multi-oracle notebooks, real-oracle determinism, MCP E2E) — items explicitly scoped for that environment.
## Test plan

- `~/.chorus/backgrounds/` + `genomes/hg38.*` deleted, confirmed empty before any check runs
- `chorus genome download hg38` downloaded, decompressed, indexed in ~9 min (see `logs/03_genome_download.txt`)
- CDF fresh pull (see `logs/04_cdf_fresh_download.txt`)
- `jupyter nbconvert --execute` on `single_oracle_quickstart.ipynb`: 0 errors, 0 warnings
- Selenium with `--user-data-dir=<fresh tmpdir>`: 0 JS errors per report
- `pytest` fast suite: 335 passed / 1 skipped

🤖 Generated with Claude Code