Multi-oracle report: add missing noise-floor/p99 scale block (6ed117f gap) #39
Merged
lucapinello merged 1 commit into chorus-applications on Apr 23, 2026
…gap)

Commit 6ed117f ("IGV normalization explanation: causal report + LegNet stub parity") fixed the causal report and LegNet stub but missed the multi-oracle report, which had its own terser IGV intro. Re-audit of the 4-part IGV contract across all 17 shipped HTMLs found rs12740374_SORT1_multioracle_report.html was missing 4/5 of the scale-explanation markers the other reports ship ("noise floor", "p95", "peak threshold", "rescaled using"); only "p99" appeared, in a different phrasing.

## What users saw

Before: the multi-oracle IGV block said only "Signals are floor-rescaled to [0, 3.0] where 1.0 is the genome-wide p99 peak for that assay." No mention of the p95 noise floor or "top 1% of bins" framing that every single-oracle + causal report uses, so users opening the multi-oracle view in isolation got a less-explicit explanation than in the per-oracle reports.

## Fix

chorus/analysis/multi_oracle_report.py:712-724: split the existing single-paragraph intro into two paragraphs. The window/coverage caveat stays, plus the same "noise floor (p95) and peak threshold (p99)" block the single-oracle and causal reports emit. Future regenerations of the multi-oracle HTML will ship the new block automatically.

Also hand-patched the shipped examples/walkthroughs/validation/SORT1_rs12740374_multioracle/rs12740374_SORT1_multioracle_report.html to insert the same paragraph at the matching location, since regenerating via scripts/regenerate_multioracle.py --consolidate would also rebuild the IGV section from scratch, but without the prediction pickles (not committed), that rebuild drops the IGV entirely. The hand-patch preserves the existing IGV tracks while adding the new explanation.

## 4-part audit result

All 17 IGV-containing HTMLs now pass:

- (a) IGV embedded: 17/17
- (b) ymax=3.0 signal tracks / 1.0 summary tracks: 17/17 consistent
- (c) scale explanation (5/5 markers): 17/17 (was 16/17)
- (d) assay:cell_type track provenance: 17/17 (bare "SPLICE_SITES" labels on AlphaGenome donor/acceptor/padding tracks are correctly cell-type-agnostic per AG metadata)

## Known follow-up

scripts/regenerate_multioracle.py --consolidate from committed JSONs silently drops the IGV section because the per-oracle prediction pickles aren't persisted. Either persist them or mark IGV-rebuild as requiring --oracle runs. Filed as a note; not blocking this fix.

Tests: 339 passed / 1 skipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
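The scale-explanation part of the audit (check c) can be approximated with a short script. This is a minimal sketch, not the audit tooling actually used for this PR: the five marker strings come from the commit message, while the case-insensitive substring match, the function names, and the directory walk are assumptions made for illustration.

```python
from pathlib import Path

# Marker strings as listed in the commit message; the matching rule below
# (case-insensitive substring search) is an assumption, not the real audit.
SCALE_MARKERS = ("noise floor", "p95", "peak threshold", "p99", "rescaled using")


def scale_marker_hits(html_text):
    """Return the scale-explanation markers present in html_text."""
    lowered = html_text.lower()
    return [marker for marker in SCALE_MARKERS if marker in lowered]


def audit_scale_markers(root):
    """Map each *.html file under root to the list of markers it is missing."""
    report = {}
    for path in sorted(Path(root).rglob("*.html")):
        text = path.read_text(encoding="utf-8", errors="replace")
        found = scale_marker_hits(text)
        report[str(path)] = [m for m in SCALE_MARKERS if m not in found]
    return report


# The pre-fix multi-oracle intro, quoted from the commit message.
old_intro = (
    "Signals are floor-rescaled to [0, 3.0] where 1.0 is the "
    "genome-wide p99 peak for that assay."
)
print(scale_marker_hits(old_intro))  # ['p99']
```

Run against the pre-fix intro, the sketch finds only "p99", which matches the 1/5 result reported for the multi-oracle HTML.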
What I found
Commit 6ed117f ("IGV normalization explanation: causal report + LegNet stub parity") fixed the causal report and the LegNet stub but missed the multi-oracle report, which had its own terser intro.
Independent re-audit of the 4-part IGV contract (IGV embedded / ymax=3.0 signal tracks / scale-explanation block / assay:cell_type provenance) across all 17 shipped IGV-containing HTMLs flagged rs12740374_SORT1_multioracle_report.html at 1/5 scale-explanation markers; every other report had 5/5.
What users saw
Before this fix, the multi-oracle IGV block read only:
"Signals are floor-rescaled to [0, 3.0] where 1.0 is the genome-wide p99 peak for that assay."
No mention of the p95 noise floor or the "top 1% of bins" framing the other reports ship. Users opening the multi-oracle view in isolation got a less-explicit explanation than in the per-oracle reports.
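To make the scale concrete, here is a minimal sketch of one way a floor-rescaling of this kind could work. It takes the noise floor (described in the reports as the genome-wide p95) and the peak threshold (p99) as inputs and maps them to 0.0 and 1.0, clipping at the 3.0 track ceiling. The linear mapping and the name `floor_rescale` are assumptions for illustration; the PR does not show the formula chorus actually uses.

```python
def floor_rescale(values, floor, peak, ymax=3.0):
    """Map raw signal so that floor -> 0.0 and peak -> 1.0, clipped to [0, ymax].

    Hypothetical helper: the linear form is an assumption, not chorus's code.
    """
    if peak <= floor:
        raise ValueError("peak threshold must exceed the noise floor")
    span = peak - floor
    return [min(max((v - floor) / span, 0.0), ymax) for v in values]


# Illustrative numbers only: floor (p95) = 2.0, peak (p99) = 10.0.
rescaled = floor_rescale([1.0, 2.0, 6.0, 10.0, 50.0], floor=2.0, peak=10.0)
print(rescaled)  # [0.0, 0.0, 0.5, 1.0, 3.0]
```

Under these assumptions, anything at or below the noise floor is drawn at 0, a bin exactly at the p99 threshold is drawn at 1.0, and very strong signal saturates at 3.0.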
Fix
- chorus/analysis/multi_oracle_report.py:712-724: split the one-paragraph intro into two. The window/coverage caveat stays, plus the same "noise floor (p95) and peak threshold (p99)" block the single-oracle + causal reports emit.
- rs12740374_SORT1_multioracle_report.html: hand-patched to insert the same paragraph at the matching location, preserving the existing IGV tracks. scripts/regenerate_multioracle.py --consolidate from committed JSONs silently drops the IGV section (per-oracle prediction pickles aren't persisted), so a naive regen would make things worse; the hand-patch was the right call.
4-part audit result (after fix)
All 17 IGV-containing HTMLs now pass:
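- (a) IGV embedded: 17/17
- (b) ymax=3.0 signal tracks / 1.0 summary tracks: 17/17 consistent
- (c) scale explanation (5/5 markers): 17/17 (was 16/17)
- (d) assay:cell_type track provenance: 17/17 (bare SPLICE_SITES labels are correctly cell-type-agnostic per AG metadata; investigated and confirmed not drift)
Flagged follow-up (not blocking this fix)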
scripts/regenerate_multioracle.py --consolidate from committed JSONs silently drops the IGV section because per-oracle prediction pickles aren't persisted. Either persist the pickles or mark IGV-rebuild as requiring the --oracle <name> runs first. Worth a separate PR; doesn't affect shipped reports today because they're regenerated with the full pipeline.
Test plan
- pytest tests/ --ignore=tests/test_smoke_predict.py -q → 339 passed / 1 skipped
- oracle · assay:cell_type or assay:cell_type patterns confirmed; SPLICE_SITES cell-type-agnostic tracks verified against AG metadata (739 SPLICE_SITES tracks total; 5 have empty cell_type by design: donor/acceptor +/− strand + padding)
🤖 Generated with Claude Code
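For the flagged follow-up, one possible shape for the "mark IGV-rebuild as requiring the --oracle runs" option is a guard that fails loudly instead of silently dropping the IGV section. This is only a sketch of the idea: the exception class, the function name, and the assumption that the rebuild can be given an explicit list of pickle paths are all hypothetical and do not reflect how scripts/regenerate_multioracle.py is organised.

```python
from pathlib import Path


class MissingPredictionPickles(RuntimeError):
    """Raised when an IGV rebuild is requested without its prediction inputs."""


def require_prediction_pickles(pickle_paths):
    """Return the paths as Path objects, or raise if any file is absent.

    Hypothetical guard: the names and the list-of-paths interface are
    assumptions for illustration, not part of chorus.
    """
    paths = [Path(p) for p in pickle_paths]
    missing = [str(p) for p in paths if not p.is_file()]
    if missing:
        raise MissingPredictionPickles(
            "Cannot rebuild the IGV section; missing prediction pickles: "
            + ", ".join(missing)
            + ". Run the per-oracle predictions first."
        )
    return paths
```

A caller would invoke the guard before rebuilding the IGV block, so that a consolidate-only run from committed JSONs stops with a clear message rather than emitting a report without tracks.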