This page explains the public acceptance rule for promoted repairs.
A local repair is not accepted only because it wins on one slice. It must survive a coupled promotion contract.
Publicly visible parts of that contract include:
- replay against the frozen authoritative parent
- external non-regression
- leakage and deep-pretrain audit status
- runtime and shadow cleanliness
- overfit and release-boundary disclosure
Current public reference:
- official
IFEval = 0.780037 - official
LogiQA = 0.392523 external_dev = 0.308908external_blind = 0.425072runtime_pass = trueshadow_pass = true
Sources:
- ../results/research_line/current_host_surface_reference_status.json
- ../results/research_line/current_host_surface_guard_bundle_status.json
All public gains are read against the same authoritative parent boundary.
The formal mainline behind the current surface is not a single-benchmark contract. It requires multiple benchmark routes, including:
LogiQAGSM8K- a narrow instruction-following benchmark
This matters because the accepted host surface is not supposed to be interpreted as a one-benchmark trick.
The formal mainline treats external dev and external blind as required multi-route checks, not optional receipts.
The public bundle exposes the resulting status, including:
- leakage pass
- deep-pretrain audit pass
- shadow pass
- structured execution pass
- research runtime ready
Even after a surface is published, the repo keeps boundary-setting evidence visible:
- clean
holdout2 = 0.400000 blind_remainder = 0.448133- additional raw release-boundary audits are available under ../results/audit
The public guard bundle is intentionally conservative:
bundle_pass = truepromotion_recommended = false
This is not a contradiction. It means the public wrappers reproduce the accepted research surface, but the repo is still positioned as a research line rather than a commercial freeze replacement.
See:
This repo does not expose:
- full gate history for rejected candidates
- all threshold tuning history
- full protected-win bookkeeping
- the private candidate-generation workflow
The goal is to publish the acceptance contract and its evidence without giving away the internal search and training pipeline.