Commit f59fd9c
feat(data): expand platinum to 176 drugs via OpenFDA extraction
OpenFDA API mining: 5,000 labels → 177 extractions → 29 new validated drugs
Quality filters: dose ≥ 1 mg, Cmax/dose ∈ [1e-6, 0.5], SMILES available
Holdout expanded: 71 → 100 drugs (29 truly unseen OpenFDA drugs)
Removed 7 bad extractions (dose=0, impossible Cmax/dose, known prodrugs)
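A minimal sketch of how the quality filters listed above could be applied to extracted records. The `Extraction` record, its field names, and `passes_quality_filters` are hypothetical and not taken from the repository; the thresholds come from this message, which does not state the units of Cmax/dose. Prodrug exclusion is not covered here, since it depends on a curated list.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds from the commit message. The units of Cmax/dose are not
# stated there, so the ratio is used exactly as given.
MIN_DOSE_MG = 1.0
CMAX_PER_DOSE_RANGE = (1e-6, 0.5)


@dataclass
class Extraction:
    """Hypothetical record for one drug label extraction."""
    name: str
    dose_mg: float
    cmax: float
    smiles: Optional[str]


def passes_quality_filters(rec: Extraction) -> bool:
    # Reject missing or sub-milligram doses first; this also guards the
    # division below against dose = 0.
    if rec.dose_mg < MIN_DOSE_MG:
        return False
    ratio = rec.cmax / rec.dose_mg
    lo, hi = CMAX_PER_DOSE_RANGE
    if not (lo <= ratio <= hi):
        return False
    # A structure is required; None and the empty string both fail.
    return bool(rec.smiles)
```

Checking the dose before computing the ratio means a `dose=0` extraction is rejected cleanly instead of raising `ZeroDivisionError`.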
Honest performance on expanded holdout:
Tier 1 ALL (99 drugs): AAFE 2.903 [2.34, 3.67]
Tier 2 Mechanistic AD (90 drugs): AAFE 2.563 [2.11, 3.18]
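For reference, a sketch of how an AAFE value with a bracketed interval could be computed. AAFE is taken here in its usual form, 10 raised to the mean of |log10(predicted/observed)|. The percentile bootstrap is an assumption: this message does not say how the intervals were obtained or at what confidence level. The function names are hypothetical.

```python
import math
import random


def aafe(predicted, observed):
    """Absolute average fold error: 10 ** mean(|log10(pred / obs)|)."""
    logs = [abs(math.log10(p / o)) for p, o in zip(predicted, observed)]
    return 10 ** (sum(logs) / len(logs))


def bootstrap_ci(predicted, observed, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AAFE (assumed method)."""
    rng = random.Random(seed)
    pairs = list(zip(predicted, observed))
    stats = []
    for _ in range(n_boot):
        # Resample drugs with replacement, keeping each pred/obs pair intact.
        sample = [rng.choice(pairs) for _ in pairs]
        p, o = zip(*sample)
        stats.append(aafe(p, o))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

An AAFE of 1.0 means perfect agreement, and a value of 2.0 means predictions are off by a factor of two on average in either direction.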
The expanded set gives a more honest generalization estimate:
- Previous Tier 2 (62 drugs): AAFE 2.329 (curated, selection bias)
- Expanded Tier 2 (90 drugs): AAFE 2.563 (includes automated extraction)
- Delta +0.234 reflects genuinely harder unseen drugs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent bd16f9e commit f59fd9c
5 files changed
Lines changed: 1731 additions & 4 deletions
File tree
- data
- clinical
- external