Test Multimodal Configurations

This issue is here so that someone (including me) can pick the work back up. I am currently testing the multimodal configuration:

- [Config](https://github.com/collaborativebioinformatics/OncoLearn/blob/main/data/configs/modeling/multimodal/tcga_brca_cbioportal_pam50.yaml)
- Command:

```sh
docker compose --profile prod-rocm-wsl run --rm prod-rocm-wsl python -m oncolearn.trainer --config data/configs/modeling/multimodal/tcga_brca_cbioportal_pam5
```

I have run a hyperparameter search using Optuna with 5-fold cross-validation, where each fold has its own train/test split:

- For each trial, train for 10 epochs on each fold's train split, then evaluate on that fold's test split
- Average the test-split F1 scores across the 5 folds to get the objective value
- Tune the hyperparameters to maximize this average F1 score
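The search procedure above can be sketched as an Optuna-style objective. This is a minimal, framework-free sketch under stated assumptions, not the code in `oncolearn.trainer`: `train_and_predict` is a hypothetical stand-in for one training run, the `lr`/`dropout` search space is made up for illustration, the folds are plain (not stratified) k-fold, and "F1" is assumed to mean macro-averaged F1, since the issue does not say which averaging is used.

```python
import random
from statistics import mean


def kfold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs for plain (non-stratified) k-fold CV."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]  # deal shuffled indices round-robin
    for k in range(n_folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, folds[k]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed averaging, see lead-in)."""
    pairs = list(zip(y_true, y_pred))
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return mean(scores)


def make_objective(X, y, train_and_predict, n_folds=5, n_epochs=10):
    """Build an objective that returns the mean test-split F1 over the k folds."""

    def objective(trial):
        # Hypothetical search space; the real one lives in the project config.
        params = {
            "lr": trial.suggest_float("lr", 1e-5, 1e-2, log=True),
            "dropout": trial.suggest_float("dropout", 0.0, 0.5),
        }
        fold_scores = []
        for train_idx, test_idx in kfold_indices(len(y), n_folds):
            # Hypothetical training hook: fit for n_epochs on train_idx,
            # then return predicted labels for test_idx (in the same order).
            preds = train_and_predict(X, y, train_idx, test_idx, params, n_epochs)
            fold_scores.append(macro_f1([y[i] for i in test_idx], preds))
        return mean(fold_scores)  # value Optuna maximizes

    return objective
```

With Optuna installed, the objective would plug in as `study = optuna.create_study(direction="maximize", storage="sqlite:///outputs/example.db")` followed by `study.optimize(make_objective(X, y, train_and_predict), n_trials=50)`; the storage file name and trial count are placeholders. Because PAM50 classes are imbalanced, stratified folds may give steadier fold-to-fold F1 than the plain k-fold used here.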

<img width="1340" height="579" alt="Image" src="https://github.com/user-attachments/assets/e854bc52-3bab-407e-9499-fb2bb1ef0f17" />

I am not sure this is the best approach, but the goal is then to create a proper train/val/test split and use the tuned hyperparameters for the final model. We can then do the same for the cancer stage subtypes.
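One way to build the final train/val/test split is a per-class (stratified) split, so each PAM50 subtype keeps roughly its overall proportion in every partition. This is a minimal sketch, not the project's splitting code; the 70/15/15 fractions and the function name `stratified_split` are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_frac=0.15, test_frac=0.15, seed=0):
    """Return sorted (train, val, test) index lists, split separately per class."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        n_val = round(len(idx) * val_frac)
        test += idx[:n_test]
        val += idx[n_test:n_test + n_val]
        train += idx[n_test + n_val:]  # everything left over goes to training
    return sorted(train), sorted(val), sorted(test)
```

Under this scheme, the final model would be trained on `train` with the values from `study.best_params`, `val` would be used for checkpoint selection or early stopping, and `test` would be evaluated once at the end so it never influences tuning.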

I have included my current trials, which can be viewed with the VS Code extension "Optuna Dashboard". The code expects the unzipped `.db` file to be in the `outputs` directory under the base directory:

[brca_cbioportal_pam50.zip](https://github.com/user-attachments/files/26009271/brca_cbioportal_pam50.zip)
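Outside VS Code, the same trials can be inspected from Python. This is a hedged sketch: it assumes the archive unzips to one or more `*.db` SQLite files in `outputs/`, and the helper names `find_study_storages` and `summarize_studies` are hypothetical. Only `summarize_studies` needs Optuna, which is why it imports it inside the function.

```python
from pathlib import Path


def find_study_storages(base_dir="."):
    """Return SQLite storage URLs for every .db file in <base_dir>/outputs."""
    outputs = Path(base_dir) / "outputs"
    return [f"sqlite:///{p.resolve().as_posix()}" for p in sorted(outputs.glob("*.db"))]


def summarize_studies(storage_url):
    """Print the best value and parameters of each study in one storage."""
    import optuna  # imported lazily so the path helper works without Optuna

    for name in optuna.get_all_study_names(storage=storage_url):
        study = optuna.load_study(study_name=name, storage=storage_url)
        # best_value raises if the study has no completed trials
        print(name, study.best_value, study.best_params)
```

From the repository root, `for url in find_study_storages("."): summarize_studies(url)` would list every stored study with its best trial.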

Test Multimodal Configurations #22