A few non-implementation directories in this project (predefined in `common.py`) are worth noting:
```text
├── ckps/
│   └── {data_type}/
│       └── {criteria}/
│           └── {ckp_id}/
│               ├── best.pt
│               ├── final.pt
│               ├── summary.txt
│               └── setting.json
├── data/
│   └── {data_type}/
│       ├── train/
│       │   └── {tree_id}/
│       │       ├── data.phy
│       │       ├── gt.newick
│       │       ├── gt.newick.log
│       │       └── start.newick
│       └── test/
├── LOCAL/
├── logs/
├── results/
│   └── {data_type}/
│       └── {criteria}/
│           └── {ckp_id}/
│               └── {search_strategy}/
│                   ├── {tree_id}_best.png
│                   └── {tree_id}.png
└── configs/
    ├── default_config.yaml
    ├── config0.yaml
    ├── config1.yaml
    └── ...
```
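As a rough illustration of the layout above, the hypothetical helpers below build the main paths with `pathlib`. They are not the constants defined in `common.py`, whose names are not shown here. Note also that the checkpoint path in the inference example later in this README contains an extra `reinforce` level, so treat this sketch as approximate.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical helpers mirroring the directory tree above;
# the real path constants are defined in common.py and may differ.


def ckp_dir(root: Path, data_type: str, criteria: str, ckp_id: str) -> Path:
    """ckps/{data_type}/{criteria}/{ckp_id}/ holds best.pt, final.pt, summary.txt, setting.json."""
    return root / "ckps" / data_type / criteria / ckp_id


def tree_dir(root: Path, data_type: str, split: str, tree_id: str) -> Path:
    """data/{data_type}/{split}/{tree_id}/ holds data.phy, gt.newick, gt.newick.log, start.newick."""
    return root / "data" / data_type / split / tree_id


def result_plots(root: Path, data_type: str, criteria: str, ckp_id: str,
                 search_strategy: str, tree_id: str) -> tuple[Path, Path]:
    """Paths of {tree_id}.png and {tree_id}_best.png under results/."""
    base = root / "results" / data_type / criteria / ckp_id / search_strategy
    return base / f"{tree_id}.png", base / f"{tree_id}_best.png"


if __name__ == "__main__":
    root = Path("/root/final_thesis")
    print(ckp_dir(root, "dna", "likelihood", "260420_194333") / "best.pt")
    print(tree_dir(root, "dna", "test", "0") / "data.phy")
```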
- `ckps/`: where trained models are saved. `ckp_id` is a timestamp marking when training started.
- `logs/`: training and inference logs, kept for later inspection.
- `results/`: visualisations of the search history when inferring a tree, compared with IQ-TREE and RAxML.
- `data/`: structured as shown above.
- `setting.json` (in `ckps/{data_type}/{criteria}/{ckp_id}/`): a copy of the config file from the `configs/` folder, with additional information in the `experiment` field, which stores the basic phylogenetic settings:
  - `data_type`: `dna` or `protein`
  - `criteria`: `parsimony` or `likelihood`
  - `chosen_feature`: `normal`, `feat_likelihood`, or `feat_parsimony`
  - `chosen_network`: `qlearning` or `reinforce`
  - `chosen_reward`: `raw`, `relu`, or `normalized`. This must match the reward type used at inference time.
  - `reset_per_epoch`: `true` or `false`
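The constraints on the `experiment` field can be checked mechanically before inference. The sketch below assumes `experiment` is a flat JSON object with the keys listed above and `reset_per_epoch` stored as a JSON boolean. `check_experiment` and `make_ckp_id` are hypothetical names, and the `YYMMDD_HHMMSS` timestamp format is only inferred from the example ID `260420_194333` used later in this README.

```python
import json
from datetime import datetime

# Allowed values of the `experiment` field, as listed in this README.
ALLOWED = {
    "data_type": {"dna", "protein"},
    "criteria": {"parsimony", "likelihood"},
    "chosen_feature": {"normal", "feat_likelihood", "feat_parsimony"},
    "chosen_network": {"qlearning", "reinforce"},
    "chosen_reward": {"raw", "relu", "normalized"},
    "reset_per_epoch": {True, False},  # assumed to be a JSON boolean
}


def make_ckp_id(started: datetime) -> str:
    # Assumption: YYMMDD_HHMMSS, inferred from the example ID "260420_194333".
    return started.strftime("%y%m%d_%H%M%S")


def check_experiment(setting_text: str, inference_reward: str) -> dict:
    """Validate the `experiment` field of a setting.json document (hypothetical helper)."""
    exp = json.loads(setting_text)["experiment"]
    for key, allowed in ALLOWED.items():
        if exp.get(key) not in allowed:
            raise ValueError(f"{key}={exp.get(key)!r}; expected one of {sorted(map(str, allowed))}")
    # The reward type used at inference must match the one used during training.
    if exp["chosen_reward"] != inference_reward:
        raise ValueError(
            f"reward mismatch: trained with {exp['chosen_reward']!r}, "
            f"inferring with {inference_reward!r}"
        )
    return exp
```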
A few configuration files worth noting:

- `build.sh`: compiles the C++ code that computes SPR moves.
- `config.yaml`: the project's default hyperparameters.
- `bmeprl.yml`: creates the conda environment.
- `Dockerfile`: builds the Docker image.

To set up the conda environment (if you are already on a server with a GPU):

```bash
conda env create -f bmeprl.yml
conda activate bmeprl
```
To set up the Docker image (if you need to set up on VastAI):

```bash
# On your local machine
docker build -t {user_name}/bmeprl:cu121-v1 .
docker push {user_name}/bmeprl:cu121-v1

# On the rented server
source /opt/conda/etc/profile.d/conda.sh
conda activate bmeprl
```
I used AliSim from IQ-TREE to simulate data. For the relevant configuration options, refer to `config_guide.md`, then run:

```bash
python generate.py
```
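After running `generate.py`, a quick sanity check is to confirm that each tree folder contains the four files shown in the `data/` tree. A minimal sketch: `missing_files` is a hypothetical helper, and it assumes the `test/` split uses the same per-tree layout as `train/` (the tree above only expands `train/`).

```python
from __future__ import annotations

from pathlib import Path

# Files each data/{data_type}/{split}/{tree_id}/ folder should contain (per the tree above).
EXPECTED = ("data.phy", "gt.newick", "gt.newick.log", "start.newick")


def missing_files(split_dir: str | Path) -> dict[str, list[str]]:
    """Map every tree_id folder in split_dir to the expected files it lacks."""
    report = {}
    for tree in sorted(p for p in Path(split_dir).iterdir() if p.is_dir()):
        missing = [name for name in EXPECTED if not (tree / name).is_file()]
        if missing:
            report[tree.name] = missing
    return report
```

For example, `missing_files("data/dna/train")` returns an empty dict when every tree folder is complete.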
Training works the same way: adjust your configuration as described in `config_guide.md`, then run:

```bash
python main.py
```
Likewise for inference: adjust your configuration as described in `config_guide.md`, then run:

```bash
python infer.py
```
I drew on three sources:
To generate the train and test splits for both data types:

```bash
python generate.py --data-type dna --split train --n-trees 30
python generate.py --data-type dna --split test --n-trees 6
python generate.py --data-type protein --split train --n-trees 30
python generate.py --data-type protein --split test --n-trees 6
```
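To regenerate all four splits in one go, the same commands can be driven from Python. A small sketch: `generate_all` is a hypothetical helper, the tree counts are copied from the commands above, and `sys.executable` is used so the calls run under the currently active interpreter (e.g. the `bmeprl` conda environment).

```python
from __future__ import annotations

import subprocess
import sys

# Number of trees per split, taken from the commands above.
N_TREES = {"train": 30, "test": 6}


def generate_all(data_types=("dna", "protein"), dry_run: bool = False) -> list[list[str]]:
    """Run generate.py once per (data type, split); return the commands issued."""
    cmds = []
    for data_type in data_types:
        for split, n in N_TREES.items():
            cmd = [sys.executable, "generate.py", "--data-type", data_type,
                   "--split", split, "--n-trees", str(n)]
            cmds.append(cmd)
            if not dry_run:
                subprocess.run(cmd, check=True)  # abort on the first failing call
    return cmds
```

With `dry_run=True` nothing is executed, which is handy for checking the command list first.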
To compile the C++ code for each configuration:

| Data Type | Criteria | Feature | Build command |
|---|---|---|---|
| dna | likelihood | normal | bash build.sh "" "" likelihood |
| dna | parsimony | normal | bash build.sh "" "" parsimony |
| dna | likelihood | likelihood | bash build.sh FEAT_LIKELIHOOD "" likelihood |
| dna | parsimony | parsimony | bash build.sh FEAT_PARSIMONY "" parsimony |
| protein | likelihood | normal | bash build.sh "" FEAT_PROTEIN likelihood |
| protein | parsimony | normal | bash build.sh "" FEAT_PROTEIN parsimony |
| protein | likelihood | likelihood | bash build.sh FEAT_LIKELIHOOD FEAT_PROTEIN likelihood |
| protein | parsimony | parsimony | bash build.sh FEAT_PARSIMONY FEAT_PROTEIN parsimony |
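The build table follows a regular pattern, so the arguments can be derived rather than copied. The sketch below only encodes that pattern: how `build.sh` interprets each positional argument is an assumption read off the table, and `build_cmd` is a hypothetical helper. Passing the arguments as a list keeps the empty strings as real empty arguments, which is what the quoted `""` does in the shell.

```python
from __future__ import annotations

# Pattern read off the build table: build.sh receives three positional
# arguments (feature flag, protein flag, criteria); "" means "flag not set".
FEATURE_FLAG = {"normal": "", "likelihood": "FEAT_LIKELIHOOD", "parsimony": "FEAT_PARSIMONY"}


def build_cmd(data_type: str, criteria: str, feature: str = "normal") -> list[str]:
    """Return the build.sh argument list for one row of the table."""
    if feature not in ("normal", criteria):
        # The table only pairs a likelihood/parsimony feature with the same criteria.
        raise ValueError(f"feature {feature!r} is not listed for criteria {criteria!r}")
    protein_flag = "FEAT_PROTEIN" if data_type == "protein" else ""
    return ["bash", "build.sh", FEATURE_FLAG[feature], protein_flag, criteria]
```

For example, `subprocess.run(build_cmd("protein", "likelihood", "likelihood"), check=True)` runs the same command as the corresponding table row.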
Training commands for each configuration:

| Data Type | Criteria | Feature | RL Algo | Training command |
|---|---|---|---|---|
| dna | likelihood | normal | reinforce | python main.py --network-type reinforce --criteria likelihood --data-type dna |
| dna | likelihood | normal | qlearning | python main.py --network-type q_learning --criteria likelihood --data-type dna |
| dna | parsimony | normal | reinforce | python main.py --network-type reinforce --criteria parsimony --data-type dna |
| dna | parsimony | normal | qlearning | python main.py --network-type q_learning --criteria parsimony --data-type dna |
| dna | likelihood | likelihood | reinforce | python main.py --network-type reinforce --criteria likelihood --data-type dna --feature_type FEAT_LIKELIHOOD |
| dna | likelihood | likelihood | qlearning | python main.py --network-type q_learning --criteria likelihood --data-type dna --feature_type FEAT_LIKELIHOOD |
| dna | parsimony | parsimony | reinforce | python main.py --network-type reinforce --criteria parsimony --data-type dna --feature_type FEAT_PARSIMONY |
| dna | parsimony | parsimony | qlearning | python main.py --network-type q_learning --criteria parsimony --data-type dna --feature_type FEAT_PARSIMONY |
| protein | likelihood | normal | reinforce | python main.py --network-type reinforce --criteria likelihood --data-type protein |
| protein | likelihood | normal | qlearning | python main.py --network-type q_learning --criteria likelihood --data-type protein |
| protein | parsimony | normal | reinforce | python main.py --network-type reinforce --criteria parsimony --data-type protein |
| protein | parsimony | normal | qlearning | python main.py --network-type q_learning --criteria parsimony --data-type protein |
| protein | likelihood | likelihood | reinforce | python main.py --network-type reinforce --criteria likelihood --data-type protein --feature_type FEAT_LIKELIHOOD |
| protein | likelihood | likelihood | qlearning | python main.py --network-type q_learning --criteria likelihood --data-type protein --feature_type FEAT_LIKELIHOOD |
| protein | parsimony | parsimony | reinforce | python main.py --network-type reinforce --criteria parsimony --data-type protein --feature_type FEAT_PARSIMONY |
| protein | parsimony | parsimony | qlearning | python main.py --network-type q_learning --criteria parsimony --data-type protein --feature_type FEAT_PARSIMONY |
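The training commands likewise differ only in a few flags. A hypothetical `train_cmd` helper that reproduces one row of the table is sketched below; the flag spellings (including the mixed `--data-type` / `--feature_type` styles) are copied from the table rather than verified against `main.py`.

```python
from __future__ import annotations

# Values from the training table; the RL algorithm "qlearning" is passed as "q_learning".
NETWORK_ARG = {"reinforce": "reinforce", "qlearning": "q_learning"}
FEATURE_ARG = {
    "normal": [],
    "likelihood": ["--feature_type", "FEAT_LIKELIHOOD"],
    "parsimony": ["--feature_type", "FEAT_PARSIMONY"],
}


def train_cmd(data_type: str, criteria: str, algo: str, feature: str = "normal") -> list[str]:
    """Rebuild one row of the training table as an argument list."""
    return ["python", "main.py", "--network-type", NETWORK_ARG[algo],
            "--criteria", criteria, "--data-type", data_type, *FEATURE_ARG[feature]]
```

For example, iterating over `itertools.product(("dna", "protein"), ("likelihood", "parsimony"), ("reinforce", "qlearning"))` and calling `train_cmd(*combo)` yields the eight normal-feature rows.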
Example: run inference on two test trees (`tree_id` 0 and 95) using DNA data:

```bash
python infer.py \
    --ckp-dir /root/final_thesis/ckps/dna/likelihood/reinforce/260420_194333 \
    --data-dir /root/final_thesis/data/dna/test \
    --tree-id 0 95 \
    --search monotonic
```

Using protein data:

```bash
python infer.py \
    --ckp-dir /root/final_thesis/ckps/protein/likelihood/reinforce/260420_194333 \
    --data-dir /root/final_thesis/data/protein/test \
    --tree-id 0 95 \
    --search monotonic
```
To run on every tree in the test directory, simply omit `--tree-id`; the script will automatically pick up all `tree_id`s present in the directory.
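The fallback described above can be pictured as follows. This is a hypothetical reimplementation, not the code in `infer.py`: it assumes `--tree-id` accepts several values (as in `--tree-id 0 95`) and that every sub-folder of `--data-dir` is one tree.

```python
from __future__ import annotations

import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    # Only the two options relevant here; infer.py defines more (e.g. --ckp-dir, --search).
    parser = argparse.ArgumentParser()
    parser.add_argument("--data-dir", required=True)
    parser.add_argument("--tree-id", nargs="+", default=None)  # e.g. --tree-id 0 95
    return parser


def resolve_tree_ids(data_dir: str, tree_ids: list[str] | None = None) -> list[str]:
    """Return the requested tree_ids, or every tree folder in data_dir if none were given."""
    if tree_ids:
        return list(tree_ids)
    names = [p.name for p in Path(data_dir).iterdir() if p.is_dir()]
    # Numeric folder names sort as numbers (0, 2, 10, 95); anything else sorts after them.
    return sorted(names, key=lambda n: (not n.isdigit(), int(n) if n.isdigit() else 0, n))
```

Sorting numerically keeps `10` after `2`, which a plain string sort would not.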
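Inference commands for each configuration (`{ckp_dir}` is the checkpoint directory; greedy search is selected with `--search monotonic`):

| Data Type | Feature | Search | RL Algo | Criteria | Inference command |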
|---|---|---|---|---|---|
| dna | normal | greedy | reinforce | likelihood | python infer.py --ckp-dir {ckp_dir} --search monotonic --network-type reinforce --criteria likelihood --data-type dna |
| dna | normal | greedy | qlearning | likelihood | python infer.py --ckp-dir {ckp_dir} --search monotonic --network-type q_learning --criteria likelihood --data-type dna |
| dna | normal | greedy | reinforce | parsimony | python infer.py --ckp-dir {ckp_dir} --search monotonic --network-type reinforce --criteria parsimony --data-type dna |
| dna | normal | greedy | qlearning | parsimony | python infer.py --ckp-dir {ckp_dir} --search monotonic --network-type q_learning --criteria parsimony --data-type dna |
| dna | likelihood | greedy | reinforce | likelihood | python infer.py --ckp-dir {ckp_dir} --search monotonic --network-type reinforce --criteria likelihood --data-type dna --feature_type FEAT_LIKELIHOOD |
| dna | likelihood | greedy | qlearning | likelihood | python infer.py --ckp-dir {ckp_dir} --search monotonic --network-type q_learning --criteria likelihood --data-type dna --feature_type FEAT_LIKELIHOOD |
| dna | likelihood | beam | reinforce | likelihood | python infer.py --ckp-dir {ckp_dir} --search beam --network-type reinforce --criteria likelihood --data-type dna --feature_type FEAT_LIKELIHOOD |
| dna | likelihood | beam | qlearning | likelihood | python infer.py --ckp-dir {ckp_dir} --search beam --network-type q_learning --criteria likelihood --data-type dna --feature_type FEAT_LIKELIHOOD |
| dna | likelihood | iqtree | reinforce | likelihood | python infer.py --ckp-dir {ckp_dir} --search iqtree --network-type reinforce --criteria likelihood --data-type dna --feature_type FEAT_LIKELIHOOD |
| dna | likelihood | iqtree | qlearning | likelihood | python infer.py --ckp-dir {ckp_dir} --search iqtree --network-type q_learning --criteria likelihood --data-type dna --feature_type FEAT_LIKELIHOOD |
| dna | parsimony | greedy | reinforce | parsimony | python infer.py --ckp-dir {ckp_dir} --search monotonic --network-type reinforce --criteria parsimony --data-type dna --feature_type FEAT_PARSIMONY |
| dna | parsimony | greedy | qlearning | parsimony | python infer.py --ckp-dir {ckp_dir} --search monotonic --network-type q_learning --criteria parsimony --data-type dna --feature_type FEAT_PARSIMONY |
| dna | parsimony | beam | reinforce | parsimony | python infer.py --ckp-dir {ckp_dir} --search beam --network-type reinforce --criteria parsimony --data-type dna --feature_type FEAT_PARSIMONY |
| dna | parsimony | beam | qlearning | parsimony | python infer.py --ckp-dir {ckp_dir} --search beam --network-type q_learning --criteria parsimony --data-type dna --feature_type FEAT_PARSIMONY |
| dna | parsimony | iqtree | reinforce | parsimony | python infer.py --ckp-dir {ckp_dir} --search iqtree --network-type reinforce --criteria parsimony --data-type dna --feature_type FEAT_PARSIMONY |
| dna | parsimony | iqtree | qlearning | parsimony | python infer.py --ckp-dir {ckp_dir} --search iqtree --network-type q_learning --criteria parsimony --data-type dna --feature_type FEAT_PARSIMONY |
| protein | normal | greedy | reinforce | likelihood | python infer.py --ckp-dir {ckp_dir} --search monotonic --network-type reinforce --criteria likelihood --data-type protein |
| protein | normal | greedy | qlearning | likelihood | python infer.py --ckp-dir {ckp_dir} --search monotonic --network-type q_learning --criteria likelihood --data-type protein |
| protein | normal | greedy | reinforce | parsimony | python infer.py --ckp-dir {ckp_dir} --search monotonic --network-type reinforce --criteria parsimony --data-type protein |
| protein | normal | greedy | qlearning | parsimony | python infer.py --ckp-dir {ckp_dir} --search monotonic --network-type q_learning --criteria parsimony --data-type protein |
| protein | likelihood | greedy | reinforce | likelihood | python infer.py --ckp-dir {ckp_dir} --search monotonic --network-type reinforce --criteria likelihood --data-type protein --feature_type FEAT_LIKELIHOOD |
| protein | likelihood | greedy | qlearning | likelihood | python infer.py --ckp-dir {ckp_dir} --search monotonic --network-type q_learning --criteria likelihood --data-type protein --feature_type FEAT_LIKELIHOOD |
| protein | likelihood | beam | reinforce | likelihood | python infer.py --ckp-dir {ckp_dir} --search beam --network-type reinforce --criteria likelihood --data-type protein --feature_type FEAT_LIKELIHOOD |
| protein | likelihood | beam | qlearning | likelihood | python infer.py --ckp-dir {ckp_dir} --search beam --network-type q_learning --criteria likelihood --data-type protein --feature_type FEAT_LIKELIHOOD |
| protein | likelihood | iqtree | reinforce | likelihood | python infer.py --ckp-dir {ckp_dir} --search iqtree --network-type reinforce --criteria likelihood --data-type protein --feature_type FEAT_LIKELIHOOD |
| protein | likelihood | iqtree | qlearning | likelihood | python infer.py --ckp-dir {ckp_dir} --search iqtree --network-type q_learning --criteria likelihood --data-type protein --feature_type FEAT_LIKELIHOOD |
| protein | parsimony | greedy | reinforce | parsimony | python infer.py --ckp-dir {ckp_dir} --search monotonic --network-type reinforce --criteria parsimony --data-type protein --feature_type FEAT_PARSIMONY |
| protein | parsimony | greedy | qlearning | parsimony | python infer.py --ckp-dir {ckp_dir} --search monotonic --network-type q_learning --criteria parsimony --data-type protein --feature_type FEAT_PARSIMONY |
| protein | parsimony | beam | reinforce | parsimony | python infer.py --ckp-dir {ckp_dir} --search beam --network-type reinforce --criteria parsimony --data-type protein --feature_type FEAT_PARSIMONY |
| protein | parsimony | beam | qlearning | parsimony | python infer.py --ckp-dir {ckp_dir} --search beam --network-type q_learning --criteria parsimony --data-type protein --feature_type FEAT_PARSIMONY |
| protein | parsimony | iqtree | reinforce | parsimony | python infer.py --ckp-dir {ckp_dir} --search iqtree --network-type reinforce --criteria parsimony --data-type protein --feature_type FEAT_PARSIMONY |
| protein | parsimony | iqtree | qlearning | parsimony | python infer.py --ckp-dir {ckp_dir} --search iqtree --network-type q_learning --criteria parsimony --data-type protein --feature_type FEAT_PARSIMONY |
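A matching sketch for the inference table. `infer_cmd` is a hypothetical helper whose flag values are copied from the rows above; `--data-dir` and `--tree-id` from the earlier examples can be appended to the returned list.

```python
from __future__ import annotations

# Values read off the inference table: "greedy" search is passed as --search monotonic,
# and the RL algorithm "qlearning" as --network-type q_learning.
SEARCH_ARG = {"greedy": "monotonic", "beam": "beam", "iqtree": "iqtree"}
NETWORK_ARG = {"reinforce": "reinforce", "qlearning": "q_learning"}
FEATURE_ARG = {
    "normal": [],
    "likelihood": ["--feature_type", "FEAT_LIKELIHOOD"],
    "parsimony": ["--feature_type", "FEAT_PARSIMONY"],
}


def infer_cmd(ckp_dir: str, data_type: str, criteria: str, algo: str,
              search: str = "greedy", feature: str = "normal") -> list[str]:
    """Rebuild one row of the inference table as an argument list."""
    # Note: the table lists only greedy search for the "normal" feature type.
    return ["python", "infer.py", "--ckp-dir", ckp_dir, "--search", SEARCH_ARG[search],
            "--network-type", NETWORK_ARG[algo], "--criteria", criteria,
            "--data-type", data_type, *FEATURE_ARG[feature]]
```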