exma23/final_thesis

Deep reinforcement learning framework for phylogenetic inference

The following directories hold data and outputs rather than implementation code. Their locations are predefined in common.py:

├── ckps/
│   └── {data_type}/
│       └── {criteria}/
│           └── {ckp_id}/
│               ├── best.pt
│               ├── final.pt
│               ├── summary.txt
│               └── setting.json
├── data/
│   └── {data_type}/
│       ├── train/
│       │   └── {tree_id}/
│       │       ├── data.phy
│       │       ├── gt.newick
│       │       ├── gt.newick.log
│       │       └── start.newick
│       └── test/
├── LOCAL/
├── logs/
├── results/
│   └── {data_type}/
│       └── {criteria}/
│           └── {ckp_id}/
│               └── {search_strategy}/
│                   ├── {tree_id}_best.png
│                   └── {tree_id}.png
└── configs/
    ├── default_config.yaml
    ├── config0.yaml
    ├── config1.yaml
    └── ...
  • ckps: where trained models are saved. ckp_id is a timestamp taken when training started.
  • logs: training and inference logs, kept for later review.
  • results: visualisations of the search history when inferring a tree, compared against IQ-TREE and RAxML.
  • data: structured as shown above.
  • setting.json (in ckps/{data_type}/{criteria}/{ckp_id}/): a copy of the config file from the configs folder, with additional information in the experiment field:
    • data_type (dna or protein) and criteria (parsimony or likelihood)
    • chosen_feature (normal, feat_likelihood or feat_parsimony)
    • chosen_network (qlearning or reinforce)
    • chosen_reward (raw, relu or normalized). This must match the reward type used at inference time.
    • reset_per_epoch (true or false)
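The consistency requirement on chosen_reward can be checked before inference starts. The following is a minimal sketch, not the project's own code: it assumes the six keys listed above sit directly inside an "experiment" object in setting.json, and the function name check_experiment and the inlined example content are hypothetical.

```python
import json

# Keys named in the README's description of setting.json. Their exact
# nesting under "experiment" is an assumption made for this sketch.
EXPERIMENT_KEYS = (
    "data_type", "criteria", "chosen_feature",
    "chosen_network", "chosen_reward", "reset_per_epoch",
)


def check_experiment(setting: dict, reward_at_inference: str) -> dict:
    """Return the experiment block, raising if it disagrees with inference."""
    experiment = setting.get("experiment", {})
    missing = [key for key in EXPERIMENT_KEYS if key not in experiment]
    if missing:
        raise KeyError(f"setting.json lacks experiment keys: {missing}")
    if experiment["chosen_reward"] != reward_at_inference:
        raise ValueError(
            f"checkpoint was trained with reward "
            f"{experiment['chosen_reward']!r}, "
            f"but inference requested {reward_at_inference!r}"
        )
    return experiment


# Hypothetical file content, inlined so the sketch runs on its own.
example = json.loads("""
{"experiment": {"data_type": "dna", "criteria": "likelihood",
                "chosen_feature": "normal", "chosen_network": "reinforce",
                "chosen_reward": "relu", "reset_per_epoch": false}}
""")
experiment = check_experiment(example, reward_at_inference="relu")
print(experiment["chosen_network"])
```

In practice the dictionary would come from json.load on the setting.json file inside the checkpoint directory rather than from an inlined string.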

1. Environment setup

A few configuration files to be aware of:

  • build.sh: compiles the C++ code for computing SPR moves
  • config.yaml: default hyperparameters of the project
  • bmeprl.yml: creates the conda environment
  • Dockerfile: creates the docker image

To set up the conda environment (if you're already on a server with a GPU):

conda env create -f bmeprl.yml
conda activate bmeprl

To set up the docker image (if you need to set up on VastAI):

# On your local machine
docker build -t {user_name}/bmeprl:cu121-v1 .
docker push {user_name}/bmeprl:cu121-v1

# On rented server
source /opt/conda/etc/profile.d/conda.sh
conda activate bmeprl

2. Training

2.1. Generate data

I used AliSim from IQ-TREE to simulate data. See config_guide.md for the relevant configuration options.

python generate.py

2.2. Training

As with data generation, see config_guide.md to adjust your configuration.

python main.py

3. Inference

As with data generation, see config_guide.md to adjust your configuration.

python infer.py

4. Copyright

I took reference from 3 sources:

5. List of useful commands

5.1. Generate data

python generate.py --data-type dna --split train --n-trees 30
python generate.py --data-type dna --split test --n-trees 6
python generate.py --data-type protein --split train --n-trees 30
python generate.py --data-type protein --split test --n-trees 6

5.2. Build C++ backend

| Data type | Criteria | Feature | Build command |
|---|---|---|---|
| dna | likelihood | normal | bash build.sh "" "" likelihood |
| dna | parsimony | normal | bash build.sh "" "" parsimony |
| dna | likelihood | likelihood | bash build.sh FEAT_LIKELIHOOD "" likelihood |
| dna | parsimony | parsimony | bash build.sh FEAT_PARSIMONY "" parsimony |
| protein | likelihood | normal | bash build.sh "" FEAT_PROTEIN likelihood |
| protein | parsimony | normal | bash build.sh "" FEAT_PROTEIN parsimony |
| protein | likelihood | likelihood | bash build.sh FEAT_LIKELIHOOD FEAT_PROTEIN likelihood |
| protein | parsimony | parsimony | bash build.sh FEAT_PARSIMONY FEAT_PROTEIN parsimony |
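The build table follows a regular pattern: the first build.sh argument encodes the feature type, the second the data type, and the third the criteria. A minimal sketch of that mapping, read off the rows above; the helper name build_command is hypothetical, and the rule that a non-normal feature must match the criteria is an assumption based only on which rows the table lists.

```python
import shlex

# First build.sh argument, keyed by the Feature column of the table.
FEATURE_FLAG = {
    "normal": "",
    "likelihood": "FEAT_LIKELIHOOD",
    "parsimony": "FEAT_PARSIMONY",
}
# Second build.sh argument, keyed by the Data type column.
DATA_FLAG = {"dna": "", "protein": "FEAT_PROTEIN"}


def build_command(data_type, criteria, feature="normal"):
    """Return the build.sh invocation as an argument list."""
    if criteria not in ("likelihood", "parsimony"):
        raise ValueError(f"unknown criteria: {criteria!r}")
    # Assumption: the table only pairs a feature with the same criteria.
    if feature not in ("normal", criteria):
        raise ValueError(f"feature {feature!r} is not listed for {criteria!r}")
    return ["bash", "build.sh",
            FEATURE_FLAG[feature], DATA_FLAG[data_type], criteria]


args = build_command("protein", "likelihood", feature="likelihood")
# shlex.join quotes the empty arguments so the line can be pasted in a shell.
print(shlex.join(args))
print(shlex.join(build_command("dna", "parsimony")))
```

The argument list can be passed to subprocess.run directly, which keeps the empty-string arguments intact without any shell quoting.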

5.3. Train RL agent

| Data type | Criteria | Feature | RL algorithm | Train command |
|---|---|---|---|---|
| dna | likelihood | normal | reinforce | python main.py --network-type reinforce --criteria likelihood --data-type dna |
| dna | likelihood | normal | qlearning | python main.py --network-type q_learning --criteria likelihood --data-type dna |
| dna | parsimony | normal | reinforce | python main.py --network-type reinforce --criteria parsimony --data-type dna |
| dna | parsimony | normal | qlearning | python main.py --network-type q_learning --criteria parsimony --data-type dna |
| dna | likelihood | likelihood | reinforce | python main.py --network-type reinforce --criteria likelihood --data-type dna --feature_type FEAT_LIKELIHOOD |
| dna | likelihood | likelihood | qlearning | python main.py --network-type q_learning --criteria likelihood --data-type dna --feature_type FEAT_LIKELIHOOD |
| dna | parsimony | parsimony | reinforce | python main.py --network-type reinforce --criteria parsimony --data-type dna --feature_type FEAT_PARSIMONY |
| dna | parsimony | parsimony | qlearning | python main.py --network-type q_learning --criteria parsimony --data-type dna --feature_type FEAT_PARSIMONY |
| protein | likelihood | normal | reinforce | python main.py --network-type reinforce --criteria likelihood --data-type protein |
| protein | likelihood | normal | qlearning | python main.py --network-type q_learning --criteria likelihood --data-type protein |
| protein | parsimony | normal | reinforce | python main.py --network-type reinforce --criteria parsimony --data-type protein |
| protein | parsimony | normal | qlearning | python main.py --network-type q_learning --criteria parsimony --data-type protein |
| protein | likelihood | likelihood | reinforce | python main.py --network-type reinforce --criteria likelihood --data-type protein --feature_type FEAT_LIKELIHOOD |
| protein | likelihood | likelihood | qlearning | python main.py --network-type q_learning --criteria likelihood --data-type protein --feature_type FEAT_LIKELIHOOD |
| protein | parsimony | parsimony | reinforce | python main.py --network-type reinforce --criteria parsimony --data-type protein --feature_type FEAT_PARSIMONY |
| protein | parsimony | parsimony | qlearning | python main.py --network-type q_learning --criteria parsimony --data-type protein --feature_type FEAT_PARSIMONY |
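Every training row is assembled from the same four choices, with --feature_type added only when the feature is not normal. A minimal sketch that rebuilds a row's command from those choices; the helper name train_command is hypothetical, and the flags and values are copied from the table rather than from main.py itself.

```python
# The RL algorithm column says "qlearning" while the flag value is
# "q_learning"; this mapping mirrors the table.
NETWORK_FLAG = {"reinforce": "reinforce", "qlearning": "q_learning"}


def train_command(data_type, criteria, network, feature="normal"):
    """Return the main.py invocation for one row as an argument list."""
    cmd = [
        "python", "main.py",
        "--network-type", NETWORK_FLAG[network],
        "--criteria", criteria,
        "--data-type", data_type,
    ]
    if feature != "normal":
        # The table writes this flag with an underscore, unlike the others.
        cmd += ["--feature_type", f"FEAT_{feature.upper()}"]
    return cmd


plain = train_command("dna", "likelihood", "reinforce")
with_feature = train_command("protein", "parsimony", "qlearning", "parsimony")
print(" ".join(plain))
print(" ".join(with_feature))
```

The C++ backend presumably needs to be built with the matching feature and data type flags before a given training command is run.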

5.4. Inference

Example: run inference for 2 test trees (tree_id=0 and 95) using DNA data

python infer.py \
  --ckp-dir /root/final_thesis/ckps/dna/likelihood/reinforce/260420_194333 \
  --data-dir /root/final_thesis/data/dna/test \
  --tree-id 0 95 \
  --search monotonic

Using protein data:

python infer.py \
  --ckp-dir /root/final_thesis/ckps/protein/likelihood/reinforce/260420_194333 \
  --data-dir /root/final_thesis/data/protein/test \
  --tree-id 0 95 \
  --search monotonic

To run inference on every tree in the test directory, simply omit --tree-id; the script will automatically pick up every tree_id found in that directory.
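How infer.py collects the tree ids is not shown here. The following is a sketch of one way to do it, assuming the test directory mirrors the train layout (one {tree_id}/ sub-directory containing data.phy); the function name discover_tree_ids is hypothetical.

```python
import tempfile
from pathlib import Path


def discover_tree_ids(data_dir):
    """List sub-directories of data_dir that contain a data.phy file."""
    names = [
        entry.name
        for entry in Path(data_dir).iterdir()
        if entry.is_dir() and (entry / "data.phy").is_file()
    ]
    # Numeric ids sort by value ("7" before "95"); other names go last.
    return sorted(
        names,
        key=lambda n: (not n.isdigit(), int(n) if n.isdigit() else 0, n),
    )


# Build a throwaway directory so the sketch runs without real data.
with tempfile.TemporaryDirectory() as tmp:
    for tree_id in ("95", "0", "7"):
        tree_dir = Path(tmp) / tree_id
        tree_dir.mkdir()
        (tree_dir / "data.phy").touch()
    (Path(tmp) / "notes").mkdir()  # no data.phy, so it is skipped
    tree_ids = discover_tree_ids(tmp)

print(tree_ids)
```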


| Data type | Feature | Search | RL algorithm | Criteria | Inference command |
|---|---|---|---|---|---|
| dna | normal | greedy | reinforce | likelihood | python infer.py --ckp-dir {ckp_dir} --search monotonic --network-type reinforce --criteria likelihood --data-type dna |
| dna | normal | greedy | qlearning | likelihood | python infer.py --ckp-dir {ckp_dir} --search monotonic --network-type q_learning --criteria likelihood --data-type dna |
| dna | normal | greedy | reinforce | parsimony | python infer.py --ckp-dir {ckp_dir} --search monotonic --network-type reinforce --criteria parsimony --data-type dna |
| dna | normal | greedy | qlearning | parsimony | python infer.py --ckp-dir {ckp_dir} --search monotonic --network-type q_learning --criteria parsimony --data-type dna |
| dna | likelihood | greedy | reinforce | likelihood | python infer.py --ckp-dir {ckp_dir} --search monotonic --network-type reinforce --criteria likelihood --data-type dna --feature_type FEAT_LIKELIHOOD |
| dna | likelihood | greedy | qlearning | likelihood | python infer.py --ckp-dir {ckp_dir} --search monotonic --network-type q_learning --criteria likelihood --data-type dna --feature_type FEAT_LIKELIHOOD |
| dna | likelihood | beam | reinforce | likelihood | python infer.py --ckp-dir {ckp_dir} --search beam --network-type reinforce --criteria likelihood --data-type dna --feature_type FEAT_LIKELIHOOD |
| dna | likelihood | beam | qlearning | likelihood | python infer.py --ckp-dir {ckp_dir} --search beam --network-type q_learning --criteria likelihood --data-type dna --feature_type FEAT_LIKELIHOOD |
| dna | likelihood | iqtree | reinforce | likelihood | python infer.py --ckp-dir {ckp_dir} --search iqtree --network-type reinforce --criteria likelihood --data-type dna --feature_type FEAT_LIKELIHOOD |
| dna | likelihood | iqtree | qlearning | likelihood | python infer.py --ckp-dir {ckp_dir} --search iqtree --network-type q_learning --criteria likelihood --data-type dna --feature_type FEAT_LIKELIHOOD |
| dna | parsimony | greedy | reinforce | parsimony | python infer.py --ckp-dir {ckp_dir} --search monotonic --network-type reinforce --criteria parsimony --data-type dna --feature_type FEAT_PARSIMONY |
| dna | parsimony | greedy | qlearning | parsimony | python infer.py --ckp-dir {ckp_dir} --search monotonic --network-type q_learning --criteria parsimony --data-type dna --feature_type FEAT_PARSIMONY |
| dna | parsimony | beam | reinforce | parsimony | python infer.py --ckp-dir {ckp_dir} --search beam --network-type reinforce --criteria parsimony --data-type dna --feature_type FEAT_PARSIMONY |
| dna | parsimony | beam | qlearning | parsimony | python infer.py --ckp-dir {ckp_dir} --search beam --network-type q_learning --criteria parsimony --data-type dna --feature_type FEAT_PARSIMONY |
| dna | parsimony | iqtree | reinforce | parsimony | python infer.py --ckp-dir {ckp_dir} --search iqtree --network-type reinforce --criteria parsimony --data-type dna --feature_type FEAT_PARSIMONY |
| dna | parsimony | iqtree | qlearning | parsimony | python infer.py --ckp-dir {ckp_dir} --search iqtree --network-type q_learning --criteria parsimony --data-type dna --feature_type FEAT_PARSIMONY |
| protein | normal | greedy | reinforce | likelihood | python infer.py --ckp-dir {ckp_dir} --search monotonic --network-type reinforce --criteria likelihood --data-type protein |
| protein | normal | greedy | qlearning | likelihood | python infer.py --ckp-dir {ckp_dir} --search monotonic --network-type q_learning --criteria likelihood --data-type protein |
| protein | normal | greedy | reinforce | parsimony | python infer.py --ckp-dir {ckp_dir} --search monotonic --network-type reinforce --criteria parsimony --data-type protein |
| protein | normal | greedy | qlearning | parsimony | python infer.py --ckp-dir {ckp_dir} --search monotonic --network-type q_learning --criteria parsimony --data-type protein |
| protein | likelihood | greedy | reinforce | likelihood | python infer.py --ckp-dir {ckp_dir} --search monotonic --network-type reinforce --criteria likelihood --data-type protein --feature_type FEAT_LIKELIHOOD |
| protein | likelihood | greedy | qlearning | likelihood | python infer.py --ckp-dir {ckp_dir} --search monotonic --network-type q_learning --criteria likelihood --data-type protein --feature_type FEAT_LIKELIHOOD |
| protein | likelihood | beam | reinforce | likelihood | python infer.py --ckp-dir {ckp_dir} --search beam --network-type reinforce --criteria likelihood --data-type protein --feature_type FEAT_LIKELIHOOD |
| protein | likelihood | beam | qlearning | likelihood | python infer.py --ckp-dir {ckp_dir} --search beam --network-type q_learning --criteria likelihood --data-type protein --feature_type FEAT_LIKELIHOOD |
| protein | likelihood | iqtree | reinforce | likelihood | python infer.py --ckp-dir {ckp_dir} --search iqtree --network-type reinforce --criteria likelihood --data-type protein --feature_type FEAT_LIKELIHOOD |
| protein | likelihood | iqtree | qlearning | likelihood | python infer.py --ckp-dir {ckp_dir} --search iqtree --network-type q_learning --criteria likelihood --data-type protein --feature_type FEAT_LIKELIHOOD |
| protein | parsimony | greedy | reinforce | parsimony | python infer.py --ckp-dir {ckp_dir} --search monotonic --network-type reinforce --criteria parsimony --data-type protein --feature_type FEAT_PARSIMONY |
| protein | parsimony | greedy | qlearning | parsimony | python infer.py --ckp-dir {ckp_dir} --search monotonic --network-type q_learning --criteria parsimony --data-type protein --feature_type FEAT_PARSIMONY |
| protein | parsimony | beam | reinforce | parsimony | python infer.py --ckp-dir {ckp_dir} --search beam --network-type reinforce --criteria parsimony --data-type protein --feature_type FEAT_PARSIMONY |
| protein | parsimony | beam | qlearning | parsimony | python infer.py --ckp-dir {ckp_dir} --search beam --network-type q_learning --criteria parsimony --data-type protein --feature_type FEAT_PARSIMONY |
| protein | parsimony | iqtree | reinforce | parsimony | python infer.py --ckp-dir {ckp_dir} --search iqtree --network-type reinforce --criteria parsimony --data-type protein --feature_type FEAT_PARSIMONY |
| protein | parsimony | iqtree | qlearning | parsimony | python infer.py --ckp-dir {ckp_dir} --search iqtree --network-type q_learning --criteria parsimony --data-type protein --feature_type FEAT_PARSIMONY |
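The inference rows follow two rules that are easy to miss: the "greedy" search label corresponds to --search monotonic, and normal features are listed only with greedy search while criterion-specific features are listed with greedy, beam and iqtree. A minimal sketch that enumerates the same 32 combinations; the helper names are hypothetical and the rules are inferred from the listed rows, not from infer.py.

```python
from itertools import product

# Search column label -> value passed to --search, as used in the table.
SEARCH_FLAG = {"greedy": "monotonic", "beam": "beam", "iqtree": "iqtree"}
CRITERIA = ("likelihood", "parsimony")
NETWORKS = ("reinforce", "q_learning")


def infer_rows():
    """Yield (data_type, feature, search, network, criteria) per table row."""
    rows = []
    for data_type in ("dna", "protein"):
        # Normal features appear with greedy search only.
        for criteria, network in product(CRITERIA, NETWORKS):
            rows.append((data_type, "normal", "greedy", network, criteria))
        # Criterion-specific features appear with all three searches.
        for criteria in CRITERIA:
            for search, network in product(SEARCH_FLAG, NETWORKS):
                rows.append((data_type, criteria, search, network, criteria))
    return rows


def infer_command(row, ckp_dir="{ckp_dir}"):
    """Format one row as the infer.py command shown in the table."""
    data_type, feature, search, network, criteria = row
    cmd = (
        f"python infer.py --ckp-dir {ckp_dir} --search {SEARCH_FLAG[search]} "
        f"--network-type {network} --criteria {criteria} "
        f"--data-type {data_type}"
    )
    if feature != "normal":
        cmd += f" --feature_type FEAT_{feature.upper()}"
    return cmd


rows = infer_rows()
commands = [infer_command(row) for row in rows]
print(len(commands))
print(commands[0])
```

Passing a real checkpoint directory as ckp_dir replaces the {ckp_dir} placeholder in each generated command.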
