This repository provides a comprehensive Nextstrain analysis of Coxsackievirus A6. You can choose to perform either a VP1 run (>=600 base pairs) or a whole genome run (>=6400 base pairs).
For those unfamiliar with Nextstrain or needing installation guidance, please refer to the Nextstrain documentation.
Most of the data for this analysis can be obtained from NCBI Virus. Instructions for downloading sequences are provided at the end of this README under Sequences. Some data here are private contributions from several clinics and authors in China, France and the UK. If you have relevant data such as sequences, patient age, spatial data and clinical outcomes and are willing to share, please contact me at nadia.neuner-jehle@swisstph.ch.
This repository includes the following directories and files:
- ingest: Contains Python scripts and the snakefile for automatic downloading of CVA6 sequences and metadata.
- scripts: Custom Python scripts called by the snakefile.
- snakefile: The entire computational pipeline, managed using Snakemake. See the Snakemake documentation for details.
- vp1: Sequences and configuration files for the VP1 run.
- whole_genome: Sequences and configuration files for the whole genome run.
The config, vp1/config, and whole_genome/config directories contain necessary configuration files:
- colors.tsv: Color scheme
- geo_regions.tsv: Geographical locations
- lat_longs.tsv: Latitude and longitude data
- dropped_strains.txt: Dropped strains
- clades_genome.tsv: Virus clade assignments
- reference_sequence.gb: Reference sequence
- auspice_config.json: Auspice configuration file
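As a quick sanity check before a build, the presence of the files listed above can be verified with a short script. This is a minimal sketch and not part of the repository: the helper name `missing_config_files` is hypothetical, and the script assumes that every listed file is expected in each config directory, which may not hold (some files may apply to only one run).

```python
from pathlib import Path

# File names taken from the list above. Whether each config directory
# needs all of them is an assumption of this sketch.
EXPECTED_FILES = [
    "colors.tsv",
    "geo_regions.tsv",
    "lat_longs.tsv",
    "dropped_strains.txt",
    "clades_genome.tsv",
    "reference_sequence.gb",
    "auspice_config.json",
]


def missing_config_files(config_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from config_dir."""
    directory = Path(config_dir)
    return [name for name in expected if not (directory / name).is_file()]


if __name__ == "__main__":
    # Report the status of each config directory named in this README.
    for config_dir in ("config", "vp1/config", "whole_genome/config"):
        missing = missing_config_files(config_dir)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{config_dir}: {status}")
```

Run from the repository root, the script prints one line per config directory and lists any file it could not find.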
The reference sequence used is Gdula, accession number AY421764, sampled in 1949.
Install the Nextstrain environment by following the installation instructions in the Nextstrain documentation.
Activate the Nextstrain environment:
conda activate nextstrain
To perform a build, run:
snakemake --cores 1
For specific builds:
- VP1 build:
  snakemake auspice/cv_a6_vp1.json --cores 1
- Whole genome build:
  snakemake auspice/cv_a6_whole_genome.json --cores 1
To run the ingest, you will need some specific reference files, such as a reference.fasta or annotation.gff3 file.
- In the config file: check that the taxid is correct.
- To get these files you have to run the script generate_from_genbank.py manually.
  If you want to have two reference files for whole genome and VP1, you can choose a similar way:
  python3 ingest/bin/generate_from_genbank.py --reference "AY421764.1" --output-dir "whole_genome/config/"
  - You need to specify a few things: [0];[product];[2].
  - It will create the files in the subdirectory data/references.
  - These files will be used by the ingest snakefile.
- Check that the attributes in data/references/pathogen.json are up to date.
- Run the ingest snakefile (either manually or using the main snakefile).
  - Depending on your system you may need to run chmod +x ./vendored/*; chmod +x ./bin/* first.
- Run the main snakefile.
To visualize the build, use Auspice:
auspice view --datasetDir auspice
To run two visualizations simultaneously, you may need to set the port:
export PORT=4001
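The same port setting can be applied per process instead of exporting it in the shell, which is convenient when starting a second viewer from a script. The sketch below is an illustration only: the helper names `auspice_command`, `env_with_port`, and `launch_auspice` are hypothetical, and it assumes `auspice` is on the PATH of the active environment.

```python
import os
import subprocess


def auspice_command(dataset_dir="auspice"):
    """Build the same command line as shown above."""
    return ["auspice", "view", "--datasetDir", dataset_dir]


def env_with_port(port, base_env=None):
    """Return a copy of the environment with PORT set for one process."""
    env = dict(os.environ if base_env is None else base_env)
    env["PORT"] = str(port)
    return env


def launch_auspice(port, dataset_dir="auspice"):
    """Start an Auspice viewer in the background on the given port.

    Requires the auspice executable to be installed and on the PATH.
    """
    return subprocess.Popen(auspice_command(dataset_dir), env=env_with_port(port))
```

Calling `launch_auspice(4001)` returns a `subprocess.Popen` handle; call its `terminate()` method to stop that viewer without affecting one already running on another port.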
Sequences can be downloaded manually or automatically.
- Manual Download: Visit NCBI Virus, search for CVA6 or Taxid 86107, and download the sequences.
- Automated Download: The ingest functionality, included in the main snakefile, handles automatic downloading.
The ingest pipeline is based on the Nextstrain RSV ingest workflow. Running the ingest pipeline produces data/metadata.tsv and data/sequences.fasta.
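To get a rough idea of how many ingested sequences would qualify for each run, the FASTA output can be summarized by length. This is a minimal sketch under stated assumptions: the function names are hypothetical, the thresholds are the ones given at the top of this README, and raw sequence length (including any N or gap characters) is used, which may differ from how the pipeline itself filters.

```python
from pathlib import Path

# Thresholds from the run descriptions in this README.
VP1_MIN_LENGTH = 600
WHOLE_GENOME_MIN_LENGTH = 6400


def read_fasta_lengths(path):
    """Return {record_id: sequence_length} for a FASTA file."""
    lengths = {}
    current_id = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the header text up to the first whitespace as the ID.
                parts = line[1:].split()
                current_id = parts[0] if parts else ""
                lengths[current_id] = 0
            elif current_id is not None:
                lengths[current_id] += len(line)
    return lengths


def count_eligible(lengths):
    """Count records meeting each run's minimum raw length."""
    values = list(lengths.values())
    return {
        "vp1": sum(1 for n in values if n >= VP1_MIN_LENGTH),
        "whole_genome": sum(1 for n in values if n >= WHOLE_GENOME_MIN_LENGTH),
    }


if __name__ == "__main__":
    fasta_path = Path("data/sequences.fasta")
    if fasta_path.exists():
        print(count_eligible(read_fasta_lengths(fasta_path)))
    else:
        print(f"{fasta_path} not found; run the ingest first")
```

Because whole genome sequences also exceed the VP1 threshold, the `vp1` count includes them; the count says nothing about whether a sequence actually covers the VP1 region.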
This repository uses git subrepo to manage copies of ingest scripts in ingest/vendored. To pull new changes from the central ingest repository, first install git subrepo and then follow the instructions in ingest/vendored/README.md.
For questions or comments, contact me via GitHub or nadia.neuner-jehle@swisstph.ch.