clinical-genomics-uppsala/TwiMM

TwiMM

Full documentation on ReadTheDocs

Code style validation

Lint Snakefmt pycodestyle

Code testing

snakemake dry run integration test pytest

License

License: GPL-3

💬 Introduction

TwiMM is a bioinformatics pipeline for analysing hybrid-capture long-read (PacBio HiFi) sequencing data from the multiple myeloma gene panel. It detects SNVs/InDels, structural variants (SVs), and copy number variants (CNVs). SV calling runs three callers in parallel (Severus, PBSV, Sniffles2); their outputs are merged and annotated with population frequencies via SVDB.

❗ Dependencies

All dependencies are managed by pixi. Install pixi, then run:

pixi install

This resolves and installs all required packages (Python, Snakemake, hydra-genetics, and other tools) as defined in pixi.toml. Container images for individual pipeline tools are pulled automatically at runtime via Singularity/Apptainer and are listed in config/config.yaml.

🎒 Preparations

Sample data

Input data should be added to samples.tsv and units.tsv. The following information needs to be added to these files:

| Column Id | Description |
|---|---|
| samples.tsv | |
| sample | unique sample/patient id, one per row |
| units.tsv | processed and raw BAM files should be in separate units files |
| sample | same sample/patient id as in samples.tsv |
| type | data type identifier (one letter): T (Tumor) or N (Normal) |
| platform | type of sequencing platform, e.g. PACBIO |
| machine | specific machine id, e.g. Revio |
| processing_unit | ? |
| barcode | sequence library barcode/index or any character string, but not NA |
| methylation | Yes/No |
| bam | path to BAM file |
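
The column requirements above can be checked before starting a run. The following is a minimal sketch, not part of TwiMM: the column names come from the table, while the helper names, the example values (`sample1`, `unit1`, `bc2001`, the BAM path) and the assumption that `type` takes the one-letter codes `T` and `N` are illustrative only.

```python
import csv
import io

# Column names are taken from the table above; everything else is illustrative.
UNITS_COLUMNS = [
    "sample", "type", "platform", "machine",
    "processing_unit", "barcode", "methylation", "bam",
]
ALLOWED_TYPES = {"T", "N"}  # assumption: one-letter codes for Tumor / Normal

# Hypothetical file contents, for demonstration only.
EXAMPLE_SAMPLES = "sample\nsample1\n"
EXAMPLE_UNITS = (
    "sample\ttype\tplatform\tmachine\tprocessing_unit\tbarcode\tmethylation\tbam\n"
    "sample1\tT\tPACBIO\tRevio\tunit1\tbc2001\tNo\t/path/to/sample1.bam\n"
)


def read_tsv(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def check_units(samples, units):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    known = {row["sample"] for row in samples}
    if len(known) != len(samples):
        problems.append("samples.tsv: sample ids must be unique")
    for i, row in enumerate(units, start=1):
        # A column counts as missing when it is absent or empty.
        missing = [c for c in UNITS_COLUMNS if not row.get(c)]
        if missing:
            problems.append(f"units.tsv row {i}: missing {', '.join(missing)}")
        if row.get("sample") not in known:
            problems.append(f"units.tsv row {i}: sample not in samples.tsv")
        if row.get("type") not in ALLOWED_TYPES:
            problems.append(f"units.tsv row {i}: type must be one of T, N")
        if row.get("barcode") == "NA":
            problems.append(f"units.tsv row {i}: barcode must not be NA")
        if row.get("methylation") not in {"Yes", "No"}:
            problems.append(f"units.tsv row {i}: methylation must be Yes or No")
    return problems
```

Calling `check_units(read_tsv(EXAMPLE_SAMPLES), read_tsv(EXAMPLE_UNITS))` returns an empty list for the example rows. To check real files, pass the contents of samples.tsv and units.tsv to `read_tsv` instead.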

✅ Testing

The pipeline uses pixi for environment management. Run a test dry-run locally:

pixi run test-dry

A small test dataset is also available in .tests/integration/.

🚀 Usage

# Dry run
pixi run all-dry

# Full run (SLURM cluster)
pixi run all-full

Refer to the Snakemake documentation for advanced usage.
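
When launching from a script, it can be convenient to start the full run only if the dry run succeeds. The sketch below wraps the two pixi tasks shown above; the function name `run_pipeline` and the injectable `runner` argument are illustrative and not part of TwiMM.

```python
import subprocess

# The two pixi tasks from the usage section above.
DRY_RUN = ["pixi", "run", "all-dry"]
FULL_RUN = ["pixi", "run", "all-full"]


def run_pipeline(runner=subprocess.run):
    """Run the dry run first; start the full run only if it exits with 0.

    `runner` must accept a command list and return an object with a
    `returncode` attribute, as subprocess.run does.
    """
    dry = runner(DRY_RUN)
    if dry.returncode != 0:
        # Do not launch the full run when the dry run reports a problem.
        return dry.returncode
    return runner(FULL_RUN).returncode
```

Calling `run_pipeline()` from the repository root returns the exit code of the last command that was executed.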

Output files

Key output files (see full list):

| File | Description |
|---|---|
| results/snv_indels/{sample}_T.phased.include.panel.vep_annotated.vcf.gz | Phased SNVs/InDels annotated with VEP |
| results/cnv_sv/svdb_query/{sample}_T.vcf | Merged SVs annotated with population frequencies |
| results/cnv_sv/cnvkit_vcf/{sample}_T.pathology.annotate_cnv.germline.vcf | Annotated CNVs |
| results/reports/html/{sample}_T.pathology.cnv_report.html | Interactive CNV HTML report |
| results/xlsx_reports/{sample}_T_combined_report.xlsx | Excel report (SNV, SV, CNV + Software Versions) |
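
After a run, the `{sample}` placeholder in the paths above can be filled in to check whether the key outputs exist. This is a minimal sketch: the path templates are copied from the table, while the dictionary keys, the function names and the sample id `sample1` are illustrative.

```python
from pathlib import Path

# Path templates copied from the output table above; the keys are illustrative.
OUTPUT_TEMPLATES = {
    "snv_indels": "results/snv_indels/{sample}_T.phased.include.panel.vep_annotated.vcf.gz",
    "sv": "results/cnv_sv/svdb_query/{sample}_T.vcf",
    "cnv": "results/cnv_sv/cnvkit_vcf/{sample}_T.pathology.annotate_cnv.germline.vcf",
    "cnv_report": "results/reports/html/{sample}_T.pathology.cnv_report.html",
    "xlsx_report": "results/xlsx_reports/{sample}_T_combined_report.xlsx",
}


def expected_outputs(sample, root="."):
    """Map each output name to its expected path for one sample id."""
    return {
        name: Path(root) / template.format(sample=sample)
        for name, template in OUTPUT_TEMPLATES.items()
    }


def missing_outputs(sample, root="."):
    """Return the expected output paths that do not exist under `root`."""
    return [p for p in expected_outputs(sample, root).values() if not p.exists()]
```

For example, `missing_outputs("sample1")` run from the pipeline working directory lists any key output files that were not produced for that sample.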

🧑‍⚖️ Rule Graph

[Figure: rule graph of the pipeline]
