Add READSUBMIT workflow #58
Warning: Newer version of the nf-core template is available. Your pipeline is using an old version of the nf-core template: 3.5.1. For more documentation on how to update your pipeline, please see the Synchronisation documentation.
--input samplesheet_reads.csv \
--submission_study <your_study> \
--webincli_mode submit \
--test_upload true \
This can be just a flag; it doesn't need a value. The presence of --test_upload on its own should be enough.
Yes, but here it's just for transparency
mberacochea
left a comment
I know this is not ready, but left some notes
Alright, another way to think about this is to have the whole
samplesheet belong to the same Webin account.
I'm thinking about MAGs, for example: with loads of them, having to
run seqsubmit many times will make it harder -- chaining pipelines becomes a
bit trickier if the study is an argument instead of a samplesheet element.
On 12/05/2026 12:01, Ekaterina Sakharova wrote:
***@***.**** commented on this pull request.
------------------------------------------------------------------------
In workflows/readsubmit.nf
<#58 (comment)>:
> +include { paramsSummaryMultiqc } from '../subworkflows/nf-core/utils_nfcore_pipeline'
+include { softwareVersionsToYAML } from '../subworkflows/nf-core/utils_nfcore_pipeline'
+include { methodsDescriptionText } from '../subworkflows/local/utils_nfcore_seqsubmit_pipeline'
+
+/*
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ RUN THE WORKFLOW
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+*/
+
+workflow READSUBMIT {
+
+ take:
+ ch_samplesheet // channel: samplesheet read in from --input
+ submission_study // val: accession of the study to submit to (optional)
+ study_metadata // val: path to study metadata file for study creation (used if no submission_study provided)
I think we should not add it to the samplesheet.
It is better to limit the pipeline to 1 study per run.
Different studies have different owners (Webin accounts), so we would
need to ask for those in the samplesheet as well (?). That seems very
complicated to me. That is why we have 1 study for everything
submitted in one run. If you need to submit to another study, run
seqsubmit again with different arguments.
--
Martin Beracochea
MGnify Production Project Leader
Microbiome Informatics
European Bioinformatics Institute (EMBL-EBI)
Ideally, you would then need to check access for all provided studies (do they all belong to one account?), because if they do not, the pipeline will crash at the last step. We also have a step for study registration. I don't expect people to have a study already registered, so I expect the study argument/column to be empty for the majority of submissions (if we are talking about external users).
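To make the trade-off concrete, here is a minimal Python sketch of the kind of fail-fast check this would imply if the study ever became a samplesheet column. The column name `study` and both helper functions are hypothetical; nothing like them exists in this PR. The sketch only checks that at most one study is referenced. It does not verify which Webin account owns the study, which would need a call to ENA.

```python
import csv
import io


def distinct_studies(samplesheet_text, column="study"):
    """Collect the non-empty study accessions found in a samplesheet.

    The column name "study" is a hypothetical choice for this sketch.
    """
    reader = csv.DictReader(io.StringIO(samplesheet_text))
    studies = set()
    for row in reader:
        # Missing or short rows give None, so normalise to an empty string.
        value = (row.get(column) or "").strip()
        if value:
            studies.add(value)
    return studies


def check_single_study(samplesheet_text, column="study"):
    """Fail before any upload if rows point at more than one study.

    Returns the single accession, or None when every row leaves the
    column empty (the case where the study still has to be registered).
    """
    studies = distinct_studies(samplesheet_text, column)
    if len(studies) > 1:
        raise ValueError(
            "Samplesheet references several studies: " + ", ".join(sorted(studies))
        )
    return next(iter(studies), None)
```

Running such a check before the first upload would surface a mixed-study samplesheet at the start of the run rather than at the final submission step.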
Co-authored-by: Copilot <copilot@github.com>
I've been working on making the submit_study.py script more robust, focusing on early header validation and structured logging. My goal was to make sure that if the input CSV/TSV is malformed, the pipeline fails early with a clear error message rather than crashing downstream. I've verified these changes locally using the mag_no_coverage_paired_reads.nf.test suite; the test run completed successfully in ~79s in my WSL2 environment. I am still early in my bioinformatics journey, so I would appreciate feedback on the Python to make sure it aligns with nf-core's best practices. The current implementation is functional and passes all local tests. I couldn't push directly to the branch, but you can see the changes here: https://github.com/riceroni18/seqsubmit/blob/dev/bin/submit_study.py
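For readers following along, early header validation of this kind could look roughly like the sketch below. It is not the code from the linked fork. The required column names, the logger name, and the `validate_header` function are assumptions made for illustration only.

```python
import csv
import logging

logger = logging.getLogger("submit_study")

# Hypothetical required columns; the real script may expect different names.
REQUIRED_COLUMNS = ("title", "description")


def validate_header(path, required=REQUIRED_COLUMNS):
    """Read only the first row of a CSV/TSV file and check required columns."""
    # Pick the delimiter from the file extension (assumption: .tsv means tabs).
    delimiter = "\t" if str(path).endswith(".tsv") else ","
    with open(path, newline="") as handle:
        header = next(csv.reader(handle, delimiter=delimiter), None)
    if not header:
        raise ValueError(f"{path}: file is empty, expected a header row")
    columns = [name.strip() for name in header]
    missing = [name for name in required if name not in columns]
    if missing:
        logger.error("%s: missing columns %s", path, missing)
        raise ValueError(
            f"{path}: missing required column(s): {', '.join(missing)}"
        )
    logger.info("%s: header OK (%d columns)", path, len(columns))
    return columns
```

Because only the header row is read, the check stays cheap even for large inputs and can run before any other processing.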
Hi @riceroni18, thank you for your message! If you want feedback from us, please create a separate PR from your fork :)
…mand. Add module tests.
… (from mgnify-pipelines-toolkit) for handling reads as well as genomes/assemblies
I've added some extra samplesheet validation rules based on ENA docs. I'm not sure how much we want to support this and how ENA specific we want to make it. But since we're using webin-cli it seems like it's ENA only for now. The rules are all from https://ena-docs.readthedocs.io/en/latest/submit/reads/webin-cli.html
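As an illustration of what one such rule might look like outside the samplesheet schema, here is a small Python sketch that checks FASTQ file names. It is not the validation added in this PR. The function name is hypothetical, and the rule itself (FASTQ files must be gzip- or bzip2-compressed) is my reading of the ENA docs linked above and should be confirmed there.

```python
from pathlib import PurePath

# Assumption: FASTQ files must be gzip- or bzip2-compressed; check the ENA docs.
ALLOWED_COMPRESSION = {".gz", ".bz2"}
FASTQ_SUFFIXES = {".fastq", ".fq"}


def fastq_name_problem(filename):
    """Return a readable problem with the file name, or None if it looks fine."""
    suffixes = [s.lower() for s in PurePath(filename).suffixes]
    if not suffixes or suffixes[-1] not in ALLOWED_COMPRESSION:
        return f"{filename}: expected a compressed file (.gz or .bz2)"
    if len(suffixes) < 2 or suffixes[-2] not in FASTQ_SUFFIXES:
        return f"{filename}: expected .fastq or .fq before the compression suffix"
    return None
```

Returning a message instead of raising lets a caller collect every problem in the samplesheet and report them together.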
… This will not affect nextflow behavior but does permit some standalone use of the script
…'t change any behavior of this module.
Resolves to #28
I was able to actually submit reads with this one, but it still requires some work:
Reads from https://github.com/nf-core/test-datasets/raw/modules/data/genomics/prokaryotes/bacteroides_fragilis/illumina/fastq/ don't pass webin-cli validation. Suitable reads still need to be found in the nf-core test datasets, and snapshots need to be generated.

PR checklist
Make sure your code lints (nf-core pipelines lint).
Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
Usage Documentation in docs/usage.md is updated.
Output Documentation in docs/output.md is updated.
CHANGELOG.md is updated.
README.md is updated (including new tool citations and authors/contributors).