Question about Allele_table.txt ref_seq consistency and ONT read QC recommendations #1

@orangeSi

Dear CrisprLungo authors,

First of all, thank you very much for developing and maintaining CrisprLungo. It is a very useful and well-designed tool for long-read–based CRISPR editing analysis.

I have two questions regarding the output and recommended preprocessing for ONT data:

  1. Consistency of ref_seq in Allele_table.txt

In the output file Allele_table.txt (partial output from the demo data is shown below), the first column (ref_seq) differs between rows.

$ cat Allele_table.txt | c1
1                                                             2                                                             3          4      5          6
ref_seq                                                       mut_seq                                                       Raw_count  %      CIGAR      Mutation_info
CACAGGCGCCCTGGCCAGTCGTCTGGGCGGTGCTACAACT                      CACAGGCGCCCTGGCCAGTCGTCTGGGCGGTGCTACAACT                      147        33.72  40M        None
GATCCCACAGGCGCCCTGGCCAGTCGTCTGGGCGGTGCTACAACTGGGCTGGCGGCCA    GATCCCACAGGCGCCCTGGC-------------------ACAACTGGGCTGGCGGCCA    84         19.27  20M19D20M  2364_2382:Del_19
CACAGGCGCCCTGGCCAGTCGTCTGGGCGGTGCTACAACTGGG                   CACAGGCGCCCTGGCCAGTC----GGGCGGTGCTACAACTGGG                   61         13.99  20M4D20M   2369_2372:Del_4
AGATCCCACAGGCGCCCTGGCCAGTCGTCTGGGCGGTGCTACAACTGGGCTGGCGGCC    AGATCCCACAGGCGCCCTGG-------------------TACAACTGGGCTGGCGGCC    58         13.3   20M19D20M  2363_2381:Del_19
CCACAGGCGCCCTGGCCAGTCGTCTGGGCGGTGCTACAACTGGGCTG               CCACAGGCGCCCTGGCCAGT--------CGGTGCTACAACTGGGCTG               57         13.07  20M8D20M   2368_2375:Del_8
CATGCAGATCCCACAGGCGCCCTGGCCAGTCGTCTGGGCGGTGCTACAACTG          CATGCAGATCCCACAGGCGC-------------CTGGGCGGTGCTACAACTG          4          0.92   20M13D20M  2358_2370:Del_13

My understanding is that, similar to what is shown in allele_plot.png, the reference sequence (ignoring indels) should remain identical across all alleles.

Could you please clarify:

Is it expected behavior that ref_seq differs between rows in Allele_table.txt?

Or should the reference sequence be fixed across all rows (aside from alignment gaps introduced by indels)?
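To make the question concrete, here is a minimal sketch of how one could test whether the differing ref_seq values are simply different windows of the same underlying reference. This is an assumption on my side, not documented CrisprLungo behavior. The function names and the `reference` argument (the amplicon sequence given to the tool) are hypothetical, and the ref_seq strings are expected to be copied from the first column of Allele_table.txt.

```python
def locate_windows(reference, ref_seqs):
    """Return {ref_seq: 0-based offset in reference, or None if not found}."""
    reference = reference.upper()
    offsets = {}
    for seq in ref_seqs:
        # Drop alignment gaps so insertion rows can be compared as well.
        ungapped = seq.replace("-", "").upper()
        pos = reference.find(ungapped)
        offsets[seq] = pos if pos >= 0 else None
    return offsets


def all_windows_consistent(reference, ref_seqs):
    """True if every ref_seq occurs somewhere in the reference."""
    return all(pos is not None
               for pos in locate_windows(reference, ref_seqs).values())


if __name__ == "__main__":
    # Toy data only; replace with the real amplicon and ref_seq column.
    toy_reference = "ACGTTGCAAGGCTTAACCGGTTAGC"
    toy_ref_seqs = [toy_reference[0:10], toy_reference[3:15]]
    for seq, pos in locate_windows(toy_reference, toy_ref_seqs).items():
        print(seq, pos)
```

If every row is found at some offset, the rows differ only in where the displayed window starts and ends, not in the reference itself.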

  2. Recommended QC / filtering for ONT reads before CrisprLungo

For Oxford Nanopore (ONT) reads, before running CrisprLungo, do you recommend any specific:

quality filtering thresholds (e.g. minimum Q-score), and

preprocessing tools (e.g. NanoFilt, Filtlong, etc.)?

For example, would a filter such as: NanoFilt -q 12

be reasonable, or do you suggest different thresholds or strategies for CRISPR editing analysis with long reads?
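For reference, this is a rough sketch of what I understand a mean-quality cutoff such as Q12 to mean for a single read. It assumes Phred+33 encoded FASTQ quality strings and that the mean is taken over per-base error probabilities rather than over the raw Q values; the function names are hypothetical and this is not NanoFilt's actual code.

```python
import math


def mean_qscore(quality_string, offset=33):
    """Mean read quality computed from per-base error probabilities."""
    if not quality_string:
        raise ValueError("empty quality string")
    # Convert each Phred character to an error probability, then average.
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]
    return -10 * math.log10(sum(probs) / len(probs))


def passes_filter(quality_string, min_q=12.0):
    """True if the read's mean quality is at least min_q."""
    return mean_qscore(quality_string) >= min_q
```

Under this definition a few low-quality bases pull the read's score down much more than a plain average of Q values would, which may matter when choosing a threshold around short indels.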

Thank you again for making CrisprLungo available to the community. I really appreciate your work and look forward to your guidance.

Best regards,
Si
