I'm trying to use simpleaf to build an index for Glycine max (soybean). The genome and gtf files required some preprocessing to get them properly formatted.
I ran the following command (using simpleaf 0.16.2):
```shell
simpleaf index --output simpleaf_index --fasta ../../g_max.genome.fasta --gtf ../../g_max.longest_transcripts.gtf --rlen 91 --threads 16 --use-piscem
```
This resulted in the following output:
```text
2024-06-04T10:05:48.261414Z INFO simpleaf::simpleaf_commands::indexing: preparing to make reference with roers
2024-06-04T10:05:50.342651Z INFO grangers::reader::gtf: Finished parsing the input file. Found 0 comments and 752330 records.
2024-06-04T10:05:51.029383Z INFO roers: Built the Grangers object for 752330 records
2024-06-04T10:05:51.237147Z WARN grangers::grangers_info: The exon_number column contains null values. Will compute the exon number from exon start position .
2024-06-04T10:05:51.527120Z WARN roers: Found missing gene_id and/or gene_name; Imputing. If both missing, will impute using transcript_id; Otherwise, will impute using the existing one.
2024-06-04T10:05:51.549542Z INFO roers: Proceed 278761 exon records from 55589 transcripts
Error: invalid base: 0067
```
The error message is a bit cryptic, so I'm not sure what to do. I searched some of the Rust repositories but haven't found where the error message comes from yet.
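The number in `Error: invalid base: 0067` looks like a character code for the rejected byte. The log doesn't say which radix it is printed in, so this small sketch decodes it both ways:

```python
# Decode the number from "Error: invalid base: 0067" as a character code.
# Assumption: the digits encode the rejected byte; the radix is not stated
# in the log, so both the hexadecimal and decimal readings are shown.
code = "0067"

as_hex = chr(int(code, 16))  # 0x67 -> 'g' (lowercase g)
as_dec = chr(int(code, 10))  # 67   -> 'C' (uppercase C)

print(f"hex reading: {as_hex!r}")
print(f"decimal reading: {as_dec!r}")
```

An uppercase `C` is a standard base that any complement table should accept, so the hexadecimal reading (a lowercase `g`) seems the more plausible one. That would hint at lowercase, possibly soft-masked, sequence in the FASTA, although this is an inference, not something the log confirms.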
If relevant, I can provide the genome and GTF files.
EDIT:
Upon further investigation, this seems to stem from the noodles crate: https://github.com/zaeleus/noodles/blob/906f5237c68fc6b04a73010580d3c4fed2c7b66e/noodles-fasta/src/record/sequence/complement.rs#L24. However, I don't yet understand what's wrong.
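To illustrate how an ordinary-looking nucleotide can still be rejected, here is a Python sketch of a strict reverse complement. It is modelled on, not copied from, the linked noodles code. The assumptions (worth checking against `complement.rs`) are that only uppercase IUPAC codes are in the lookup table and that any other byte raises an error; the exact set of accepted codes and the error message format in noodles may differ, and the `{:04x}` formatting here is chosen only to mimic the log line.

```python
# Strict reverse complement, a sketch modelled on noodles' complement.rs.
# Assumption: only uppercase IUPAC nucleotide codes are accepted; any other
# character, including lowercase bases, is reported as invalid.
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A", "U": "A",
    "W": "W", "S": "S", "M": "K", "K": "M", "R": "Y", "Y": "R",
    "B": "V", "D": "H", "H": "D", "V": "B", "N": "N",
}


class InvalidBase(ValueError):
    """Raised when a character has no entry in the strict COMPLEMENT table."""


def reverse_complement(seq: str) -> str:
    out = []
    for base in reversed(seq):
        try:
            out.append(COMPLEMENT[base])
        except KeyError:
            # Report the character code as 4 hex digits to mimic "0067".
            raise InvalidBase(f"invalid base: {ord(base):04x}") from None
    return "".join(out)


print(reverse_complement("ACGTN"))  # NACGT
try:
    reverse_complement("ACgT")
except InvalidBase as err:
    print(err)  # invalid base: 0067
```

Under these assumptions, the lowercase `g` in `"ACgT"` produces exactly the `0067` seen in the log, while the same sequence in uppercase would complement without error.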
Quick Python check:
Shouldn't that be possible to reverse complement?
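A broader check is to scan the whole FASTA for characters that a strict, uppercase-only complement would reject. This is a minimal sketch that assumes the strict alphabet is the uppercase IUPAC nucleotide codes; the function name `unexpected_bases` is hypothetical. It tallies every offending character and prints its hex code so it can be compared with the `0067` in the error.

```python
import sys
from collections import Counter

# Assumption: the strict alphabet is the uppercase IUPAC nucleotide codes.
STRICT = set("ACGTUWSMKRYBDHVN")


def unexpected_bases(lines):
    """Count sequence characters outside STRICT, skipping FASTA header lines."""
    counts = Counter()
    for line in lines:
        if line.startswith(">"):
            continue
        counts.update(ch for ch in line.strip() if ch not in STRICT)
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_bases.py ../../g_max.genome.fasta
    with open(sys.argv[1]) as fh:
        for ch, n in unexpected_bases(fh).most_common():
            print(f"{ch!r} (0x{ord(ch):02x}): {n}")
```

If the report is dominated by lowercase `a`/`c`/`g`/`t`, the genome is probably soft-masked. Uppercasing the non-header lines before building the index would be one way to test that hypothesis.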