Conversation
Closes #310 - this just ignores lines that do not start with a prefix we expect (e.g. hifiasm's A-lines, comments, ...) This is partial work on #403. Need to update this to handle E-lines and O-lines in GFA 2 files. These are both a bit tricky because they require some fancy additional logic; we want to ignore containment E-lines (and maybe really all non-dovetail E-lines?), and ideally we want to support O-lines with fancy recursive path definitions. But neither of those is super intractable or anything. Also, we might want to eventually reconsider how we handle inconsistent lengths. TECHNICALLY in GFA 2 you are allowed to override segment length (as given by a sequence) with another length. but like that just seems so dreadful to me that i really doubt that supporting it will do anything but cause problems for us. The all_line_types.gfa2.gfa test case (c/o gfapy) has an example of this which is currently causing a test error, as expected. Anyway ....... At least from some initial testing on the hg002 graph this is already much faster which is encouraging. i'm sure there are ways to speed it up even further Code is kind of sloppy, need to add more tests
currently only exposed within code - should add a cli param i guess
Now that this sanity check is fully configurable, closes #421
probs not very important but whatever at least now it is consistent
Per the GFA 2 specification: https://gfa-spec.github.io/GFA-spec/GFA2.html Matches GfaViz' behavior. This was already tested in the all_line_types.gfa2.gfa test case from Gfapy, but just to be safe I added some tests that explicitly verifies that this stuff works as I intend.
and only visualizing the dovetail edges. not TOO bad to do although i would like to add more tests... and should really tidy up thee code wrt self implying edges to reduce duplication
the way E-lines are expanded in O-lines needs some work - should make it look ahead and avoid adding target, also, if the edge is followed by a segment. but I need to finally commit this so I can stop worrying abt losing this work
Surprisingly this seems to work ok, but I want to abstract it to another function and add a zillion tests because this is surprisingly tricky
worked out surprisingly well!
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #422 +/- ##
==========================================
+ Coverage 65.88% 67.21% +1.32%
==========================================
Files 33 34 +1
Lines 4042 4233 +191
Branches 990 1039 +49
==========================================
+ Hits 2663 2845 +182
- Misses 1308 1314 +6
- Partials 71 74 +3 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
this is not really allowed in the GFA 1 specification, but let's nip it in the bud anyway ...
While putting these together I ran into some confusion about what exactly constitutes a dovetail... see the second test added here for details. Need to discuss, but for now let's accept it ig
I think this makes sense. This makes implementation easy (now we can apply this after parsing, so it is mostly graph independent) and futureproofs this, plus makes this clearer imo.
I think having debug / verbose at the end makes sense
I was CONSIDERING doing this whole thing of detecting edge covs then only retaining the highest cov one but ehhh it gets finicky. see #430 for possible extension if i eventually have free time
using " marks around the input name makes the error messages clearer
tricky, since "".split(",") yields [""] -- NOT []!
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Tentatively ready!
Will address #403, #310, #421, #425