Perf: Bundle high-impact optimizations#606
Open
svdrecbd wants to merge 4 commits intopinellolab:masterfrom
Open
Perf: Bundle high-impact optimizations#606svdrecbd wants to merge 4 commits intopinellolab:masterfrom
svdrecbd wants to merge 4 commits intopinellolab:masterfrom
Conversation
- Speed up reverse_complement using str.translate (~3.4x faster). - Optimize find_indels_substitutions using set lookups. - Implement Banded Needleman-Wunsch alignment in CRISPResso2Align.pyx (~2.4x alignment speedup). - Add --needleman_wunsch_bandwidth argument (defaults to -1/off for parity). - Optimize multiprocessing by pre-building key lists. - Add integration tests for bit-exact output verification. - Add performance benchmark script. Note: This commit includes updated C extension files (*.c) generated by Cython, accounting for the large diff size.
kclem
reviewed
Jan 30, 2026
| return | ||
|
|
||
| variant_file_path = os.path.join(variants_dir, f"variants_{process_id}.pkl") | ||
| with open(variant_file_path, 'wb') as file: |
Author
There was a problem hiding this comment.
Oh right. I updated that block to use pickle as a fallback when I refactored the multiprocessing logic, but since result_queue is now always provided, this file-writing path is indeed dead code. I've removed it along with the unused pickle import and updated the docstrings to reflect the queue-based architecture.
Thank you for catching this!
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Question on Parity & Future Work:
For this PR, I prioritized optimizations that preserve 100% bit-exact consistency with the existing codebase. However, I identified other potential strategies, such as K-mer based pre-filtering for references or swapping the alignment backend, that could yield even larger performance gains. These might introduce negligible differences in edge cases (e.g., handling of extremely poor alignments).
Is strict bit-exact parity a hard requirement for the project, or would you (or whatever entity is responsible for this repo) be open to exploring these kinds of trade-offs in future PRs?
Note on Diff Size: This PR includes updated C extension files (*.c) generated by Cython (CRISPResso2Align.c, etc.). These are tracked in the repo to simplify installation, but they account for the majority of the +16k line diff. The actual logic changes are in the .pyx and .py files.