[HW] HWVectorization Part 4: Partial Vectorization (Chunking) #10399
Open
mafeguimaraes wants to merge 1 commit into
This is the fourth and last part of a series of patches (#9749, #9704, #9739) for the `hw-vectorization` pass. This patch introduces Partial Vectorization (Chunking), which acts as a fallback when an output vector cannot be entirely vectorized using a single pattern.

The pass iterates over un-vectorizable output buses and identifies contiguous sub-ranges (chunks) of bits that share the same vector provenance. It groups these adjacent 1-bit extracts into wider, multi-bit `comb.extract` operations, reducing the fan-in of the final `comb.concat` and cleaning up redundant scalar logic.

Example:

In the example below, a tree of scalar logic (masks, shifts, and logic gates) constructs a vector where the top 3 bits are a direct slice of the input (`in[3:1]`), while the LSB is a structural XOR (`in[1] ^ in[0]`).

The pass identifies the contiguous 3-bit chunk, collapsing the scalar tree into a single `i3` extract, leaving only the necessary scalar logic for the LSB.
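
For reviewers who want a quick mental model of the grouping step, here is a minimal Python sketch of chunk detection. It is not the code in this patch: the `Provenance` and `Chunk` types and the `find_chunks` helper are hypothetical names made up for illustration, and the sketch assumes each output bit is either a plain copy of one input bit or the result of scalar logic.

```python
from typing import List, Optional, Tuple

# Hypothetical per-bit provenance: (source_name, source_bit) when the output
# bit is a plain copy of one input bit, or None when it is computed by logic.
Provenance = Optional[Tuple[str, int]]

# A chunk is (source_name, low_source_bit, width); None marks a bit that
# stays scalar.
Chunk = Optional[Tuple[str, int, int]]


def find_chunks(bits: List[Provenance]) -> List[Chunk]:
    """Group adjacent output bits (index 0 = LSB) into contiguous chunks."""
    chunks: List[Chunk] = []
    i = 0
    while i < len(bits):
        prov = bits[i]
        if prov is None:
            # No single-bit source: this bit keeps its scalar logic.
            chunks.append(None)
            i += 1
            continue
        source, low = prov
        width = 1
        # Extend the chunk while the next output bit copies the next bit
        # of the same source.
        while i + width < len(bits) and bits[i + width] == (source, low + width):
            width += 1
        chunks.append((source, low, width))
        i += width
    return chunks


# The case described above: the LSB is in[1] ^ in[0], bits 1..3 copy in[1..3].
example = [None, ("in", 1), ("in", 2), ("in", 3)]
print(find_chunks(example))  # [None, ('in', 1, 3)]
```

In this model the tuple `('in', 1, 3)` stands for one 3-bit extract of `in` starting at bit 1, and the `None` entry stands for the LSB that still needs its XOR. The final concat then has two operands instead of four.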