[HW] HWVectorization Part 4: Partial Vectorization (Chunking) #10399

Open
mafeguimaraes wants to merge 1 commit into llvm:main from mafeguimaraes:hw-vec-part4

@mafeguimaraes
Contributor

This is the fourth and final part of a series of patches (#9749, #9704, #9739) for the hw-vectorization pass. It introduces Partial Vectorization (Chunking), a fallback for output vectors that cannot be vectorized in their entirety by a single pattern.

The pass iterates over the output buses that could not be vectorized and identifies contiguous sub-ranges (chunks) of bits that share the same vector provenance. It merges the adjacent 1-bit extracts of each chunk into a single wider, multi-bit comb.extract operation, which reduces the fan-in of the final comb.concat and removes redundant scalar logic.
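
The grouping step can be sketched in a few lines. This is only an illustrative Python model, not the pass's actual implementation: it assumes every output bit has already been traced back either to a `(source, bit_index)` pair or to `None` (a bit produced by other logic), and the names `Chunk` and `find_chunks` are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple

# Hypothetical model: the provenance of one output bit is
# (source name, source bit index), or None when the bit is computed
# by other logic and cannot be traced to a single source bit.
Provenance = Optional[Tuple[str, int]]


class Chunk(NamedTuple):
    source: Optional[str]  # None marks a bit that stays scalar logic
    low_bit: int           # lowest source bit (unused when source is None)
    out_low: int           # lowest output bit covered by this chunk
    width: int             # number of bits in the chunk


def find_chunks(bits: List[Provenance]) -> List[Chunk]:
    """Group output bits (index 0 = LSB) into maximal contiguous runs."""
    chunks: List[Chunk] = []
    for out_idx, prov in enumerate(bits):
        if prov is None:
            # Untraceable bit: keep it as its own 1-bit scalar chunk.
            chunks.append(Chunk(None, 0, out_idx, 1))
            continue
        source, src_idx = prov
        last = chunks[-1] if chunks else None
        # Extend the previous chunk only if this bit is the next
        # consecutive bit of the same source vector.
        if (last is not None and last.source == source
                and last.low_bit + last.width == src_idx):
            chunks[-1] = last._replace(width=last.width + 1)
        else:
            chunks.append(Chunk(source, src_idx, out_idx, 1))
    return chunks


# A 4-bit output, LSB first: bit 0 is scalar logic, bits 1..3 are in[1..3].
example = [None, ("in", 1), ("in", 2), ("in", 3)]
chunks = find_chunks(example)
```

Under these assumptions, `chunks` holds one scalar entry for bit 0 and one 3-bit chunk starting at bit 1 of `in`; each multi-bit chunk would correspond to one wide comb.extract, and the chunk list as a whole to the operands of the final comb.concat.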

Example:

In the example below, a tree of scalar logic (masks, shifts, and logic gates) constructs a vector where the top 3 bits are a direct slice of the input (in[3:1]), while the LSB is a structural XOR (in[1] ^ in[0]).

The pass identifies the contiguous 3-bit chunk and collapses the scalar tree into a single i3 extract, leaving only the scalar logic needed for the LSB.

// Before
hw.module @with_logic_gate(in %in : i4, out out : i4) {
  %c0_i2 = hw.constant 0 : i2
  %false = hw.constant false
  %c7_i4 = hw.constant 7 : i4
  %c-5_i4 = hw.constant -5 : i4
  %c0_i3 = hw.constant 0 : i3
  
  // Scalar masking and shifting tree
  %0 = comb.concat %c0_i3, %13 : i3, i1
  %1 = comb.concat %c0_i2, %11, %false : i2, i1, i1
  %2 = comb.or %1, %0 : i4
  %3 = comb.and %2, %c-5_i4 : i4
  %4 = comb.concat %false, %10, %c0_i2 : i1, i1, i2
  %5 = comb.or %4, %3 : i4
  %6 = comb.and %5, %c7_i4 : i4
  %7 = comb.concat %9, %c0_i3 : i1, i3
  %8 = comb.or %7, %6 : i4
  
  // Individual bit extracts
  %9 = comb.extract %in from 3 : (i4) -> i1
  %10 = comb.extract %in from 2 : (i4) -> i1
  %11 = comb.extract %in from 1 : (i4) -> i1
  %12 = comb.extract %in from 0 : (i4) -> i1
  %13 = comb.xor %11, %12 : i1
  
  hw.output %8 : i4
}
// After
hw.module @with_logic_gate(in %in : i4, out out : i4) {
  // Scalar logic preserved for bit 0
  %0 = comb.extract %in from 1 : (i4) -> i1
  %1 = comb.extract %in from 0 : (i4) -> i1
  %2 = comb.xor %0, %1 : i1
  
  // Bits [3:1] grouped into a single multi-bit chunk
  %3 = comb.extract %in from 1 : (i4) -> i3
  
  // Final simplified concatenation
  %4 = comb.concat %3, %2 : i3, i1
  hw.output %4 : i4
}
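
As a sanity check on the example, the two modules can be modeled with plain integer arithmetic and compared over all 16 input values. This is a hand-written Python model of the IR above, not output from any CIRCT tool; the function names `before` and `after` are hypothetical, and the constant -5 is written as its 4-bit two's-complement pattern 0b1011.

```python
def before(x: int) -> int:
    """Model of the original scalar mask-and-shift tree on a 4-bit input."""
    b = [(x >> i) & 1 for i in range(4)]  # %12, %11, %10, %9 as b[0..3]
    xor01 = b[1] ^ b[0]                   # %13
    v0 = xor01                            # %0: concat(000, %13)
    v1 = b[1] << 1                        # %1: concat(00, %11, 0)
    v2 = v1 | v0                          # %2
    v3 = v2 & 0b1011                      # %3: and with -5 (clears bit 2)
    v4 = b[2] << 2                        # %4: concat(0, %10, 00)
    v5 = v4 | v3                          # %5
    v6 = v5 & 0b0111                      # %6: and with 7 (clears bit 3)
    v7 = b[3] << 3                        # %7: concat(%9, 000)
    return v7 | v6                        # %8


def after(x: int) -> int:
    """Model of the chunked form: concat(in[3:1], in[1] ^ in[0])."""
    chunk = (x >> 1) & 0b111              # i3 extract from bit 1
    lsb = ((x >> 1) & 1) ^ (x & 1)        # scalar XOR kept for bit 0
    return (chunk << 1) | lsb


# Exhaustive comparison over every 4-bit input.
equivalent = all(before(x) == after(x) for x in range(16))
```

In this model `equivalent` evaluates to `True`, which matches the claim that the upper three bits are a direct slice of the input and only the LSB needs scalar logic.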

@mafeguimaraes mafeguimaraes requested a review from darthscsi as a code owner May 6, 2026 18:15