Replaced SpMV with SpMVOp in PDHG for double precision #989
Bubullzz wants to merge 12 commits into NVIDIA:release/26.04 from
Conversation
…bnvjitlink-dev manually
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings.
Use the following commands to manage reviews, or the checkboxes below for quick actions:
📝 Walkthrough
Bumps the CUDA matrix to 13.2 and adds SpMVOp support for double-precision PDHG.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 3
🧹 Nitpick comments (1)
cpp/src/pdlp/pdlp.cu (1)
2146-2148: Skip SpMVOp plan setup for batch solves.
`compute_At_y()` and `compute_A_x()` only enter the SpMVOp path under `!batch_mode_`; batch PDHG stays on SpMM. As written, double batch runs still pay the plan/JIT setup cost here even though those plans are never used.
♻️ Suggested guard
```diff
-  if (!pdhg_solver_.get_cusparse_view().mixed_precision_enabled_ && !std::is_same_v<f_t, float>)
-    pdhg_solver_.get_cusparse_view().create_spmv_op_plans(
-      settings_.hyper_params.use_reflected_primal_dual);
+  if constexpr (std::is_same_v<f_t, double>) {
+    if (!batch_mode_ && !pdhg_solver_.get_cusparse_view().mixed_precision_enabled_) {
+      pdhg_solver_.get_cusparse_view().create_spmv_op_plans(
+        settings_.hyper_params.use_reflected_primal_dual);
+    }
+  }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/pdlp/pdlp.cu` around lines 2146 - 2148, The SpMV plan/JIT setup is being run even for batch solves; modify the conditional that calls pdhg_solver_.get_cusparse_view().create_spmv_op_plans(...) to also check that batch_mode_ is false (i.e., only create plans when not batch mode). Concretely, update the current if that checks pdhg_solver_.get_cusparse_view().mixed_precision_enabled_ and !std::is_same_v<f_t, float> to additionally require !batch_mode_ before invoking create_spmv_op_plans(...), leaving the existing mixed_precision and type checks and the use of settings_.hyper_params.use_reflected_primal_dual unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/CMakeLists.txt`:
- Around line 281-284: The CMake target_compile_definitions for target "cuopt"
currently marks CUSPARSE_ENABLE_EXPERIMENTAL_API as PUBLIC which leaks the
experimental cuSPARSE opt-in to downstream consumers; change the
target_compile_definitions call that references
"CUSPARSE_ENABLE_EXPERIMENTAL_API" so that this definition is declared PRIVATE
(leave the CUOPT_LOG_ACTIVE_LEVEL definition as PUBLIC) to limit the
experimental API flag to internal implementation files (e.g., cpp/src/pdlp/ and
cpp/src/barrier/).
In `@cpp/src/pdlp/cusparse_view.cu`:
- Around line 1159-1222: The new direct calls to cusparseSpMVOp_bufferSize /
cusparseSpMVOp_createDescr / cusparseSpMVOp_createPlan in create_spmv_op_plans
bypass the existing runtime-dynamic loading used for cusparseSpMV_preprocess and
will fail on older libcusparse; update create_spmv_op_plans (and use of
spmv_op_descr_A_t_, spmv_op_plan_A_t_, spmv_op_descr_A_, spmv_op_plan_A_) to
either (A) dynamically resolve those three symbols with dlsym at runtime and
fall back to the existing cusparsespmv path when resolution fails, or (B) gate
the calls behind a runtime check that enforces CUDA/cuSPARSE 13.1+ and otherwise
use the fallback implementation; implement one of these approaches so symbol
resolution does not depend on build-time linkage.
In `@cpp/src/pdlp/cusparse_view.hpp`:
- Around line 174-181: The SpMVOp handles spmv_op_descr_A_, spmv_op_descr_A_t_,
spmv_op_plan_A_, and spmv_op_plan_A_t_ are never destroyed; add explicit cleanup
by either wrapping them in RAII types (similar to
cusparse_dn_vec_descr_wrapper_t) or implementing a destructor/reset method that
checks for non-null handles and calls cusparseSpMVOp_destroyDescr() for
descriptors and cusparseSpMVOp_destroyPlan() for plans; ensure handles are
initialized to nullptr on construction and that create_spmv_op_plans() invokes
the reset/cleanup before creating new handles to avoid leaks and
double-creation.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b77457bd-c443-4b34-b8ea-78c3db138061
📒 Files selected for processing (10)
- conda/environments/all_cuda-129_arch-aarch64.yaml
- conda/environments/all_cuda-129_arch-x86_64.yaml
- conda/environments/all_cuda-132_arch-aarch64.yaml
- conda/environments/all_cuda-132_arch-x86_64.yaml
- cpp/CMakeLists.txt
- cpp/src/pdlp/cusparse_view.cu
- cpp/src/pdlp/cusparse_view.hpp
- cpp/src/pdlp/pdhg.cu
- cpp/src/pdlp/pdlp.cu
- dependencies.yaml
```cmake
target_compile_definitions(cuopt
  PUBLIC "CUOPT_LOG_ACTIVE_LEVEL=RAPIDS_LOGGER_LOG_LEVEL_${LIBCUOPT_LOGGING_LEVEL}"
  PUBLIC CUSPARSE_ENABLE_EXPERIMENTAL_API
)
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
set -euo pipefail
rg -n '#include <pdlp/cusparse_view.hpp>|#include "pdlp/cusparse_view.hpp"|cusparseSpMVOp|CUSPARSE_ENABLE_EXPERIMENTAL_API' cpp || true
```
Repository: NVIDIA/cuopt
Length of output: 2290

🏁 Script executed:
```bash
cat -n cpp/CMakeLists.txt | head -300 | tail -50
```
Repository: NVIDIA/cuopt
Length of output: 1643

🏁 Script executed:
```bash
# Search for target definitions and barrier-related compilation
rg -n 'add_library|add_executable|target_sources.*barrier' cpp/CMakeLists.txt | head -30
```
Repository: NVIDIA/cuopt
Length of output: 335

🏁 Script executed:
```bash
# Check if barrier files are part of the cuopt target
rg -B5 -A5 'src/barrier' cpp/CMakeLists.txt | head -50
```
Repository: NVIDIA/cuopt
Length of output: 38

🏁 Script executed:
```bash
rg -n '#include.*cusparse_view' cpp/include/ || true
```
Repository: NVIDIA/cuopt
Length of output: 38

🏁 Script executed:
```bash
# Check what's exported from public headers
fd -e hpp cpp/include | head -20
```
Repository: NVIDIA/cuopt
Length of output: 405

🏁 Script executed:
```bash
fd . cpp/include --type f
```
Repository: NVIDIA/cuopt
Length of output: 2200

🏁 Script executed:
```bash
# Check if any public headers include pdlp/cusparse_view.hpp
rg -r '$0' cpp/include | rg 'cusparse_view|cusparseSpMVOp' || true
```
Repository: NVIDIA/cuopt
Length of output: 38

🏁 Script executed:
```bash
# Check if any public headers include files from cpp/src/pdlp
rg '#include.*pdlp' cpp/include/ || true
```
Repository: NVIDIA/cuopt
Length of output: 2265

🏁 Script executed:
```bash
# Also verify that the macro definition CUSPARSE_ENABLE_EXPERIMENTAL_API is only needed during compilation, not in public headers
rg -B2 -A2 'CUSPARSE_ENABLE_EXPERIMENTAL_API' cpp/include/ || echo "No mentions in public headers"
```
Repository: NVIDIA/cuopt
Length of output: 85
Scope CUSPARSE_ENABLE_EXPERIMENTAL_API as PRIVATE instead of PUBLIC.
The experimental cuSPARSE API is used only in internal implementation files under cpp/src/pdlp/ (and cpp/src/barrier/ which includes them). Public headers do not directly or transitively include cusparse_view.hpp, so exporting this macro as PUBLIC leaks an experimental CUDA opt-in to downstream consumers and install exports unnecessarily. Change it to PRIVATE:
```diff
 target_compile_definitions(cuopt
   PUBLIC "CUOPT_LOG_ACTIVE_LEVEL=RAPIDS_LOGGER_LOG_LEVEL_${LIBCUOPT_LOGGING_LEVEL}"
-  PUBLIC CUSPARSE_ENABLE_EXPERIMENTAL_API
+  PRIVATE CUSPARSE_ENABLE_EXPERIMENTAL_API
 )
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cmake
target_compile_definitions(cuopt
  PUBLIC "CUOPT_LOG_ACTIVE_LEVEL=RAPIDS_LOGGER_LOG_LEVEL_${LIBCUOPT_LOGGING_LEVEL}"
  PRIVATE CUSPARSE_ENABLE_EXPERIMENTAL_API
)
```
```cpp
// Creates SpMVOp plans. Must be called after scale_problem() so plans use the scaled matrix.
template <typename i_t, typename f_t>
void cusparse_view_t<i_t, f_t>::create_spmv_op_plans(bool is_reflected)
{
  // Prepare buffers for At_y SpMVOp
  size_t buffer_size_transpose = 0;
  RAFT_CUSPARSE_TRY(cusparseSpMVOp_bufferSize(handle_ptr_->get_cusparse_handle(),
                                              CUSPARSE_OPERATION_NON_TRANSPOSE,
                                              A_T,
                                              dual_solution,
                                              current_AtY,
                                              current_AtY,
                                              CUDA_R_64F,
                                              &buffer_size_transpose));
  buffer_transpose_spmvop.resize(buffer_size_transpose, handle_ptr_->get_stream());

  RAFT_CUSPARSE_TRY(cusparseSpMVOp_createDescr(handle_ptr_->get_cusparse_handle(),
                                               &spmv_op_descr_A_t_,
                                               CUSPARSE_OPERATION_NON_TRANSPOSE,
                                               A_T,
                                               dual_solution,
                                               current_AtY,
                                               current_AtY,
                                               CUDA_R_64F,
                                               buffer_transpose_spmvop.data()));

  char* lto_buffer       = NULL;
  size_t lto_buffer_size = 0;
  RAFT_CUSPARSE_TRY(cusparseSpMVOp_createPlan(handle_ptr_->get_cusparse_handle(),
                                              spmv_op_descr_A_t_,
                                              &spmv_op_plan_A_t_,
                                              lto_buffer,
                                              lto_buffer_size));

  // Only prepare buffers for A_x if we are using reflected_halpern
  if (is_reflected) {
    size_t buffer_size_non_transpose = 0;
    RAFT_CUSPARSE_TRY(cusparseSpMVOp_bufferSize(handle_ptr_->get_cusparse_handle(),
                                                CUSPARSE_OPERATION_NON_TRANSPOSE,
                                                A,
                                                reflected_primal_solution,
                                                dual_gradient,
                                                dual_gradient,
                                                CUDA_R_64F,
                                                &buffer_size_non_transpose));
    buffer_non_transpose_spmvop.resize(buffer_size_non_transpose, handle_ptr_->get_stream());

    RAFT_CUSPARSE_TRY(cusparseSpMVOp_createDescr(handle_ptr_->get_cusparse_handle(),
                                                 &spmv_op_descr_A_,
                                                 CUSPARSE_OPERATION_NON_TRANSPOSE,
                                                 A,
                                                 reflected_primal_solution,
                                                 dual_gradient,
                                                 dual_gradient,
                                                 CUDA_R_64F,
                                                 buffer_non_transpose_spmvop.data()));

    RAFT_CUSPARSE_TRY(cusparseSpMVOp_createPlan(handle_ptr_->get_cusparse_handle(),
                                                spmv_op_descr_A_,
                                                &spmv_op_plan_A_,
                                                lto_buffer,
                                                lto_buffer_size));
  }
}
```
🧩 Analysis chain
🌐 Web query:
According to the official CUDA cuSPARSE documentation, what toolkit/runtime version first provides the experimental `cusparseSpMVOp_*` API, and is direct linkage against those symbols expected to work when `libcusparse.so` at runtime is older than the build toolkit?
💡 Result:
- First toolkit/runtime version that provides the experimental `cusparseSpMVOp_*` API: CUDA Toolkit 13.1 (cuSPARSE "Release 13.1") introduced the experimental SpMVOp API. [1] (The API is documented as `cusparseSpMVOp() [EXPERIMENTAL]` in current cuSPARSE docs. [2])
- Will direct linkage work if `libcusparse.so` at runtime is older than the build toolkit? No, this is not supported or expected to work. The official cuSPARSE docs explicitly state that cuSPARSE is not guaranteed to be backward compatible at the binary level, and that using different versions of the `cusparse.h` header and the shared library is not supported. [3] In practice, if the runtime `libcusparse.so` is older than the toolkit you built against, the `cusparseSpMVOp_*` symbols may be missing and the program can fail to load/resolve symbols.
Sources:
[1] CUDA Toolkit 13.1 release notes (cuSPARSE: Release 13.1) (docs.nvidia.cn)
[2] cuSPARSE documentation: cusparseSpMVOp() [EXPERIMENTAL] (docs.nvidia.com)
[3] cuSPARSE documentation: “Compatibility and Versioning” (binary compatibility / mixed header+library versions not supported) (docs.nvidia.com)
Citations:
- 1: https://docs.nvidia.cn/cuda/cuda-toolkit-release-notes/index.html
- 2: https://docs.nvidia.com/cuda/cusparse/
- 3: https://docs.nvidia.com/cuda/cusparse/
Guard SpMVOp API calls behind CUDA 13.1+ runtime compatibility checks.
This file already dynamically loads cusparseSpMV_preprocess via dlsym because build-time and runtime libcusparse.so versions can differ. The new direct calls to cusparseSpMVOp_bufferSize, cusparseSpMVOp_createDescr, and cusparseSpMVOp_createPlan bypass that safeguard. The cuSPARSE SpMVOp API was introduced in CUDA Toolkit 13.1 and direct symbol linkage is not supported when runtime libcusparse.so is older than the build toolkit—the program will fail to resolve symbols. Either enforce CUDA 13.1+ as the minimum supported runtime in build metadata and documentation, or dynamically load these symbols with a fallback to the existing cusparsespmv path to maintain compatibility with older CUDA runtimes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/pdlp/cusparse_view.cu` around lines 1159 - 1222, The new direct calls
to cusparseSpMVOp_bufferSize / cusparseSpMVOp_createDescr /
cusparseSpMVOp_createPlan in create_spmv_op_plans bypass the existing
runtime-dynamic loading used for cusparseSpMV_preprocess and will fail on older
libcusparse; update create_spmv_op_plans (and use of spmv_op_descr_A_t_,
spmv_op_plan_A_t_, spmv_op_descr_A_, spmv_op_plan_A_) to either (A) dynamically
resolve those three symbols with dlsym at runtime and fall back to the existing
cusparsespmv path when resolution fails, or (B) gate the calls behind a runtime
check that enforces CUDA/cuSPARSE 13.1+ and otherwise use the fallback
implementation; implement one of these approaches so symbol resolution does not
depend on build-time linkage.
There was a problem hiding this comment.
♻️ Duplicate comments (1)
cpp/src/pdlp/cusparse_view.cu (1)
1163-1221: ⚠️ Potential issue | 🟠 Major: Either enforce a 13.2+ runtime or keep the existing fallback strategy.
`#if CUDA_VER_13_2_UP` only protects compilation. The same file still uses `dynamic_load_runtime` for `cusparseSpMV_preprocess`, so this new path does not inherit the older-runtime behavior that `cusparse_view.cu` already supports elsewhere. Please either resolve the `cusparseSpMVOp_*` symbols dynamically too, or add a runtime-version gate that falls back to the existing SpMV path.
```bash
#!/bin/bash
set -euo pipefail
echo "Existing runtime-loaded cuSPARSE path:"
rg -n -C2 'dynamic_load_runtime::function|cusparseSpMV_preprocess' cpp/src/pdlp/cusparse_view.cu
echo
echo "New SpMVOp calls that currently bind directly:"
rg -n -C2 'cusparseSpMVOp_(bufferSize|createDescr|createPlan)' cpp/src/pdlp/cusparse_view.cu
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/pdlp/cusparse_view.cu` around lines 1163 - 1221, The new CUDA-13.2+ cusparseSpMVOp_* calls (cusparseSpMVOp_bufferSize, cusparseSpMVOp_createDescr, cusparseSpMVOp_createPlan) are only compile-gated with `#if` CUDA_VER_13_2_UP but still link directly at runtime, breaking environments that rely on the file's existing dynamic loading path (dynamic_load_runtime and cusparseSpMV_preprocess). Fix by either (A) wiring these three symbols into the same dynamic loader used elsewhere (add runtime lookups and use those function pointers instead of direct symbols) or (B) add a runtime-version check that falls back to the existing cusparseSpMV_preprocess-based code path when the runtime does not expose the SpMVOp API; update both the A_t path (spmv_op_descr_A_t_, spmv_op_plan_A_t_) and the reflected A path (spmv_op_descr_A_, spmv_op_plan_A_) and ensure is_reflected handling still uses the fallback when needed.
🧹 Nitpick comments (1)
cpp/src/pdlp/cusparse_view.cu (1)
1160-1224: Encode the double-only contract in the helper itself.
This method hardcodes `CUDA_R_64F`, but `cusparse_view_t<int, float>` is instantiated in this file at line 1230. The call site in `pdlp.cu` has a compile-time guard (`if constexpr (std::is_same_v<f_t, double>)`) that keeps this off the float path today. Adding an `if constexpr` here would make that contract explicit and local instead of relying on caller discipline, consistent with the type-dispatch pattern already used elsewhere in this file (e.g., lines 225-228, 262-265).
♻️ Suggested guard
```diff
 template <typename i_t, typename f_t>
 void cusparse_view_t<i_t, f_t>::create_spmv_op_plans(bool is_reflected)
 {
 #if CUDA_VER_13_2_UP
+  if constexpr (!std::is_same_v<f_t, double>) { return; }
+
   // Prepare buffers for At_y SpMVOp
   size_t buffer_size_transpose = 0;
   RAFT_CUSPARSE_TRY(cusparseSpMVOp_bufferSize(handle_ptr_->get_cusparse_handle(),
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/pdlp/cusparse_view.cu` around lines 1160 - 1224, The create_spmv_op_plans method hardcodes CUDA_R_64F despite being templated on f_t; add an if constexpr(std::is_same_v<f_t, double>) guard around the CUDA_R_64F-specific cusparseSpMVOp_bufferSize/createDescr/createPlan calls in cusparse_view_t<i_t,f_t>::create_spmv_op_plans so the double-only contract is enforced inside the helper (for both the A_T/At_y and the is_reflected A_x branches); for non-double f_t, skip these double-only calls (or assert/throw/return) to match the type-dispatch pattern used elsewhere and avoid relying on callers to gate instantiations.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: fb417ff9-5d4f-44fc-b6a4-d5f552182965
📒 Files selected for processing (4)
- cpp/src/pdlp/cusparse_view.cu
- cpp/src/pdlp/cusparse_view.hpp
- cpp/src/pdlp/pdhg.cu
- cpp/src/pdlp/pdlp.cu
🚧 Files skipped from review as they are similar to previous changes (3)
- cpp/src/pdlp/pdlp.cu
- cpp/src/pdlp/pdhg.cu
- cpp/src/pdlp/cusparse_view.hpp
will add code rabbit suggestions

/ok to test 293a33f
♻️ Duplicate comments (1)
cpp/src/pdlp/cusparse_view.cu (1)
1039-1054: ⚠️ Potential issue | 🟠 Major: Keep the new SpMVOp symbols behind the same runtime-compatibility mechanism as the existing preprocess hook.
This TU already uses `dlsym` for version-sensitive cuSPARSE entry points, but the new `cusparseSpMVOp_*` create/destroy calls are linked directly. If mixed build/runtime CUDA versions are still a supported deployment mode, this path now bypasses the existing fallback story. Either make the required cuSPARSE runtime explicit or dynamically resolve/fallback here too.
🌐 Web query: According to the official cuSPARSE documentation, is mixing a newer `cusparse.h` with an older runtime `libcusparse.so` supported, and in which CUDA/cuSPARSE release were the `cusparseSpMVOp_*` APIs used here introduced (including `cusparseSpMVOp_bufferSize`)?
Also applies to: 1176-1241
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/pdlp/cusparse_view.cu` around lines 1039 - 1054, The destructor calls for the new cuSPARSE SpMVOp APIs (cusparseSpMVOp_destroyPlan, cusparseSpMVOp_destroyDescr and their create counterparts used elsewhere) are linked directly and bypass the existing dlsym-based runtime-compatibility/fallback mechanism; change these to use the same dynamic resolver (the TU's dlsym-based lookup used for preprocess hooks) or require/explicitly check for a compatible cuSPARSE runtime before calling, and mirror that change for the matching create/use sites (e.g., the cusparseSpMVOp_createPlan/cusparseSpMVOp_createDescr and cusparseSpMVOp_bufferSize usages around lines 1176-1241) so the code falls back cleanly when the runtime lacks these symbols.
🧹 Nitpick comments (1)
cpp/src/pdlp/cusparse_view.cu (1)
307-308: Align the version guard with the constructor initializer lists.
These four initializers assume the SpMVOp buffers always exist, but the members themselves are only declared under `CUDA_VER_13_2_UP` in `cpp/src/pdlp/cusparse_view.hpp`. If 13.2 is not the hard minimum, this split guard will break older-toolkit builds; if it is, the conditional members just hide the real requirement. Please make that contract consistent in both places.
```bash
#!/bin/bash
set -euo pipefail
echo "Conditional declarations vs constructor initializers:"
rg -n -C2 'CUDA_VER_13_2_UP|buffer_(non_transpose|transpose)_spmvop' \
  cpp/src/pdlp/cusparse_view.hpp cpp/src/pdlp/cusparse_view.cu
echo
echo "CUDA/cuSPARSE version expectations declared in the repo:"
rg -n -C2 '13\.2|12\.4|CUSPARSE_ENABLE_EXPERIMENTAL_API|CUDART_VERSION|find_package\(CUDAToolkit' \
  cpp/CMakeLists.txt cpp/src
```
Also applies to: 721-722, 932-933, 1066-1067
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/pdlp/cusparse_view.cu` around lines 307 - 308, The constructor initializer list references buffer_non_transpose_spmvop and buffer_transpose_spmvop but those members are only declared under the CUDA_VER_13_2_UP guard in the header, causing mismatch; update the guards to be consistent by moving the CUDA_VER_13_2_UP macro to cover the same places in both declaration and constructor (either declare the members unconditionally or wrap the constructor initializers and any uses of buffer_non_transpose_spmvop/ buffer_transpose_spmvop in the same CUDA_VER_13_2_UP guard), and apply the same fix to the other occurrences you noted (the similar initializer pairs at the other constructor sites).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 27d51901-b3a6-4a92-bf2b-56b0d217fabe
📒 Files selected for processing (2)
- cpp/src/pdlp/cusparse_view.cu
- cpp/src/pdlp/cusparse_view.hpp
cpp/src/pdlp/cusparse_view.hpp
Outdated
```cpp
    const rmm::device_uvector<i_t>&,               // Empty just to init the const&
    const std::vector<pdlp_climber_strategy_t>&);  // Empty just to init the const&

  ~cusparse_view_t();
```
Instead of using a destructor for the whole class, create a wrapper around the new cusparseSpMVOpDescr_t and plan with a destructor, just like cusparse_dn_vec_descr_wrapper_t, to avoid forgetting to destroy future plans in the cuSPARSE destructor.
cpp/src/pdlp/cusparse_view.cu
Outdated
```cpp
    RAFT_CUSPARSE_TRY_NO_THROW(cusparseSpMVOp_destroyDescr(spmv_op_descr_A_));
  }
#endif
}
```
As already stated, create a custom wrapper instead
cpp/src/pdlp/cusparse_view.cu
Outdated
```cpp
                                                spmv_op_descr_A_,
                                                &spmv_op_plan_A_,
                                                lto_buffer,
                                                lto_buffer_size));
```
Make sure it's safe to reuse `lto_buffer` and `lto_buffer_size` in the two calls.
@Bubullzz Can you summarize the performance results as well in the PR description?

I think it might be better to wait until after the release to merge this PR; we don't have a CUDA 13.2 CI, which is needed for SpMVOp.

@Iroy30 can you help with CI on this PR?

/ok test a086c81
Description
Replaced SpMV calls with SpMVOp calls in PDHG to accelerate compute. These changes are only available for double precision. Benchmarks on B200 over the full Mittelmann dataset with the concurrent strategy gave a 1.09x speedup.
Issue
closes #9