
Replaced SpMV with SpMVOp in PDHG for double precision#989

Open
Bubullzz wants to merge 12 commits into NVIDIA:release/26.04 from Bubullzz:spmvop_3

Conversation

@Bubullzz

@Bubullzz Bubullzz commented Mar 23, 2026

Description

Replaced SpMV calls with SpMVOp calls in PDHG to accelerate computation. These changes apply only to double precision. Benchmarks on a B200 over the full Mittelmann dataset with the concurrent strategy showed a 1.09x speedup.

Issue

closes #9

@Bubullzz Bubullzz requested review from a team as code owners March 23, 2026 14:24
@Bubullzz Bubullzz added the non-breaking Introduces a non-breaking change label Mar 23, 2026
@Bubullzz Bubullzz requested a review from msarahan March 23, 2026 14:24
@Bubullzz Bubullzz added the improvement Improves an existing functionality label Mar 23, 2026
@copy-pr-bot

copy-pr-bot bot commented Mar 23, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@Kh4ster Kh4ster requested review from Kh4ster and removed request for Iroy30, hlinsen, msarahan and nguidotti March 23, 2026 14:26
@Kh4ster Kh4ster added the pdlp label Mar 23, 2026
@coderabbitai

coderabbitai bot commented Mar 23, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Bumps CUDA matrix to 13.2 and adds libnvjitlink-dev; defines CUDA_VER_13_2_UP and CUSPARSE_ENABLE_EXPERIMENTAL_API; adds cuSPARSE SpMVOp buffers, descriptor/plan wrapper types and lifecycle, and routes selected PDHG/PDLP SpMV paths to use cuSPARSE SpMVOp when applicable.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Conda environment files**<br>`conda/environments/all_cuda-129_arch-aarch64.yaml`, `conda/environments/all_cuda-129_arch-x86_64.yaml`, `conda/environments/all_cuda-132_arch-aarch64.yaml`, `conda/environments/all_cuda-132_arch-x86_64.yaml` | Added `libnvjitlink-dev` to environment dependencies; bumped/renamed CUDA 13.1 → 13.2 environment files where applicable. |
| **Dependencies matrix**<br>`dependencies.yaml` | Updated CUDA matrix and cuda-version mapping from 13.1 → 13.2; added `libnvjitlink-dev` to the common CUDA package list. |
| **Build configuration**<br>`cpp/CMakeLists.txt` | Added public compile definition `CUSPARSE_ENABLE_EXPERIMENTAL_API` to the `cuopt` target. |
| **cuSPARSE view header**<br>`cpp/src/pdlp/cusparse_view.hpp` | Added `CUDA_VER_13_2_UP` macro; declared SpMVOp descriptor/plan wrapper classes and members; added a `create_spmv_op_plans(bool)` declaration and new external-buffer members (CUDA 13.2+ gated). |
| **cuSPARSE view implementation**<br>`cpp/src/pdlp/cusparse_view.cu` | Implemented wrapper destructors/creators and `create_spmv_op_plans(bool)`; initialized new external buffers in constructors and managed descriptor/plan lifecycle under the CUDA 13.2+ guard. |
| **PDHG SpMV paths**<br>`cpp/src/pdlp/pdhg.cu` | Added an explicit non-mixed double-precision branch calling `cusparseSpMVOp(...)` (with `cusparseSetStream(...)`) under `CUDA_VER_13_2_UP`; retained the fallback to the existing `raft::sparse::detail::cusparsespmv(...)` for other cases. |
| **PDLP solver orchestration**<br>`cpp/src/pdlp/pdlp.cu` | After scaling, conditionally calls `create_spmv_op_plans(...)` for double precision (non-batch, mixed precision disabled) before updating mixed-precision matrices. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description check | ✅ Passed | The description clearly explains the change (replacing SpMV with SpMVOp in PDHG), specifies the scope (double precision only), and provides performance metrics (1.09x speedup), directly relating to the changeset. |
| Linked Issues check | ✅ Passed | The PR implements SpMV acceleration for pdlp by replacing SpMV calls with SpMVOp calls [#9], adding the necessary CUDA 13.2 support, wrapper types, and conditional logic in the PDHG solver. |
| Out of Scope Changes check | ✅ Passed | All code changes are directly related to the SpMV-to-SpMVOp replacement for double-precision PDHG acceleration. CUDA environment updates and the new wrapper/descriptor structures support the core objective without introducing unrelated modifications. |
| Title check | ✅ Passed | The title accurately describes the main change: replacing SpMV with SpMVOp in PDHG for double precision, aligning with the PR's core objective and the changes across C++ source files and environment configurations. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches

Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
cpp/src/pdlp/pdlp.cu (1)

2146-2148: Skip SpMVOp plan setup for batch solves.

compute_At_y() and compute_A_x() only enter the SpMVOp path under !batch_mode_; batch PDHG stays on SpMM. As written, double batch runs still pay the plan/JIT setup cost here even though those plans are never used.

♻️ Suggested guard
```diff
-  if (!pdhg_solver_.get_cusparse_view().mixed_precision_enabled_ && !std::is_same_v<f_t, float>)
-    pdhg_solver_.get_cusparse_view().create_spmv_op_plans(
-      settings_.hyper_params.use_reflected_primal_dual);
+  if constexpr (std::is_same_v<f_t, double>) {
+    if (!batch_mode_ && !pdhg_solver_.get_cusparse_view().mixed_precision_enabled_) {
+      pdhg_solver_.get_cusparse_view().create_spmv_op_plans(
+        settings_.hyper_params.use_reflected_primal_dual);
+    }
+  }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/pdlp.cu` around lines 2146 - 2148, The SpMV plan/JIT setup is
being run even for batch solves; modify the conditional that calls
pdhg_solver_.get_cusparse_view().create_spmv_op_plans(...) to also check that
batch_mode_ is false (i.e., only create plans when not batch mode). Concretely,
update the current if that checks
pdhg_solver_.get_cusparse_view().mixed_precision_enabled_ and
!std::is_same_v<f_t, float> to additionally require !batch_mode_ before invoking
create_spmv_op_plans(...), leaving the existing mixed_precision and type checks
and the use of settings_.hyper_params.use_reflected_primal_dual unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/CMakeLists.txt`:
- Around line 281-284: The CMake target_compile_definitions for target "cuopt"
currently marks CUSPARSE_ENABLE_EXPERIMENTAL_API as PUBLIC which leaks the
experimental cuSPARSE opt-in to downstream consumers; change the
target_compile_definitions call that references
"CUSPARSE_ENABLE_EXPERIMENTAL_API" so that this definition is declared PRIVATE
(leave the CUOPT_LOG_ACTIVE_LEVEL definition as PUBLIC) to limit the
experimental API flag to internal implementation files (e.g., cpp/src/pdlp/ and
cpp/src/barrier/).

In `@cpp/src/pdlp/cusparse_view.cu`:
- Around line 1159-1222: The new direct calls to cusparseSpMVOp_bufferSize /
cusparseSpMVOp_createDescr / cusparseSpMVOp_createPlan in create_spmv_op_plans
bypass the existing runtime-dynamic loading used for cusparseSpMV_preprocess and
will fail on older libcusparse; update create_spmv_op_plans (and use of
spmv_op_descr_A_t_, spmv_op_plan_A_t_, spmv_op_descr_A_, spmv_op_plan_A_) to
either (A) dynamically resolve those three symbols with dlsym at runtime and
fall back to the existing cusparsespmv path when resolution fails, or (B) gate
the calls behind a runtime check that enforces CUDA/cuSPARSE 13.1+ and otherwise
use the fallback implementation; implement one of these approaches so symbol
resolution does not depend on build-time linkage.

In `@cpp/src/pdlp/cusparse_view.hpp`:
- Around line 174-181: The SpMVOp handles spmv_op_descr_A_, spmv_op_descr_A_t_,
spmv_op_plan_A_, and spmv_op_plan_A_t_ are never destroyed; add explicit cleanup
by either wrapping them in RAII types (similar to
cusparse_dn_vec_descr_wrapper_t) or implementing a destructor/reset method that
checks for non-null handles and calls cusparseSpMVOp_destroyDescr() for
descriptors and cusparseSpMVOp_destroyPlan() for plans; ensure handles are
initialized to nullptr on construction and that create_spmv_op_plans() invokes
the reset/cleanup before creating new handles to avoid leaks and
double-creation.

---

Nitpick comments:
In `@cpp/src/pdlp/pdlp.cu`:
- Around line 2146-2148: The SpMV plan/JIT setup is being run even for batch
solves; modify the conditional that calls
pdhg_solver_.get_cusparse_view().create_spmv_op_plans(...) to also check that
batch_mode_ is false (i.e., only create plans when not batch mode). Concretely,
update the current if that checks
pdhg_solver_.get_cusparse_view().mixed_precision_enabled_ and
!std::is_same_v<f_t, float> to additionally require !batch_mode_ before invoking
create_spmv_op_plans(...), leaving the existing mixed_precision and type checks
and the use of settings_.hyper_params.use_reflected_primal_dual unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b77457bd-c443-4b34-b8ea-78c3db138061

📥 Commits

Reviewing files that changed from the base of the PR and between 2000028 and 3ca7f55.

📒 Files selected for processing (10)
  • conda/environments/all_cuda-129_arch-aarch64.yaml
  • conda/environments/all_cuda-129_arch-x86_64.yaml
  • conda/environments/all_cuda-132_arch-aarch64.yaml
  • conda/environments/all_cuda-132_arch-x86_64.yaml
  • cpp/CMakeLists.txt
  • cpp/src/pdlp/cusparse_view.cu
  • cpp/src/pdlp/cusparse_view.hpp
  • cpp/src/pdlp/pdhg.cu
  • cpp/src/pdlp/pdlp.cu
  • dependencies.yaml

Comment on lines +281 to +284
```cmake
target_compile_definitions(cuopt
  PUBLIC "CUOPT_LOG_ACTIVE_LEVEL=RAPIDS_LOGGER_LOG_LEVEL_${LIBCUOPT_LOGGING_LEVEL}"
  PUBLIC CUSPARSE_ENABLE_EXPERIMENTAL_API
)
```


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
set -euo pipefail
rg -n '#include <pdlp/cusparse_view.hpp>|#include "pdlp/cusparse_view.hpp"|cusparseSpMVOp|CUSPARSE_ENABLE_EXPERIMENTAL_API' cpp || true
```

Repository: NVIDIA/cuopt

Length of output: 2290


🏁 Script executed:

```shell
cat -n cpp/CMakeLists.txt | head -300 | tail -50
```

Repository: NVIDIA/cuopt

Length of output: 1643


🏁 Script executed:

```shell
# Search for target definitions and barrier-related compilation
rg -n 'add_library|add_executable|target_sources.*barrier' cpp/CMakeLists.txt | head -30
```

Repository: NVIDIA/cuopt

Length of output: 335


🏁 Script executed:

```shell
# Check if barrier files are part of the cuopt target
rg -B5 -A5 'src/barrier' cpp/CMakeLists.txt | head -50
```

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

```shell
rg -n '#include.*cusparse_view' cpp/include/ || true
```

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

```shell
# Check what's exported from public headers
fd -e hpp cpp/include | head -20
```

Repository: NVIDIA/cuopt

Length of output: 405


🏁 Script executed:

```shell
fd . cpp/include --type f
```

Repository: NVIDIA/cuopt

Length of output: 2200


🏁 Script executed:

```shell
# Check if any public headers include pdlp/cusparse_view.hpp
rg -r '$0' cpp/include | rg 'cusparse_view|cusparseSpMVOp' || true
```

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

```shell
# Check if any public headers include files from cpp/src/pdlp
rg '#include.*pdlp' cpp/include/ || true
```

Repository: NVIDIA/cuopt

Length of output: 2265


🏁 Script executed:

```shell
# Also verify that the macro definition CUSPARSE_ENABLE_EXPERIMENTAL_API is only needed during compilation, not in public headers
rg -B2 -A2 'CUSPARSE_ENABLE_EXPERIMENTAL_API' cpp/include/ || echo "No mentions in public headers"
```

Repository: NVIDIA/cuopt

Length of output: 85


Scope CUSPARSE_ENABLE_EXPERIMENTAL_API as PRIVATE instead of PUBLIC.

The experimental cuSPARSE API is used only in internal implementation files under cpp/src/pdlp/ (and cpp/src/barrier/ which includes them). Public headers do not directly or transitively include cusparse_view.hpp, so exporting this macro as PUBLIC leaks an experimental CUDA opt-in to downstream consumers and install exports unnecessarily. Change it to PRIVATE:

```diff
 target_compile_definitions(cuopt
   PUBLIC "CUOPT_LOG_ACTIVE_LEVEL=RAPIDS_LOGGER_LOG_LEVEL_${LIBCUOPT_LOGGING_LEVEL}"
-  PUBLIC CUSPARSE_ENABLE_EXPERIMENTAL_API
+  PRIVATE CUSPARSE_ENABLE_EXPERIMENTAL_API
 )
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```diff
 target_compile_definitions(cuopt
   PUBLIC "CUOPT_LOG_ACTIVE_LEVEL=RAPIDS_LOGGER_LOG_LEVEL_${LIBCUOPT_LOGGING_LEVEL}"
-  PUBLIC CUSPARSE_ENABLE_EXPERIMENTAL_API
+  PRIVATE CUSPARSE_ENABLE_EXPERIMENTAL_API
 )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/CMakeLists.txt` around lines 281 - 284, The CMake
target_compile_definitions for target "cuopt" currently marks
CUSPARSE_ENABLE_EXPERIMENTAL_API as PUBLIC which leaks the experimental cuSPARSE
opt-in to downstream consumers; change the target_compile_definitions call that
references "CUSPARSE_ENABLE_EXPERIMENTAL_API" so that this definition is
declared PRIVATE (leave the CUOPT_LOG_ACTIVE_LEVEL definition as PUBLIC) to
limit the experimental API flag to internal implementation files (e.g.,
cpp/src/pdlp/ and cpp/src/barrier/).

Comment on lines +1159 to +1222
```cpp
// Creates SpMVOp plans. Must be called after scale_problem() so plans use the scaled matrix.
template <typename i_t, typename f_t>
void cusparse_view_t<i_t, f_t>::create_spmv_op_plans(bool is_reflected)
{
  // Prepare buffers for At_y SpMVOp
  size_t buffer_size_transpose = 0;
  RAFT_CUSPARSE_TRY(cusparseSpMVOp_bufferSize(handle_ptr_->get_cusparse_handle(),
                                              CUSPARSE_OPERATION_NON_TRANSPOSE,
                                              A_T,
                                              dual_solution,
                                              current_AtY,
                                              current_AtY,
                                              CUDA_R_64F,
                                              &buffer_size_transpose));
  buffer_transpose_spmvop.resize(buffer_size_transpose, handle_ptr_->get_stream());

  RAFT_CUSPARSE_TRY(cusparseSpMVOp_createDescr(handle_ptr_->get_cusparse_handle(),
                                               &spmv_op_descr_A_t_,
                                               CUSPARSE_OPERATION_NON_TRANSPOSE,
                                               A_T,
                                               dual_solution,
                                               current_AtY,
                                               current_AtY,
                                               CUDA_R_64F,
                                               buffer_transpose_spmvop.data()));

  char* lto_buffer       = NULL;
  size_t lto_buffer_size = 0;
  RAFT_CUSPARSE_TRY(cusparseSpMVOp_createPlan(handle_ptr_->get_cusparse_handle(),
                                              spmv_op_descr_A_t_,
                                              &spmv_op_plan_A_t_,
                                              lto_buffer,
                                              lto_buffer_size));

  // Only prepare buffers for A_x if we are using reflected_halpern
  if (is_reflected) {
    size_t buffer_size_non_transpose = 0;
    RAFT_CUSPARSE_TRY(cusparseSpMVOp_bufferSize(handle_ptr_->get_cusparse_handle(),
                                                CUSPARSE_OPERATION_NON_TRANSPOSE,
                                                A,
                                                reflected_primal_solution,
                                                dual_gradient,
                                                dual_gradient,
                                                CUDA_R_64F,
                                                &buffer_size_non_transpose));
    buffer_non_transpose_spmvop.resize(buffer_size_non_transpose, handle_ptr_->get_stream());

    RAFT_CUSPARSE_TRY(cusparseSpMVOp_createDescr(handle_ptr_->get_cusparse_handle(),
                                                 &spmv_op_descr_A_,
                                                 CUSPARSE_OPERATION_NON_TRANSPOSE,
                                                 A,
                                                 reflected_primal_solution,
                                                 dual_gradient,
                                                 dual_gradient,
                                                 CUDA_R_64F,
                                                 buffer_non_transpose_spmvop.data()));

    RAFT_CUSPARSE_TRY(cusparseSpMVOp_createPlan(handle_ptr_->get_cusparse_handle(),
                                                spmv_op_descr_A_,
                                                &spmv_op_plan_A_,
                                                lto_buffer,
                                                lto_buffer_size));
  }
}
```


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

According to the official CUDA cuSPARSE documentation, what toolkit/runtime version first provides the experimental `cusparseSpMVOp_*` API, and is direct linkage against those symbols expected to work when `libcusparse.so` at runtime is older than the build toolkit?

💡 Result:

  • First toolkit/runtime version that provides the experimental cusparseSpMVOp_* API: CUDA Toolkit 13.1 (cuSPARSE “Release 13.1”) introduced the experimental SpMVOp API. [1] (The API is documented as cusparseSpMVOp() [EXPERIMENTAL] in current cuSPARSE docs. [2])

  • Will direct linkage work if libcusparse.so at runtime is older than the build toolkit? No—this is not supported/expected to work. The official cuSPARSE docs explicitly state that cuSPARSE is not guaranteed to be backward compatible at the binary level, and that using different versions of the cusparse.h header and the shared library is not supported. [3] In practice, if the runtime libcusparse.so is older than the toolkit you built against, the cusparseSpMVOp_* symbols may be missing and the program can fail to load/resolve symbols.

Sources:
[1] CUDA Toolkit 13.1 release notes (cuSPARSE: Release 13.1) (docs.nvidia.cn)
[2] cuSPARSE documentation: cusparseSpMVOp() [EXPERIMENTAL] (docs.nvidia.com)
[3] cuSPARSE documentation: “Compatibility and Versioning” (binary compatibility / mixed header+library versions not supported) (docs.nvidia.com)



Guard SpMVOp API calls behind CUDA 13.1+ runtime compatibility checks.

This file already dynamically loads cusparseSpMV_preprocess via dlsym because build-time and runtime libcusparse.so versions can differ. The new direct calls to cusparseSpMVOp_bufferSize, cusparseSpMVOp_createDescr, and cusparseSpMVOp_createPlan bypass that safeguard. The cuSPARSE SpMVOp API was introduced in CUDA Toolkit 13.1 and direct symbol linkage is not supported when runtime libcusparse.so is older than the build toolkit—the program will fail to resolve symbols. Either enforce CUDA 13.1+ as the minimum supported runtime in build metadata and documentation, or dynamically load these symbols with a fallback to the existing cusparsespmv path to maintain compatibility with older CUDA runtimes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/cusparse_view.cu` around lines 1159 - 1222, The new direct calls
to cusparseSpMVOp_bufferSize / cusparseSpMVOp_createDescr /
cusparseSpMVOp_createPlan in create_spmv_op_plans bypass the existing
runtime-dynamic loading used for cusparseSpMV_preprocess and will fail on older
libcusparse; update create_spmv_op_plans (and use of spmv_op_descr_A_t_,
spmv_op_plan_A_t_, spmv_op_descr_A_, spmv_op_plan_A_) to either (A) dynamically
resolve those three symbols with dlsym at runtime and fall back to the existing
cusparsespmv path when resolution fails, or (B) gate the calls behind a runtime
check that enforces CUDA/cuSPARSE 13.1+ and otherwise use the fallback
implementation; implement one of these approaches so symbol resolution does not
depend on build-time linkage.

@anandhkb anandhkb added this to the 26.04 milestone Mar 23, 2026

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (1)
cpp/src/pdlp/cusparse_view.cu (1)

1163-1221: ⚠️ Potential issue | 🟠 Major

Either enforce a 13.2+ runtime or keep the existing fallback strategy.

#if CUDA_VER_13_2_UP only protects compilation. The same file still uses dynamic_load_runtime for cusparseSpMV_preprocess, so this new path does not inherit the older-runtime behavior that cusparse_view.cu already supports elsewhere. Please either resolve the cusparseSpMVOp_* symbols dynamically too, or add a runtime-version gate that falls back to the existing SpMV path.

```shell
#!/bin/bash
set -euo pipefail

echo "Existing runtime-loaded cuSPARSE path:"
rg -n -C2 'dynamic_load_runtime::function|cusparseSpMV_preprocess' cpp/src/pdlp/cusparse_view.cu

echo
echo "New SpMVOp calls that currently bind directly:"
rg -n -C2 'cusparseSpMVOp_(bufferSize|createDescr|createPlan)' cpp/src/pdlp/cusparse_view.cu
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/cusparse_view.cu` around lines 1163 - 1221, The new CUDA-13.2+
cusparseSpMVOp_* calls (cusparseSpMVOp_bufferSize, cusparseSpMVOp_createDescr,
cusparseSpMVOp_createPlan) are only compile-gated with `#if CUDA_VER_13_2_UP` but
still link directly at runtime, breaking environments that rely on the file's
existing dynamic loading path (dynamic_load_runtime and
cusparseSpMV_preprocess). Fix by either (A) wiring these three symbols into the
same dynamic loader used elsewhere (add runtime lookups and use those function
pointers instead of direct symbols) or (B) add a runtime-version check that
falls back to the existing cusparseSpMV_preprocess-based code path when the
runtime does not expose the SpMVOp API; update both the A_t path
(spmv_op_descr_A_t_, spmv_op_plan_A_t_) and the reflected A path
(spmv_op_descr_A_, spmv_op_plan_A_) and ensure is_reflected handling still uses
the fallback when needed.
🧹 Nitpick comments (1)
cpp/src/pdlp/cusparse_view.cu (1)

1160-1224: Encode the double-only contract in the helper itself.

This method hardcodes CUDA_R_64F, but cusparse_view_t<int, float> is instantiated in this file at line 1230. The call site in pdlp.cu has a compile-time guard (if constexpr (std::is_same_v<f_t, double>)) that keeps this off the float path today. Adding an if constexpr here would make that contract explicit and local instead of relying on caller discipline, consistent with the type-dispatch pattern already used elsewhere in this file (e.g., lines 225–228, 262–265).

♻️ Suggested guard
```diff
 template <typename i_t, typename f_t>
 void cusparse_view_t<i_t, f_t>::create_spmv_op_plans(bool is_reflected)
 {
 #if CUDA_VER_13_2_UP
+  if constexpr (!std::is_same_v<f_t, double>) { return; }
+
   // Prepare buffers for At_y SpMVOp
   size_t buffer_size_transpose = 0;
   RAFT_CUSPARSE_TRY(cusparseSpMVOp_bufferSize(handle_ptr_->get_cusparse_handle(),
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/cusparse_view.cu` around lines 1160 - 1224, The
create_spmv_op_plans method hardcodes CUDA_R_64F despite being templated on f_t;
add an if constexpr(std::is_same_v<f_t, double>) guard around the
CUDA_R_64F-specific cusparseSpMVOp_bufferSize/createDescr/createPlan calls in
cusparse_view_t<i_t,f_t>::create_spmv_op_plans so the double-only contract is
enforced inside the helper (for both the A_T/At_y and the is_reflected A_x
branches); for non-double f_t, skip these double-only calls (or
assert/throw/return) to match the type-dispatch pattern used elsewhere and avoid
relying on callers to gate instantiations.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@cpp/src/pdlp/cusparse_view.cu`:
- Around line 1163-1221: The new CUDA-13.2+ cusparseSpMVOp_* calls
(cusparseSpMVOp_bufferSize, cusparseSpMVOp_createDescr,
cusparseSpMVOp_createPlan) are only compile-gated with `#if CUDA_VER_13_2_UP` but
still link directly at runtime, breaking environments that rely on the file's
existing dynamic loading path (dynamic_load_runtime and
cusparseSpMV_preprocess). Fix by either (A) wiring these three symbols into the
same dynamic loader used elsewhere (add runtime lookups and use those function
pointers instead of direct symbols) or (B) add a runtime-version check that
falls back to the existing cusparseSpMV_preprocess-based code path when the
runtime does not expose the SpMVOp API; update both the A_t path
(spmv_op_descr_A_t_, spmv_op_plan_A_t_) and the reflected A path
(spmv_op_descr_A_, spmv_op_plan_A_) and ensure is_reflected handling still uses
the fallback when needed.

---

Nitpick comments:
In `@cpp/src/pdlp/cusparse_view.cu`:
- Around line 1160-1224: The create_spmv_op_plans method hardcodes CUDA_R_64F
despite being templated on f_t; add an if constexpr(std::is_same_v<f_t, double>)
guard around the CUDA_R_64F-specific
cusparseSpMVOp_bufferSize/createDescr/createPlan calls in
cusparse_view_t<i_t,f_t>::create_spmv_op_plans so the double-only contract is
enforced inside the helper (for both the A_T/At_y and the is_reflected A_x
branches); for non-double f_t, skip these double-only calls (or
assert/throw/return) to match the type-dispatch pattern used elsewhere and avoid
relying on callers to gate instantiations.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fb417ff9-5d4f-44fc-b6a4-d5f552182965

📥 Commits

Reviewing files that changed from the base of the PR and between 3ca7f55 and 2db9c91.

📒 Files selected for processing (4)
  • cpp/src/pdlp/cusparse_view.cu
  • cpp/src/pdlp/cusparse_view.hpp
  • cpp/src/pdlp/pdhg.cu
  • cpp/src/pdlp/pdlp.cu
🚧 Files skipped from review as they are similar to previous changes (3)
  • cpp/src/pdlp/pdlp.cu
  • cpp/src/pdlp/pdhg.cu
  • cpp/src/pdlp/cusparse_view.hpp

@Bubullzz
Author

Will add CodeRabbit suggestions

@Bubullzz Bubullzz closed this Mar 23, 2026
@Bubullzz Bubullzz reopened this Mar 24, 2026
@Kh4ster
Contributor

Kh4ster commented Mar 24, 2026

/ok to test 293a33f


@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (1)
cpp/src/pdlp/cusparse_view.cu (1)

1039-1054: ⚠️ Potential issue | 🟠 Major

Keep the new SpMVOp symbols behind the same runtime-compatibility mechanism as the existing preprocess hook.

This TU already uses dlsym for version-sensitive cuSPARSE entry points, but the new cusparseSpMVOp_* create/destroy calls are linked directly. If mixed build/runtime CUDA versions are still a supported deployment mode, this path now bypasses the existing fallback story. Either make the required cuSPARSE runtime explicit or dynamically resolve/fallback here too.

According to the official cuSPARSE documentation, is mixing a newer `cusparse.h` with an older runtime `libcusparse.so` supported, and in which CUDA/cuSPARSE release were the `cusparseSpMVOp_*` APIs used here introduced (including `cusparseSpMVOp_bufferSize`)?

Also applies to: 1176-1241

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/cusparse_view.cu` around lines 1039 - 1054, The destructor calls
for the new cuSPARSE SpMVOp APIs (cusparseSpMVOp_destroyPlan,
cusparseSpMVOp_destroyDescr and their create counterparts used elsewhere) are
linked directly and bypass the existing dlsym-based
runtime-compatibility/fallback mechanism; change these to use the same dynamic
resolver (the TU's dlsym-based lookup used for preprocess hooks) or
require/explicitly check for a compatible cuSPARSE runtime before calling, and
mirror that change for the matching create/use sites (e.g., the
cusparseSpMVOp_createPlan/cusparseSpMVOp_createDescr and
cusparseSpMVOp_bufferSize usages around lines 1176-1241) so the code falls back
cleanly when the runtime lacks these symbols.
🧹 Nitpick comments (1)
cpp/src/pdlp/cusparse_view.cu (1)

307-308: Align the version guard with the constructor initializer lists.

These four initializers assume the SpMVOp buffers always exist, but the members themselves are only declared under CUDA_VER_13_2_UP in cpp/src/pdlp/cusparse_view.hpp. If 13.2 is not the hard minimum, this split guard will break older-toolkit builds; if it is, the conditional members just hide the real requirement. Please make that contract consistent in both places.

```shell
#!/bin/bash
set -euo pipefail

echo "Conditional declarations vs constructor initializers:"
rg -n -C2 'CUDA_VER_13_2_UP|buffer_(non_transpose|transpose)_spmvop' \
  cpp/src/pdlp/cusparse_view.hpp cpp/src/pdlp/cusparse_view.cu

echo
echo "CUDA/cuSPARSE version expectations declared in the repo:"
rg -n -C2 '13\.2|12\.4|CUSPARSE_ENABLE_EXPERIMENTAL_API|CUDART_VERSION|find_package\(CUDAToolkit' \
  cpp/CMakeLists.txt cpp/src
```

Also applies to: 721-722, 932-933, 1066-1067

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/cusparse_view.cu` around lines 307 - 308, The constructor
initializer list references buffer_non_transpose_spmvop and
buffer_transpose_spmvop but those members are only declared under the
CUDA_VER_13_2_UP guard in the header, causing mismatch; update the guards to be
consistent by moving the CUDA_VER_13_2_UP macro to cover the same places in both
declaration and constructor (either declare the members unconditionally or wrap
the constructor initializers and any uses of buffer_non_transpose_spmvop/
buffer_transpose_spmvop in the same CUDA_VER_13_2_UP guard), and apply the same
fix to the other occurrences you noted (the similar initializer pairs at the
other constructor sites).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@cpp/src/pdlp/cusparse_view.cu`:
- Around line 1039-1054: The destructor calls for the new cuSPARSE SpMVOp APIs
(cusparseSpMVOp_destroyPlan, cusparseSpMVOp_destroyDescr and their create
counterparts used elsewhere) are linked directly and bypass the existing
dlsym-based runtime-compatibility/fallback mechanism; change these to use the
same dynamic resolver (the TU's dlsym-based lookup used for preprocess hooks) or
require/explicitly check for a compatible cuSPARSE runtime before calling, and
mirror that change for the matching create/use sites (e.g., the
cusparseSpMVOp_createPlan/cusparseSpMVOp_createDescr and
cusparseSpMVOp_bufferSize usages around lines 1176-1241) so the code falls back
cleanly when the runtime lacks these symbols.

---

Nitpick comments:
In `@cpp/src/pdlp/cusparse_view.cu`:
- Around line 307-308: The constructor initializer list references
buffer_non_transpose_spmvop and buffer_transpose_spmvop but those members are
only declared under the CUDA_VER_13_2_UP guard in the header, causing mismatch;
update the guards to be consistent by moving the CUDA_VER_13_2_UP macro to cover
the same places in both declaration and constructor (either declare the members
unconditionally or wrap the constructor initializers and any uses of
buffer_non_transpose_spmvop/ buffer_transpose_spmvop in the same
CUDA_VER_13_2_UP guard), and apply the same fix to the other occurrences you
noted (the similar initializer pairs at the other constructor sites).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 27d51901-b3a6-4a92-bf2b-56b0d217fabe

📥 Commits

Reviewing files that changed from the base of the PR and between 2db9c91 and 293a33f.

📒 Files selected for processing (2)
  • cpp/src/pdlp/cusparse_view.cu
  • cpp/src/pdlp/cusparse_view.hpp

```cpp
                   const rmm::device_uvector<i_t>&,               // Empty just to init the const&
                   const std::vector<pdlp_climber_strategy_t>&);  // Empty just to init the const&

  ~cusparse_view_t();
```
Contributor


Instead of using a destructor for the whole class, create a wrapper around the new cusparseSpMVOpDescr_t and plan with a destructor, just like cusparse_dn_vec_descr_wrapper_t, to avoid forgetting to destroy future plans in the cusparse destructor

```cpp
    RAFT_CUSPARSE_TRY_NO_THROW(cusparseSpMVOp_destroyDescr(spmv_op_descr_A_));
  }
#endif
}
```
Contributor


As already stated, create a custom wrapper instead

```cpp
                                              spmv_op_descr_A_,
                                              &spmv_op_plan_A_,
                                              lto_buffer,
                                              lto_buffer_size));
```
Contributor


Make sure it's safe to reuse lto_buffer and lto_buffer_size in the two calls

@rg20
Contributor

rg20 commented Mar 24, 2026

@Bubullzz Can you summarize the performance results as well in the PR description?

@Bubullzz Bubullzz closed this Mar 24, 2026
@Bubullzz
Author

Bubullzz commented Mar 24, 2026

I think it might be better to wait until after the release to merge this PR; we don't have a CUDA 13.2 CI, which is needed for SpMVOp

@Bubullzz Bubullzz reopened this Mar 24, 2026
@chris-maes
Contributor

@Iroy30 can you help with CI on this PR?

@Bubullzz
Author

/ok test a086c81

@chris-maes chris-maes added the P2 label Mar 25, 2026
@chris-maes chris-maes changed the title from "Replaced SpMV with SpMVOp in PDHG for double precision#" to "Replaced SpMV with SpMVOp in PDHG for double precision" Mar 25, 2026