Replaced SpMV with SpMVOp in PDHG for double precision #989
Bubullzz wants to merge 12 commits into NVIDIA:release/26.04 from
Conversation
…bnvjitlink-dev manually
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings.
Use the following commands to manage reviews, or the checkboxes below for quick actions:
📝 Walkthrough
Bumps the CUDA matrix to 13.2 and adds SpMVOp support for double-precision PDHG.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 3
🧹 Nitpick comments (1)
cpp/src/pdlp/pdlp.cu (1)
2146-2148: Skip SpMVOp plan setup for batch solves.
`compute_At_y()` and `compute_A_x()` only enter the SpMVOp path under `!batch_mode_`; batch PDHG stays on SpMM. As written, double batch runs still pay the plan/JIT setup cost here even though those plans are never used.
♻️ Suggested guard
```diff
-  if (!pdhg_solver_.get_cusparse_view().mixed_precision_enabled_ && !std::is_same_v<f_t, float>)
-    pdhg_solver_.get_cusparse_view().create_spmv_op_plans(
-      settings_.hyper_params.use_reflected_primal_dual);
+  if constexpr (std::is_same_v<f_t, double>) {
+    if (!batch_mode_ && !pdhg_solver_.get_cusparse_view().mixed_precision_enabled_) {
+      pdhg_solver_.get_cusparse_view().create_spmv_op_plans(
+        settings_.hyper_params.use_reflected_primal_dual);
+    }
+  }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/pdlp/pdlp.cu` around lines 2146 - 2148, The SpMV plan/JIT setup is being run even for batch solves; modify the conditional that calls pdhg_solver_.get_cusparse_view().create_spmv_op_plans(...) to also check that batch_mode_ is false (i.e., only create plans when not batch mode). Concretely, update the current if that checks pdhg_solver_.get_cusparse_view().mixed_precision_enabled_ and !std::is_same_v<f_t, float> to additionally require !batch_mode_ before invoking create_spmv_op_plans(...), leaving the existing mixed_precision and type checks and the use of settings_.hyper_params.use_reflected_primal_dual unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/CMakeLists.txt`:
- Around line 281-284: The CMake target_compile_definitions for target "cuopt"
currently marks CUSPARSE_ENABLE_EXPERIMENTAL_API as PUBLIC which leaks the
experimental cuSPARSE opt-in to downstream consumers; change the
target_compile_definitions call that references
"CUSPARSE_ENABLE_EXPERIMENTAL_API" so that this definition is declared PRIVATE
(leave the CUOPT_LOG_ACTIVE_LEVEL definition as PUBLIC) to limit the
experimental API flag to internal implementation files (e.g., cpp/src/pdlp/ and
cpp/src/barrier/).
In `@cpp/src/pdlp/cusparse_view.cu`:
- Around line 1159-1222: The new direct calls to cusparseSpMVOp_bufferSize /
cusparseSpMVOp_createDescr / cusparseSpMVOp_createPlan in create_spmv_op_plans
bypass the existing runtime-dynamic loading used for cusparseSpMV_preprocess and
will fail on older libcusparse; update create_spmv_op_plans (and use of
spmv_op_descr_A_t_, spmv_op_plan_A_t_, spmv_op_descr_A_, spmv_op_plan_A_) to
either (A) dynamically resolve those three symbols with dlsym at runtime and
fall back to the existing cusparsespmv path when resolution fails, or (B) gate
the calls behind a runtime check that enforces CUDA/cuSPARSE 13.1+ and otherwise
use the fallback implementation; implement one of these approaches so symbol
resolution does not depend on build-time linkage.
In `@cpp/src/pdlp/cusparse_view.hpp`:
- Around line 174-181: The SpMVOp handles spmv_op_descr_A_, spmv_op_descr_A_t_,
spmv_op_plan_A_, and spmv_op_plan_A_t_ are never destroyed; add explicit cleanup
by either wrapping them in RAII types (similar to
cusparse_dn_vec_descr_wrapper_t) or implementing a destructor/reset method that
checks for non-null handles and calls cusparseSpMVOp_destroyDescr() for
descriptors and cusparseSpMVOp_destroyPlan() for plans; ensure handles are
initialized to nullptr on construction and that create_spmv_op_plans() invokes
the reset/cleanup before creating new handles to avoid leaks and
double-creation.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b77457bd-c443-4b34-b8ea-78c3db138061
📒 Files selected for processing (10)
- conda/environments/all_cuda-129_arch-aarch64.yaml
- conda/environments/all_cuda-129_arch-x86_64.yaml
- conda/environments/all_cuda-132_arch-aarch64.yaml
- conda/environments/all_cuda-132_arch-x86_64.yaml
- cpp/CMakeLists.txt
- cpp/src/pdlp/cusparse_view.cu
- cpp/src/pdlp/cusparse_view.hpp
- cpp/src/pdlp/pdhg.cu
- cpp/src/pdlp/pdlp.cu
- dependencies.yaml
```cmake
target_compile_definitions(cuopt
  PUBLIC "CUOPT_LOG_ACTIVE_LEVEL=RAPIDS_LOGGER_LOG_LEVEL_${LIBCUOPT_LOGGING_LEVEL}"
  PUBLIC CUSPARSE_ENABLE_EXPERIMENTAL_API
)
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
set -euo pipefail
rg -n '#include <pdlp/cusparse_view.hpp>|#include "pdlp/cusparse_view.hpp"|cusparseSpMVOp|CUSPARSE_ENABLE_EXPERIMENTAL_API' cpp || true
```
Repository: NVIDIA/cuopt
Length of output: 2290

🏁 Script executed:
```bash
cat -n cpp/CMakeLists.txt | head -300 | tail -50
```
Repository: NVIDIA/cuopt
Length of output: 1643

🏁 Script executed:
```bash
# Search for target definitions and barrier-related compilation
rg -n 'add_library|add_executable|target_sources.*barrier' cpp/CMakeLists.txt | head -30
```
Repository: NVIDIA/cuopt
Length of output: 335

🏁 Script executed:
```bash
# Check if barrier files are part of the cuopt target
rg -B5 -A5 'src/barrier' cpp/CMakeLists.txt | head -50
```
Repository: NVIDIA/cuopt
Length of output: 38

🏁 Script executed:
```bash
rg -n '#include.*cusparse_view' cpp/include/ || true
```
Repository: NVIDIA/cuopt
Length of output: 38

🏁 Script executed:
```bash
# Check what's exported from public headers
fd -e hpp cpp/include | head -20
```
Repository: NVIDIA/cuopt
Length of output: 405

🏁 Script executed:
```bash
fd . cpp/include --type f
```
Repository: NVIDIA/cuopt
Length of output: 2200

🏁 Script executed:
```bash
# Check if any public headers include pdlp/cusparse_view.hpp
rg -r '$0' cpp/include | rg 'cusparse_view|cusparseSpMVOp' || true
```
Repository: NVIDIA/cuopt
Length of output: 38

🏁 Script executed:
```bash
# Check if any public headers include files from cpp/src/pdlp
rg '#include.*pdlp' cpp/include/ || true
```
Repository: NVIDIA/cuopt
Length of output: 2265

🏁 Script executed:
```bash
# Also verify that the macro definition CUSPARSE_ENABLE_EXPERIMENTAL_API is only needed during compilation, not in public headers
rg -B2 -A2 'CUSPARSE_ENABLE_EXPERIMENTAL_API' cpp/include/ || echo "No mentions in public headers"
```
Repository: NVIDIA/cuopt
Length of output: 85
Scope CUSPARSE_ENABLE_EXPERIMENTAL_API as PRIVATE instead of PUBLIC.
The experimental cuSPARSE API is used only in internal implementation files under cpp/src/pdlp/ (and cpp/src/barrier/ which includes them). Public headers do not directly or transitively include cusparse_view.hpp, so exporting this macro as PUBLIC leaks an experimental CUDA opt-in to downstream consumers and install exports unnecessarily. Change it to PRIVATE:
```diff
 target_compile_definitions(cuopt
   PUBLIC "CUOPT_LOG_ACTIVE_LEVEL=RAPIDS_LOGGER_LOG_LEVEL_${LIBCUOPT_LOGGING_LEVEL}"
-  PUBLIC CUSPARSE_ENABLE_EXPERIMENTAL_API
+  PRIVATE CUSPARSE_ENABLE_EXPERIMENTAL_API
 )
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cmake
target_compile_definitions(cuopt
  PUBLIC "CUOPT_LOG_ACTIVE_LEVEL=RAPIDS_LOGGER_LOG_LEVEL_${LIBCUOPT_LOGGING_LEVEL}"
  PRIVATE CUSPARSE_ENABLE_EXPERIMENTAL_API
)
```
```cpp
// Creates SpMVOp plans. Must be called after scale_problem() so plans use the scaled matrix.
template <typename i_t, typename f_t>
void cusparse_view_t<i_t, f_t>::create_spmv_op_plans(bool is_reflected)
{
  // Prepare buffers for At_y SpMVOp
  size_t buffer_size_transpose = 0;
  RAFT_CUSPARSE_TRY(cusparseSpMVOp_bufferSize(handle_ptr_->get_cusparse_handle(),
                                              CUSPARSE_OPERATION_NON_TRANSPOSE,
                                              A_T,
                                              dual_solution,
                                              current_AtY,
                                              current_AtY,
                                              CUDA_R_64F,
                                              &buffer_size_transpose));
  buffer_transpose_spmvop.resize(buffer_size_transpose, handle_ptr_->get_stream());

  RAFT_CUSPARSE_TRY(cusparseSpMVOp_createDescr(handle_ptr_->get_cusparse_handle(),
                                               &spmv_op_descr_A_t_,
                                               CUSPARSE_OPERATION_NON_TRANSPOSE,
                                               A_T,
                                               dual_solution,
                                               current_AtY,
                                               current_AtY,
                                               CUDA_R_64F,
                                               buffer_transpose_spmvop.data()));

  char* lto_buffer       = NULL;
  size_t lto_buffer_size = 0;
  RAFT_CUSPARSE_TRY(cusparseSpMVOp_createPlan(handle_ptr_->get_cusparse_handle(),
                                              spmv_op_descr_A_t_,
                                              &spmv_op_plan_A_t_,
                                              lto_buffer,
                                              lto_buffer_size));

  // Only prepare buffers for A_x if we are using reflected_halpern
  if (is_reflected) {
    size_t buffer_size_non_transpose = 0;
    RAFT_CUSPARSE_TRY(cusparseSpMVOp_bufferSize(handle_ptr_->get_cusparse_handle(),
                                                CUSPARSE_OPERATION_NON_TRANSPOSE,
                                                A,
                                                reflected_primal_solution,
                                                dual_gradient,
                                                dual_gradient,
                                                CUDA_R_64F,
                                                &buffer_size_non_transpose));
    buffer_non_transpose_spmvop.resize(buffer_size_non_transpose, handle_ptr_->get_stream());

    RAFT_CUSPARSE_TRY(cusparseSpMVOp_createDescr(handle_ptr_->get_cusparse_handle(),
                                                 &spmv_op_descr_A_,
                                                 CUSPARSE_OPERATION_NON_TRANSPOSE,
                                                 A,
                                                 reflected_primal_solution,
                                                 dual_gradient,
                                                 dual_gradient,
                                                 CUDA_R_64F,
                                                 buffer_non_transpose_spmvop.data()));

    RAFT_CUSPARSE_TRY(cusparseSpMVOp_createPlan(handle_ptr_->get_cusparse_handle(),
                                                spmv_op_descr_A_,
                                                &spmv_op_plan_A_,
                                                lto_buffer,
                                                lto_buffer_size));
  }
}
```
🧩 Analysis chain
🌐 Web query:
According to the official CUDA cuSPARSE documentation, what toolkit/runtime version first provides the experimental `cusparseSpMVOp_*` API, and is direct linkage against those symbols expected to work when `libcusparse.so` at runtime is older than the build toolkit?
💡 Result:
- First toolkit/runtime version that provides the experimental `cusparseSpMVOp_*` API: CUDA Toolkit 13.1 (cuSPARSE "Release 13.1") introduced the experimental SpMVOp API. [1] (The API is documented as `cusparseSpMVOp() [EXPERIMENTAL]` in current cuSPARSE docs. [2])
- Will direct linkage work if `libcusparse.so` at runtime is older than the build toolkit? No, this is not supported or expected to work. The official cuSPARSE docs explicitly state that cuSPARSE is not guaranteed to be backward compatible at the binary level, and that using different versions of the `cusparse.h` header and the shared library is not supported. [3] In practice, if the runtime `libcusparse.so` is older than the toolkit you built against, the `cusparseSpMVOp_*` symbols may be missing and the program can fail to load/resolve symbols.
Sources:
[1] CUDA Toolkit 13.1 release notes (cuSPARSE: Release 13.1) (docs.nvidia.cn)
[2] cuSPARSE documentation: cusparseSpMVOp() [EXPERIMENTAL] (docs.nvidia.com)
[3] cuSPARSE documentation: “Compatibility and Versioning” (binary compatibility / mixed header+library versions not supported) (docs.nvidia.com)
Citations:
- 1: https://docs.nvidia.cn/cuda/cuda-toolkit-release-notes/index.html
- 2: https://docs.nvidia.com/cuda/cusparse/
- 3: https://docs.nvidia.com/cuda/cusparse/
Guard SpMVOp API calls behind CUDA 13.1+ runtime compatibility checks.
This file already dynamically loads cusparseSpMV_preprocess via dlsym because build-time and runtime libcusparse.so versions can differ. The new direct calls to cusparseSpMVOp_bufferSize, cusparseSpMVOp_createDescr, and cusparseSpMVOp_createPlan bypass that safeguard. The cuSPARSE SpMVOp API was introduced in CUDA Toolkit 13.1 and direct symbol linkage is not supported when runtime libcusparse.so is older than the build toolkit—the program will fail to resolve symbols. Either enforce CUDA 13.1+ as the minimum supported runtime in build metadata and documentation, or dynamically load these symbols with a fallback to the existing cusparsespmv path to maintain compatibility with older CUDA runtimes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/pdlp/cusparse_view.cu` around lines 1159 - 1222, The new direct calls
to cusparseSpMVOp_bufferSize / cusparseSpMVOp_createDescr /
cusparseSpMVOp_createPlan in create_spmv_op_plans bypass the existing
runtime-dynamic loading used for cusparseSpMV_preprocess and will fail on older
libcusparse; update create_spmv_op_plans (and use of spmv_op_descr_A_t_,
spmv_op_plan_A_t_, spmv_op_descr_A_, spmv_op_plan_A_) to either (A) dynamically
resolve those three symbols with dlsym at runtime and fall back to the existing
cusparsespmv path when resolution fails, or (B) gate the calls behind a runtime
check that enforces CUDA/cuSPARSE 13.1+ and otherwise use the fallback
implementation; implement one of these approaches so symbol resolution does not
depend on build-time linkage.
There was a problem hiding this comment.
♻️ Duplicate comments (1)
cpp/src/pdlp/cusparse_view.cu (1)
1163-1221: ⚠️ Potential issue | 🟠 Major: Either enforce a 13.2+ runtime or keep the existing fallback strategy.
`#if CUDA_VER_13_2_UP` only protects compilation. The same file still uses `dynamic_load_runtime` for `cusparseSpMV_preprocess`, so this new path does not inherit the older-runtime behavior that `cusparse_view.cu` already supports elsewhere. Please either resolve the `cusparseSpMVOp_*` symbols dynamically too, or add a runtime-version gate that falls back to the existing SpMV path.
```bash
#!/bin/bash
set -euo pipefail
echo "Existing runtime-loaded cuSPARSE path:"
rg -n -C2 'dynamic_load_runtime::function|cusparseSpMV_preprocess' cpp/src/pdlp/cusparse_view.cu
echo
echo "New SpMVOp calls that currently bind directly:"
rg -n -C2 'cusparseSpMVOp_(bufferSize|createDescr|createPlan)' cpp/src/pdlp/cusparse_view.cu
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/pdlp/cusparse_view.cu` around lines 1163 - 1221, The new CUDA-13.2+ cusparseSpMVOp_* calls (cusparseSpMVOp_bufferSize, cusparseSpMVOp_createDescr, cusparseSpMVOp_createPlan) are only compile-gated with `#if` CUDA_VER_13_2_UP but still link directly at runtime, breaking environments that rely on the file's existing dynamic loading path (dynamic_load_runtime and cusparseSpMV_preprocess). Fix by either (A) wiring these three symbols into the same dynamic loader used elsewhere (add runtime lookups and use those function pointers instead of direct symbols) or (B) add a runtime-version check that falls back to the existing cusparseSpMV_preprocess-based code path when the runtime does not expose the SpMVOp API; update both the A_t path (spmv_op_descr_A_t_, spmv_op_plan_A_t_) and the reflected A path (spmv_op_descr_A_, spmv_op_plan_A_) and ensure is_reflected handling still uses the fallback when needed.
🧹 Nitpick comments (1)
cpp/src/pdlp/cusparse_view.cu (1)
1160-1224: Encode the double-only contract in the helper itself.
This method hardcodes `CUDA_R_64F`, but `cusparse_view_t<int, float>` is instantiated in this file at line 1230. The call site in `pdlp.cu` has a compile-time guard (`if constexpr (std::is_same_v<f_t, double>)`) that keeps this off the float path today. Adding an `if constexpr` here would make that contract explicit and local instead of relying on caller discipline, consistent with the type-dispatch pattern already used elsewhere in this file (e.g., lines 225-228, 262-265).
♻️ Suggested guard
```diff
 template <typename i_t, typename f_t>
 void cusparse_view_t<i_t, f_t>::create_spmv_op_plans(bool is_reflected)
 {
 #if CUDA_VER_13_2_UP
+  if constexpr (!std::is_same_v<f_t, double>) { return; }
+
   // Prepare buffers for At_y SpMVOp
   size_t buffer_size_transpose = 0;
   RAFT_CUSPARSE_TRY(cusparseSpMVOp_bufferSize(handle_ptr_->get_cusparse_handle(),
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/pdlp/cusparse_view.cu` around lines 1160 - 1224, The create_spmv_op_plans method hardcodes CUDA_R_64F despite being templated on f_t; add an if constexpr(std::is_same_v<f_t, double>) guard around the CUDA_R_64F-specific cusparseSpMVOp_bufferSize/createDescr/createPlan calls in cusparse_view_t<i_t,f_t>::create_spmv_op_plans so the double-only contract is enforced inside the helper (for both the A_T/At_y and the is_reflected A_x branches); for non-double f_t, skip these double-only calls (or assert/throw/return) to match the type-dispatch pattern used elsewhere and avoid relying on callers to gate instantiations.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: fb417ff9-5d4f-44fc-b6a4-d5f552182965
📒 Files selected for processing (4)
- cpp/src/pdlp/cusparse_view.cu
- cpp/src/pdlp/cusparse_view.hpp
- cpp/src/pdlp/pdhg.cu
- cpp/src/pdlp/pdlp.cu
🚧 Files skipped from review as they are similar to previous changes (3)
- cpp/src/pdlp/pdlp.cu
- cpp/src/pdlp/pdhg.cu
- cpp/src/pdlp/cusparse_view.hpp
will add code rabbit suggestions

/ok to test 293a33f
♻️ Duplicate comments (1)
cpp/src/pdlp/cusparse_view.cu (1)
1039-1054: ⚠️ Potential issue | 🟠 Major: Keep the new SpMVOp symbols behind the same runtime-compatibility mechanism as the existing preprocess hook.
This TU already uses `dlsym` for version-sensitive cuSPARSE entry points, but the new `cusparseSpMVOp_*` create/destroy calls are linked directly. If mixed build/runtime CUDA versions are still a supported deployment mode, this path now bypasses the existing fallback story. Either make the required cuSPARSE runtime explicit or dynamically resolve/fallback here too.
🌐 Web query: According to the official cuSPARSE documentation, is mixing a newer `cusparse.h` with an older runtime `libcusparse.so` supported, and in which CUDA/cuSPARSE release were the `cusparseSpMVOp_*` APIs used here introduced (including `cusparseSpMVOp_bufferSize`)?
Also applies to: 1176-1241
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/pdlp/cusparse_view.cu` around lines 1039 - 1054, The destructor calls for the new cuSPARSE SpMVOp APIs (cusparseSpMVOp_destroyPlan, cusparseSpMVOp_destroyDescr and their create counterparts used elsewhere) are linked directly and bypass the existing dlsym-based runtime-compatibility/fallback mechanism; change these to use the same dynamic resolver (the TU's dlsym-based lookup used for preprocess hooks) or require/explicitly check for a compatible cuSPARSE runtime before calling, and mirror that change for the matching create/use sites (e.g., the cusparseSpMVOp_createPlan/cusparseSpMVOp_createDescr and cusparseSpMVOp_bufferSize usages around lines 1176-1241) so the code falls back cleanly when the runtime lacks these symbols.
🧹 Nitpick comments (1)
cpp/src/pdlp/cusparse_view.cu (1)
307-308: Align the version guard with the constructor initializer lists.
These four initializers assume the SpMVOp buffers always exist, but the members themselves are only declared under `CUDA_VER_13_2_UP` in `cpp/src/pdlp/cusparse_view.hpp`. If 13.2 is not the hard minimum, this split guard will break older-toolkit builds; if it is, the conditional members just hide the real requirement. Please make that contract consistent in both places.
```bash
#!/bin/bash
set -euo pipefail
echo "Conditional declarations vs constructor initializers:"
rg -n -C2 'CUDA_VER_13_2_UP|buffer_(non_transpose|transpose)_spmvop' \
  cpp/src/pdlp/cusparse_view.hpp cpp/src/pdlp/cusparse_view.cu
echo
echo "CUDA/cuSPARSE version expectations declared in the repo:"
rg -n -C2 '13\.2|12\.4|CUSPARSE_ENABLE_EXPERIMENTAL_API|CUDART_VERSION|find_package\(CUDAToolkit' \
  cpp/CMakeLists.txt cpp/src
```
Also applies to: 721-722, 932-933, 1066-1067
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/pdlp/cusparse_view.cu` around lines 307 - 308, The constructor initializer list references buffer_non_transpose_spmvop and buffer_transpose_spmvop but those members are only declared under the CUDA_VER_13_2_UP guard in the header, causing mismatch; update the guards to be consistent by moving the CUDA_VER_13_2_UP macro to cover the same places in both declaration and constructor (either declare the members unconditionally or wrap the constructor initializers and any uses of buffer_non_transpose_spmvop/ buffer_transpose_spmvop in the same CUDA_VER_13_2_UP guard), and apply the same fix to the other occurrences you noted (the similar initializer pairs at the other constructor sites).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 27d51901-b3a6-4a92-bf2b-56b0d217fabe
📒 Files selected for processing (2)
- cpp/src/pdlp/cusparse_view.cu
- cpp/src/pdlp/cusparse_view.hpp
cpp/src/pdlp/cusparse_view.hpp
Outdated
```cpp
    const rmm::device_uvector<i_t>&,               // Empty just to init the const&
    const std::vector<pdlp_climber_strategy_t>&);  // Empty just to init the const&

  ~cusparse_view_t();
```
Instead of using a destructor for the whole class, create a wrapper around the new cusparseSpMVOpDescr_t and plan with a destructor, just like cusparse_dn_vec_descr_wrapper_t, to avoid forgetting to destroy future plans in the cuSPARSE destructor.
cpp/src/pdlp/cusparse_view.cu
Outdated
```cpp
    RAFT_CUSPARSE_TRY_NO_THROW(cusparseSpMVOp_destroyDescr(spmv_op_descr_A_));
  }
#endif
}
```
As already stated, create a custom wrapper instead
cpp/src/pdlp/cusparse_view.cu
Outdated
```cpp
                                                spmv_op_descr_A_,
                                                &spmv_op_plan_A_,
                                                lto_buffer,
                                                lto_buffer_size));
```
Make sure it's safe to reuse `lto_buffer` and `lto_buffer_size` in the two calls.
@Bubullzz Can you summarize the performance results as well in the PR description?

I think it might be better to wait until after the release to merge this PR; we don't have a CUDA 13.2 CI, which is needed for SpMVOp.

@Iroy30 can you help with CI on this PR?

/ok test a086c81
Description
Replaced SpMV calls with SpMVOp calls in PDHG to accelerate compute. These changes are only available for double precision. Benchmarks on B200 over the full Mittelmann dataset with the concurrent strategy gave a 1.09x speedup.
Issue
closes #9