
Race batch PDLP and dual simplex in strong branching / reliability branching#994

Open
Kh4ster wants to merge 44 commits into release/26.04 from
race_strong_branching_realibility_branching

Conversation

Contributor

@Kh4ster Kh4ster commented Mar 25, 2026

This is still a draft PR. Needs extra work. Just for early review.

This PR enables the following regarding batch PDLP:

  • Enable batch PDLP in reliability branching
  • Add work stealing so that batch PDLP and Dual Simplex can run concurrently and steal LPs from each other when one solves an LP first
  • Use the correct problem representation, including cuts, for batch PDLP
  • Use a PDLP warm-start cache across strong branching at the root and in reliability branching
  • Increase the tolerance on batch PDLP to produce higher-quality solutions
  • Increase the iteration limit so that instances that need a high iteration count (with low cost per iteration) can still come through (only while solving the original LP to obtain warm-start data)
  • Add multiple heuristics that skip batch PDLP to avoid overhead when Dual Simplex is clearly superior
  • Don't store and copy the primal/dual solution unless it is needed, to save memory
  • Handle batch PDLP errors better, allowing Dual Simplex to continue strong branching even if batch PDLP fails

Kh4ster and others added 30 commits February 9, 2026 17:47
…te_lp_problem' into race_strong_branching_realibility_branching
…ng, correctly fill the ds_obj objective before merging results at the root, correctly clamp the PDLP objective, remove the unnecessary cuopt_assert regarding fixed point error
@Kh4ster Kh4ster requested review from a team as code owners March 25, 2026 17:26
@copy-pr-bot

copy-pr-bot bot commented Mar 25, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@Kh4ster Kh4ster requested review from chris-maes and nguidotti and removed request for Iroy30, hlinsen, jakirkham and nguidotti March 25, 2026 17:27
@Kh4ster Kh4ster self-assigned this Mar 25, 2026
@Kh4ster Kh4ster added non-breaking Introduces a non-breaking change improvement Improves an existing functionality pdlp mip labels Mar 25, 2026
@coderabbitai

coderabbitai bot commented Mar 25, 2026

📝 Walkthrough

Adds cooperative Dual Simplex ⇄ Batch PDLP strong/reliability branching, new tri-state reliability parameter and macro, shared strong-branching context types, PDLP warm-start/initial-iteration controls, coordinated concurrent termination/solution collection, and related API/signature updates across branching, PDLP, solver settings, tests, and build files.

Changes

  • MIP Parameter Constants (cpp/include/cuopt/linear_programming/constants.h, cpp/src/math_optimization/solver_settings.cu): Added CUOPT_MIP_BATCH_PDLP_RELIABILITY_BRANCHING macro/parameter; widened CUOPT_MIP_BATCH_PDLP_STRONG_BRANCHING range to 0..2 and changed default to 1.
  • MIP Solver Settings (Public API) (cpp/include/cuopt/linear_programming/mip/solver_settings.hpp): Changed mip_batch_pdlp_strong_branching default to 1 and added mip_batch_pdlp_reliability_branching member.
  • Dual Simplex / Simplex Settings (cpp/src/dual_simplex/simplex_solver_settings.hpp): Updated mip_batch_pdlp_strong_branching semantics to tri-state; added mip_batch_pdlp_reliability_branching field.
  • PDLP Solver Settings (Public API & Impl) (cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp, cpp/src/pdlp/solver_settings.cu): Added initial-PDLP-iteration accessor (set/get_initial_pdlp_iteration), shared_sb_view, generate_batch_primal_dual_solution, sub_batch_size; strengthened validations for initial step/weight setters.
  • Shared Strong Branching Context (cpp/src/branch_and_bound/shared_strong_branching_context.hpp): New shared_strong_branching_context_t and shared_strong_branching_context_view_t for atomic per-subproblem solved-state sharing and subviews.
  • Pseudo-Costs & Strong Branching (API + Impl) (cpp/src/branch_and_bound/pseudo_costs.hpp, cpp/src/branch_and_bound/pseudo_costs.cpp, cpp/src/branch_and_bound/branch_and_bound.cpp): Refactored strong-branching signatures to pass new_slacks and root dual components; trial_branching now returns (objective, status); added batch_pdlp_warm_cache_t; implemented cooperative batch PDLP launch/merge logic and updated call sites.
  • PDLP Core Solver & Batch Sizing (cpp/src/pdlp/pdlp.cu, cpp/src/pdlp/solve.cu): Integrated shared-context checks to mark/skip externally-solved climbers; switched concurrent halt checks to atomic loads; added initial-iteration/warm-start sourcing; memory estimator now double with larger safety factor; batch sizing returns NumericalError on unsatisfiable size; optional per-batch solution collection and sub-batch sizing support.
  • PDLP Termination & Restart (cpp/src/pdlp/termination_strategy/termination_strategy.hpp, cpp/src/pdlp/termination_strategy/termination_strategy.cu, cpp/src/pdlp/restart_strategy/pdlp_restart_strategy.cu): Added set_termination_status(id, status); treat ConcurrentLimit as terminal in is_done; added debug output in restart check.
  • MIP Integration & Build (cpp/src/mip_heuristics/solver.cu, cpp/src/mip_heuristics/CMakeLists.txt, cpp/src/pdlp/translate.hpp): Propagated mip_batch_pdlp_reliability_branching into branch-and-bound settings; moved clique_table.cu into LP-required list; added MIP include to PDLP translate header.
  • Tests (cpp/tests/linear_programming/pdlp_test.cu): Enabled generation of primal/dual outputs and added cooperative strong-branching tests covering shared-context view, sub-batching, threaded interaction, and termination-status expectations.
  • Python Config (python/cuopt_server/cuopt_server/utils/linear_programming/data_definition.py): Changed mip_batch_pdlp_strong_branching default to 1 and description to tri-state; added mip_batch_pdlp_reliability_branching field (default 1) with tri-state description.
  • Miscellaneous Includes / Low-risk changes (cpp/src/pdlp/pdlp_constants.hpp, cpp/src/pdlp/utilities/ping_pong_graph.cu): Removed unused include from pdlp_constants; added RAFT CUDA utils include for error macros in ping_pong_graph.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage — ⚠️ Warning: docstring coverage is 2.38%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Title check — ✅ Passed: the title clearly summarizes the main change: enabling batch PDLP to race concurrently with dual simplex in strong and reliability branching workflows.
  • Description check — ✅ Passed: the description is related to the changeset, providing context about batch PDLP integration, work-stealing mechanisms, warm-start caching, and various optimizations across the codebase modifications.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cpp/src/branch_and_bound/pseudo_costs.cpp (1)

104-109: ⚠️ Potential issue | 🔴 Critical

Don't derive early-stopped simplex bounds from solution.x.

For ITERATION_LIMIT, dual simplex is still useful here because it preserves a valid lower bound, but compute_objective(child_problem, solution.x) is not that bound. Feeding c^T x from a non-optimal iterate into strong-branch scores and pseudo-cost updates can bias branching decisions.

Based on learnings, cuOpt dual simplex is dual-safe, so the safe lower bound at ITERATION_LIMIT is the solver’s maintained objective value, not an objective recomputed from the current primal iterate.

Also applies to: 257-259

🧹 Nitpick comments (4)
cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp (1)

21-22: Consider moving the shared strong-branching view to a solver-neutral header.

pdlp_solver_settings_t now pulls a branch_and_bound/ header into the PDLP settings layer just to carry a coordination view. Relocating that type to a shared LP module would keep the dependency direction cleaner and make PDLP easier to reuse outside branch-and-bound.

Based on learnings "Reduce tight coupling between solver components (presolve, simplex, basis, barrier); increase modularity and reusability of optimization algorithms"

Also applies to: 277-278

cpp/src/branch_and_bound/shared_strong_branching_context.hpp (1)

37-42: Consider relaxed memory ordering for potential performance improvement.

The current default memory_order_seq_cst is correct but may have unnecessary overhead. Since the solved flag is only used as a hint (no other data depends on it), relaxed ordering would suffice:

♻️ Optional: Use relaxed memory ordering
   bool is_solved(i_t local_idx) const
   {
-    return solved[local_idx].load() != 0;
+    return solved[local_idx].load(std::memory_order_relaxed) != 0;
   }

-  void mark_solved(i_t local_idx) const { solved[local_idx].store(1); }
+  void mark_solved(i_t local_idx) const { solved[local_idx].store(1, std::memory_order_relaxed); }
cpp/tests/linear_programming/pdlp_test.cu (2)

2310-2342: Same memory leak risk with raw pointer pattern.

This test has the same raw new/delete pattern as shared_sb_view_concurrent_mark. Apply the same std::unique_ptr fix for exception safety.


2244-2278: Consider using std::unique_ptr to prevent memory leaks on exceptions.

The raw new/delete pattern risks leaking memory if an exception is thrown between allocation and cleanup. Additionally, the 200ms sleep-based synchronization may cause test flakiness under varying system loads.

♻️ Suggested improvement using unique_ptr
-  optimization_problem_solution_t<int, double>* result_ptr = nullptr;
+  std::unique_ptr<optimization_problem_solution_t<int, double>> result_ptr;

   auto pdlp_thread = std::thread([&]() {
-    auto sol = new optimization_problem_solution_t<int, double>(
-      solve_lp(&handle_, op_problem, solver_settings));
-    result_ptr = sol;
+    result_ptr = std::make_unique<optimization_problem_solution_t<int, double>>(
+      solve_lp(&handle_, op_problem, solver_settings));
   });

   // ...

   pdlp_thread.join();

   ASSERT_NE(result_ptr, nullptr);
-  auto& solution = *result_ptr;
+  auto& solution = *result_ptr;

   // ... assertions ...

-  delete result_ptr;
+  // unique_ptr handles cleanup automatically
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/src/branch_and_bound/pseudo_costs.cpp`:
- Around line 517-520: The debug block refers to a non-existent identifier
original_solution; replace it with the correct variable that holds the solve
result in this scope (the one that exposes
get_pdlp_warm_start_data().total_pdlp_iterations_), e.g., use the actual
solution variable name used elsewhere in this function (search for the variable
that calls get_pdlp_warm_start_data()) so the BATCH_VERBOSE_MODE print uses a
valid identifier and compiles.
- Around line 949-957: The reliability-branching path allows batch PDLP when it
should be disabled for nested sub-MIP runs; update the use_pdlp guard to also
check settings.sub_mip is false before starting/using batch PDLP and the
pdlp_warm_cache (i.e., include a check against settings.sub_mip in the boolean
expression that defines use_pdlp so batch PDLP/warm-cache reuse is skipped when
settings.sub_mip is true in the reliability-branching code that computes
use_pdlp/pdlp_warm_cache).

In `@cpp/src/branch_and_bound/pseudo_costs.hpp`:
- Around line 411-420: The PDLP warm-start cache
(batch_pdlp_warm_cache_t::populated and its device_uvector/scalar fields) is
never invalidated, causing stale warm-start data to persist across different
solves; add a reset() method on batch_pdlp_warm_cache_t that clears or
reinitializes initial_primal and initial_dual (release or resize to 0 on the
RAFT/RMM stream), sets populated=false, and resets
step_size/primal_weight/pdlp_iteration/pourcent_solved_by_batch_pdlp_at_root to
NaN/initial values, then ensure that pdlp_warm_cache.reset() is invoked from the
solver’s solve-start path (e.g., inside pseudo_costs_t::reset or the method that
invalidates LP/cut context) so the cache is cleared whenever the LP/cut
configuration changes.

In `@cpp/src/math_optimization/solver_settings.cu`:
- Around line 102-103: The registry metadata entries for
CUOPT_MIP_BATCH_PDLP_STRONG_BRANCHING and
CUOPT_MIP_BATCH_PDLP_RELIABILITY_BRANCHING advertise a default of 0 while
mip_solver_settings_t now default-constructs those knobs to 1; update the
metadata in solver_settings.cu so the default values for the entries referencing
mip_settings.mip_batch_pdlp_strong_branching and
mip_settings.mip_batch_pdlp_reliability_branching are changed from 0 to 1 to
match the constructor defaults and ensure reset/metadata reflect the true
initial state.

In `@cpp/src/pdlp/solver_settings.cu`:
- Around line 351-355: The setter
pdlp_solver_settings_t<i_t,f_t>::set_initial_pdlp_iteration currently accepts
any value; add a validation that rejects negative iterations by checking if
initial_pdlp_iteration < 0 and throwing a std::invalid_argument (with a clear
message) when negative, otherwise assign initial_pdlp_iteration_ =
std::make_optional(initial_pdlp_iteration); ensure the check uses a signed
comparison or constrained type handling so it behaves correctly for signed i_t.

In `@cpp/src/pdlp/termination_strategy/termination_strategy.cu`:
- Around line 127-132: The setter pdlp_termination_strategy_t<i_t,
f_t>::set_termination_status writes termination_status_[id] without validating
id; add an index bounds guard (e.g., compare id against
termination_status_.size() or a stored max_id) before the write and handle
out-of-range cases (assert/throw/log-and-return) to avoid corrupting state when
id is invalid, using the existing parameter names (id, status) and the member
termination_status_.

In
`@python/cuopt_server/cuopt_server/utils/linear_programming/data_definition.py`:
- Around line 454-463: The defaults for mip_batch_pdlp_strong_branching and
mip_batch_pdlp_reliability_branching currently use default=0 which disables
cooperative/racing branching; change their defaults to enable cooperative
behavior by default (set default=1) so server requests that omit these fields
use the intended cooperative/batch PDLP mode, and adjust any related
docstring/description text if needed to reflect the new default.

---

Outside diff comments:
In `@cpp/src/branch_and_bound/pseudo_costs.cpp`:
- Around line 104-109: The code sets obj = compute_objective(child_problem,
solution.x) when status == dual::status_t::ITERATION_LIMIT, which incorrectly
uses a non‑optimal primal iterate; instead fetch and use the solver's maintained
objective/lower bound for the iterate (the dual‑safe objective value the solver
preserves) rather than recomputing c^T x from solution.x. Locate the status
checks (dual::status_t::ITERATION_LIMIT) and replace the
compute_objective(child_problem, solution.x) use with the solver‑reported
objective/lower bound (e.g., the solution's maintained objective field or a
solver API that returns the current bound) in both the block around
compute_objective and the other occurrence that mirrors lines 257-259 so
pseudo‑cost updates and strong‑branch scoring use the dual‑safe bound.

---

Nitpick comments:
In `@cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp`:
- Around line 21-22: The PDLP settings header currently includes
branch_and_bound/shared_strong_branching_context.hpp to obtain the coordination
view; instead move the shared strong-branching view type out of branch_and_bound
into a solver-neutral header in the shared LP module (e.g.,
shared_lp/strong_branching_view.hpp) and update pdlp_solver_settings_t to
include that new header rather than the branch_and_bound one; also update any
other files that currently include shared_strong_branching_context.hpp (refs:
pdlp_solver_settings_t and the other occurrences noted) to include the new
solver-neutral header so PDLP no longer depends on branch_and_bound.

In `@cpp/src/branch_and_bound/shared_strong_branching_context.hpp`:
- Around line 37-42: The solved atomic currently uses default seq_cst ordering
in is_solved and mark_solved; change them to use relaxed ordering since solved[]
is only a hint and no other data depends on it: update is_solved to call
solved[local_idx].load(std::memory_order_relaxed) and update mark_solved to call
solved[local_idx].store(1, std::memory_order_relaxed) (references: is_solved,
mark_solved, solved).

In `@cpp/tests/linear_programming/pdlp_test.cu`:
- Around line 2310-2342: Replace the raw new/delete pattern with a
std::unique_ptr to ensure exception safety: change result_ptr from a raw pointer
to std::unique_ptr<optimization_problem_solution_t<int,double>> result_ptr;
inside the pdlp_thread lambda assign result_ptr =
std::make_unique<optimization_problem_solution_t<int,double>>(solve_lp(&handle_,
op_problem, solver_settings)); remove the explicit delete result_ptr; keep the
existing dereference (auto& solution = *result_ptr) and joins unchanged; this
preserves semantics while preventing leaks.
- Around line 2244-2278: Replace the raw new/delete and sleep-based sync:
allocate the solution into a
std::unique_ptr<optimization_problem_solution_t<int,double>> and transfer it out
of the thread via a std::promise<std::unique_ptr<...>>/std::future to avoid
leaks if exceptions occur (replace result_ptr and the new in the lambda with
promise.set_value(std::move(ptr))). Replace the fixed
std::this_thread::sleep_for with a deterministic synchronization that waits for
the driver simulation to mark entries solved (e.g., use a condition_variable or
wait on the atomic ctx.solved flags with a short timeout loop) so the test does
not rely on timing; update references to result_ptr to use the future’s returned
unique_ptr and remove manual delete.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8007b7e7-5e76-42d9-b05d-cd681c53cf23

📥 Commits

Reviewing files that changed from the base of the PR and between f80d6cf and b0061e4.

📒 Files selected for processing (20)
  • cpp/include/cuopt/linear_programming/constants.h
  • cpp/include/cuopt/linear_programming/mip/solver_settings.hpp
  • cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp
  • cpp/src/branch_and_bound/branch_and_bound.cpp
  • cpp/src/branch_and_bound/pseudo_costs.cpp
  • cpp/src/branch_and_bound/pseudo_costs.hpp
  • cpp/src/branch_and_bound/shared_strong_branching_context.hpp
  • cpp/src/dual_simplex/simplex_solver_settings.hpp
  • cpp/src/math_optimization/solver_settings.cu
  • cpp/src/mip_heuristics/CMakeLists.txt
  • cpp/src/mip_heuristics/solver.cu
  • cpp/src/pdlp/pdlp.cu
  • cpp/src/pdlp/restart_strategy/pdlp_restart_strategy.cu
  • cpp/src/pdlp/solve.cu
  • cpp/src/pdlp/solver_settings.cu
  • cpp/src/pdlp/termination_strategy/termination_strategy.cu
  • cpp/src/pdlp/termination_strategy/termination_strategy.hpp
  • cpp/src/pdlp/translate.hpp
  • cpp/tests/linear_programming/pdlp_test.cu
  • python/cuopt_server/cuopt_server/utils/linear_programming/data_definition.py

Comment on lines +949 to +957
const i_t rb_mode = settings.mip_batch_pdlp_reliability_branching;
// We don't use batch PDLP in reliability branching if the PDLP warm start data was not filled
// This indicates that PDLP alone (not batched) couldn't even run at the root node
// So it will most likely perform poorly compared to DS
// It is also off if the number of candidate is very small
// If warm start could run but almost none of the BPDLP results were used, we also want to avoid using batch PDLP
constexpr i_t min_num_candidates_for_pdlp = 5;
constexpr f_t min_pourcent_solved_by_batch_pdlp_at_root_for_pdlp = 5.0;
const bool use_pdlp = (rb_mode != 0) && pdlp_warm_cache.populated && unreliable_list.size() > min_num_candidates_for_pdlp && pdlp_warm_cache.pourcent_solved_by_batch_pdlp_at_root > min_pourcent_solved_by_batch_pdlp_at_root_for_pdlp;

⚠️ Potential issue | 🟠 Major

Mirror the sub-MIP disable guard in reliability branching.

Root strong branching already forces batch PDLP off when settings.sub_mip is true, but this new RB path doesn't. As written, nested sub-MIP reliability branching can still start the batch PDLP thread and reuse the warm cache on a path the PR summary says should be disabled.

Suggested fix
-  const bool use_pdlp = (rb_mode != 0) && pdlp_warm_cache.populated && unreliable_list.size() > min_num_candidates_for_pdlp && pdlp_warm_cache.pourcent_solved_by_batch_pdlp_at_root > min_pourcent_solved_by_batch_pdlp_at_root_for_pdlp;
+  const bool use_pdlp =
+    !settings.sub_mip &&
+    (rb_mode != 0) &&
+    pdlp_warm_cache.populated &&
+    unreliable_list.size() > min_num_candidates_for_pdlp &&
+    pdlp_warm_cache.pourcent_solved_by_batch_pdlp_at_root >
+      min_pourcent_solved_by_batch_pdlp_at_root_for_pdlp;

Comment on lines +411 to +420
template <typename i_t, typename f_t>
struct batch_pdlp_warm_cache_t {
const raft::handle_t batch_pdlp_handle{};
rmm::device_uvector<f_t> initial_primal{0, batch_pdlp_handle.get_stream()};
rmm::device_uvector<f_t> initial_dual{0, batch_pdlp_handle.get_stream()};
f_t step_size{std::numeric_limits<f_t>::signaling_NaN()};
f_t primal_weight{std::numeric_limits<f_t>::signaling_NaN()};
i_t pdlp_iteration{-1};
f_t pourcent_solved_by_batch_pdlp_at_root{f_t(0.0)};
bool populated{false};

⚠️ Potential issue | 🟠 Major

Invalidate the PDLP warm-start cache when the LP context changes.

pdlp_warm_cache becomes sticky once populated is set, but this new persistent state has no reset/invalidation path. Reusing the same pseudo_costs_t for a later solve or a different cut/slack configuration can replay stale device vectors and warm-start scalars into PDLP.

Suggested direction
 template <typename i_t, typename f_t>
 struct batch_pdlp_warm_cache_t {
   const raft::handle_t batch_pdlp_handle{};
   rmm::device_uvector<f_t> initial_primal{0, batch_pdlp_handle.get_stream()};
   rmm::device_uvector<f_t> initial_dual{0, batch_pdlp_handle.get_stream()};
   f_t step_size{std::numeric_limits<f_t>::signaling_NaN()};
   f_t primal_weight{std::numeric_limits<f_t>::signaling_NaN()};
   i_t pdlp_iteration{-1};
   f_t pourcent_solved_by_batch_pdlp_at_root{f_t(0.0)};
   bool populated{false};
+
+  void reset()
+  {
+    initial_primal.resize(0, batch_pdlp_handle.get_stream());
+    initial_dual.resize(0, batch_pdlp_handle.get_stream());
+    step_size = std::numeric_limits<f_t>::signaling_NaN();
+    primal_weight = std::numeric_limits<f_t>::signaling_NaN();
+    pdlp_iteration = -1;
+    pourcent_solved_by_batch_pdlp_at_root = f_t(0.0);
+    populated = false;
+  }
 };

Call reset() from the solve-start path that invalidates the LP/cut context.

As per coding guidelines, algorithm state must be reset when transitioning between solver phases.

Also applies to: 535-535


@chris-maes chris-maes changed the title Race strong branching realibility branching Race batch PDLP and dual simplex in strong branching realibility branching Mar 25, 2026
@chris-maes chris-maes changed the title Race batch PDLP and dual simplex in strong branching realibility branching Race batch PDLP and dual simplex in strong branching / reliability branching Mar 25, 2026
@anandhkb anandhkb added this to the 26.06 milestone Mar 26, 2026
@anandhkb
Contributor

I think this targets the next 26.06 release. @chris-maes @Kh4ster for advice.

@Kh4ster
Contributor Author

Kh4ster commented Mar 30, 2026

/ok to test f456c71


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cpp/src/pdlp/solve.cu (1)

843-896: ⚠️ Potential issue | 🟠 Major

Honor generate_batch_primal_dual_solution in the memory estimate.

The estimator still budgets three full batch solution buffers even when solution collection is disabled. That erases the intended memory win and can force needless sub-batching or a spurious NumericalError on workloads that would actually fit.

Suggested fix
-static double batch_pdlp_memory_estimator(const optimization_problem_t<i_t, f_t>& problem,
-                                          double trial_batch_size)
+static double batch_pdlp_memory_estimator(const optimization_problem_t<i_t, f_t>& problem,
+                                          double trial_batch_size,
+                                          bool include_solution_buffers)
 {
   double total_memory = 0.0;
@@
-  total_memory += problem.get_n_variables() * trial_batch_size * sizeof(f_t);
-  total_memory += problem.get_n_constraints() * trial_batch_size * sizeof(f_t);
-  total_memory += problem.get_n_variables() * trial_batch_size * sizeof(f_t);
+  if (include_solution_buffers) {
+    total_memory += problem.get_n_variables() * trial_batch_size * sizeof(f_t);
+    total_memory += problem.get_n_constraints() * trial_batch_size * sizeof(f_t);
+    total_memory += problem.get_n_variables() * trial_batch_size * sizeof(f_t);
+  }
@@
-  const double memory_estimate = batch_pdlp_memory_estimator(problem, max_batch_size);
+  const double memory_estimate = batch_pdlp_memory_estimator(
+    problem, max_batch_size, settings.generate_batch_primal_dual_solution);
@@
-      const double memory_estimate = batch_pdlp_memory_estimator(problem, memory_max_batch_size);
+      const double memory_estimate = batch_pdlp_memory_estimator(
+        problem, memory_max_batch_size, settings.generate_batch_primal_dual_solution);
@@
-    const double min_estimate = batch_pdlp_memory_estimator(problem, memory_max_batch_size);
+    const double min_estimate = batch_pdlp_memory_estimator(
+      problem, memory_max_batch_size, settings.generate_batch_primal_dual_solution);
As per coding guidelines, "Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/solve.cu` around lines 843 - 896, The memory estimator
batch_pdlp_memory_estimator currently always allocates three full batch solution
buffers; modify it to check the generate_batch_primal_dual_solution flag (or
equivalent configuration) and only include the per-batch solution buffers (the
three additions near "Data for the solution" that use problem.get_n_variables()
* trial_batch_size and problem.get_n_constraints() * trial_batch_size) when
solution collection is enabled. Update the logic in batch_pdlp_memory_estimator
to conditionally add those solution-related memory contributions based on
generate_batch_primal_dual_solution (or the solver option used to control
solution generation) so the estimator reflects the true memory needs when
solution collection is disabled.
♻️ Duplicate comments (1)
cpp/src/branch_and_bound/pseudo_costs.cpp (1)

994-997: ⚠️ Potential issue | 🟠 Major

Mirror the sub-MIP disable guard in reliability branching.

Root strong branching disables batch PDLP when settings.sub_mip is true (line 445), but this reliability branching path doesn't include the same guard. Nested sub-MIP reliability branching can still start the batch PDLP thread and reuse the warm cache, which the PR description indicates should be disabled.

Suggested fix
-  const bool use_pdlp = (rb_mode != 0) && pdlp_warm_cache.populated &&
+  const bool use_pdlp = !settings.sub_mip &&
+                        (rb_mode != 0) && pdlp_warm_cache.populated &&
                         unreliable_list.size() > min_num_candidates_for_pdlp &&
                         pdlp_warm_cache.pourcent_solved_by_batch_pdlp_at_root >
                           min_pourcent_solved_by_batch_pdlp_at_root_for_pdlp;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 994 - 997, The
reliability-branching condition that computes use_pdlp should mirror the root
strong-branching guard by disabling batch PDLP when settings.sub_mip is true;
update the boolean expression that defines use_pdlp (currently using rb_mode,
pdlp_warm_cache, unreliable_list, min_num_candidates_for_pdlp,
pdlp_warm_cache.pourcent_solved_by_batch_pdlp_at_root, and
min_pourcent_solved_by_batch_pdlp_at_root_for_pdlp) to include a
!settings.sub_mip check so nested sub-MIP reliability branching cannot start or
reuse the batch PDLP warm cache.
🧹 Nitpick comments (4)
cpp/src/branch_and_bound/pseudo_costs.cpp (4)

478-481: Track the commented-out TODO for dual solution uncrushing.

The uncrush_dual_solution call is commented out with a TODO referencing "Chris". This may affect warm-start quality if dual solution components (root_soln_y, root_soln_z) are needed for PDLP initialization.

Consider adding a tracking issue or ensuring this is addressed before the PR exits draft status.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 478 - 481, The call
to uncrush_dual_solution(...) is currently commented out, risking degraded
warm-starts because root_soln_y/root_soln_z (and
original_root_soln_y/original_root_soln_z) won't be restored; either re-enable
the call in the PDLP initialization path (using original_problem, original_lp,
root_soln_y, root_soln_z, original_root_soln_y, original_root_soln_z) or add a
tracked TODO/issue referencing this exact block so it must be resolved before
leaving draft status; ensure any enabling is guarded by a feature flag or a
clear comment linking the issue/owner (Chris) so reviewers can verify the
follow-up.

1167-1168: Consider using candidate_vars[i] consistently.

The taskloop uses unreliable_list[i] while the post-PDLP merge loop uses candidate_vars[i]. Since candidate_vars is defined as the first num_candidates elements of unreliable_list, they're functionally equivalent, but using candidate_vars[i] consistently would improve readability and reduce the risk of bugs if the logic changes.

-    for (i_t i = 0; i < num_candidates; ++i) {
-      const i_t j = unreliable_list[i];
+    for (i_t i = 0; i < num_candidates; ++i) {
+      const i_t j = candidate_vars[i];
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 1167 - 1168, The loop
currently iterates using unreliable_list[i] while later code uses
candidate_vars[i]; since candidate_vars is defined as the first num_candidates
of unreliable_list, change the loop to index candidate_vars[i] for consistency
and readability: replace uses of unreliable_list[i] inside the for-loop (and any
related variables derived there) with candidate_vars[i] so the same symbol is
used everywhere (e.g., in the taskloop and the post-PDLP merge loop) to avoid
future mismatches between unreliable_list and candidate_vars.

366-374: Consider epsilon comparison for sigma value checks.

The code uses exact floating-point equality (sigma == -1, sigma == 1) to determine constraint bound computation. While sigma values should be exactly ±1 from matrix construction, exact float comparisons can fail silently if numerical perturbations occur.

More critically, if sigma is neither exactly 1 nor exactly -1, the else branch triggers an assert in debug mode but leaves constraint_lower[i] and constraint_upper[i] uninitialized in release builds.

Suggested defensive fix
-    if (sigma == -1) {
+    if (sigma < 0) {  // sigma == -1
       constraint_lower[i] = slack_lower + lp.rhs[i];
       constraint_upper[i] = slack_upper + lp.rhs[i];
-    } else if (sigma == 1) {
+    } else {  // sigma == +1
       constraint_lower[i] = -slack_upper + lp.rhs[i];
       constraint_upper[i] = -slack_lower + lp.rhs[i];
-    } else {
-      assert(sigma == 1.0 || sigma == -1.0);
     }
+    assert(std::abs(std::abs(sigma) - 1.0) < 1e-9 && "sigma must be ±1");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 366 - 374, The code
currently compares sigma to exact values (-1 and 1) and leaves
constraint_lower[i] and constraint_upper[i] uninitialized if sigma is slightly
off, so change the branch to use an epsilon check (e.g., fabs(sigma - 1.0) < eps
and fabs(sigma + 1.0) < eps) to detect ±1; compute bounds using the existing
formulas when matched; otherwise set constraint_lower[i] and constraint_upper[i]
to a safe fallback (for example compute using sign = (sigma >= 0 ? 1 : -1) or
clamp sigma to its sign) and still trigger the assert/log for unexpected values
so release builds get initialized while debug builds still catch the anomaly;
reference symbols to update: sigma, constraint_lower, constraint_upper,
slack_lower, slack_upper, lp.rhs in pseudo_costs.cpp.

1272-1287: Clean up or track commented-out debug logging.

This block of commented-out logging code with "TODO put back" should either be removed, converted to a proper TODO with issue tracking, or re-enabled. Leaving large commented blocks reduces code readability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 1272 - 1287, This
large commented-out logging block around rb_mode and log.printf (referencing
num_candidates, ds_optimal, ds_infeasible, ds_failed, ds_skipped, ds_elapsed)
should not remain in place as-is: either remove it entirely, or restore it by
uncommenting and fixing formatting so the if (rb_mode != 2) { ... } branch
compiles and uses the existing variables, or replace the block with a single
concise TODO comment that references a tracker/issue ID; pick one approach and
apply it consistently so the file no longer contains the large commented-out
printf block.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp`:
- Around line 287-292: The default for generate_batch_primal_dual_solution is
currently false which breaks existing batch LP callers that expect
primal/dual/reduced-cost buffers; change the default for the
generate_batch_primal_dual_solution member in solver_settings.hpp back to true
so batch solves continue to populate those buffers by default, and update the
inline comment to reflect that the default preserves previous behavior (callers
that want to opt-out can still set generate_batch_primal_dual_solution=false or
use the existing strong/reliability-branching call sites to disable collection).

In `@cpp/src/branch_and_bound/pseudo_costs.cpp`:
- Around line 158-172: Fix the typo in the comment above the DS check: change
"iteartion" to "iteration" in the comment that precedes the conditional where
sb_view.is_valid() and the ds_is_valid_done checks (the block that calls
sb_view.mark_solved and logs via settings.log.printf), so the comment reads "We
could not mark as solved nodes hitting iteration limit in DS".

In `@cpp/src/branch_and_bound/shared_strong_branching_context.hpp`:
- Around line 37-43: Add runtime bounds validation before any access to the
shared coordination state: ensure that settings.shared_sb_view.solved.size() (or
the span size exposed by shared_strong_branching_context_view_t) is >=
max_batch_size (where max_batch_size = settings.new_bounds.size()) before
entering run_batch_pdlp()'s batch loop; also add guard checks in the methods
is_solved(i_t), mark_solved(i_t) and subview(offset,count) to validate their
index/offset+count against the underlying solved span size and handle/report
invalid accesses (e.g., return false/log/throw) to prevent out-of-bounds access
and corrupted coordination state.

In `@cpp/src/pdlp/solve.cu`:
- Around line 961-965: The code allows settings.sub_batch_size to
unconditionally overwrite optimal_batch_size and can exceed the memory-checked
ceiling (memory_max_batch_size), risking GPU overcommit; update the logic around
optimal_batch_size (used with detail::optimal_batch_size_handler and compared to
max_batch_size) to revalidate or clamp settings.sub_batch_size against the
computed ceiling (memory_max_batch_size or max_batch_size) before assigning, and
adjust the subsequent cuopt_assert to reflect the validated/clamped value so an
oversized manual override cannot bypass the memory check.

In `@cpp/tests/linear_programming/pdlp_test.cu`:
- Around line 2244-2255: The test uses a fixed sleep
(std::this_thread::sleep_for(... 200ms)) to wait for PDLP to start, which is
flaky; replace this timing-based wait with deterministic synchronization: add a
latch/condition (e.g., std::promise/std::future or std::condition_variable +
mutex) or a std::barrier that the PDLP thread signals when it has reached the
point where external marks should be applied, then have the main test wait on
that synchronization primitive before setting ctx.solved[i].store(1); update the
lambda that creates pdlp_thread (pdlp_thread, solve_lp, and the PDLP entry
point) to signal the latch at the proper startup point so
shared_sb_view_concurrent_mark and shared_sb_view_all_infeasible
deterministically exercise the ConcurrentLimit race path.

---

Outside diff comments:
In `@cpp/src/pdlp/solve.cu`:
- Around line 843-896: The memory estimator batch_pdlp_memory_estimator
currently always allocates three full batch solution buffers; modify it to check
the generate_batch_primal_dual_solution flag (or equivalent configuration) and
only include the per-batch solution buffers (the three additions near "Data for
the solution" that use problem.get_n_variables() * trial_batch_size and
problem.get_n_constraints() * trial_batch_size) when solution collection is
enabled. Update the logic in batch_pdlp_memory_estimator to conditionally add
those solution-related memory contributions based on
generate_batch_primal_dual_solution (or the solver option used to control
solution generation) so the estimator reflects the true memory needs when
solution collection is disabled.

---

Duplicate comments:
In `@cpp/src/branch_and_bound/pseudo_costs.cpp`:
- Around line 994-997: The reliability-branching condition that computes
use_pdlp should mirror the root strong-branching guard by disabling batch PDLP
when settings.sub_mip is true; update the boolean expression that defines
use_pdlp (currently using rb_mode, pdlp_warm_cache, unreliable_list,
min_num_candidates_for_pdlp,
pdlp_warm_cache.pourcent_solved_by_batch_pdlp_at_root, and
min_pourcent_solved_by_batch_pdlp_at_root_for_pdlp) to include a
!settings.sub_mip check so nested sub-MIP reliability branching cannot start or
reuse the batch PDLP warm cache.

---

Nitpick comments:
In `@cpp/src/branch_and_bound/pseudo_costs.cpp`:
- Around line 478-481: The call to uncrush_dual_solution(...) is currently
commented out, risking degraded warm-starts because root_soln_y/root_soln_z (and
original_root_soln_y/original_root_soln_z) won't be restored; either re-enable
the call in the PDLP initialization path (using original_problem, original_lp,
root_soln_y, root_soln_z, original_root_soln_y, original_root_soln_z) or add a
tracked TODO/issue referencing this exact block so it must be resolved before
leaving draft status; ensure any enabling is guarded by a feature flag or a
clear comment linking the issue/owner (Chris) so reviewers can verify the
follow-up.
- Around line 1167-1168: The loop currently iterates using unreliable_list[i]
while later code uses candidate_vars[i]; since candidate_vars is defined as the
first num_candidates of unreliable_list, change the loop to index
candidate_vars[i] for consistency and readability: replace uses of
unreliable_list[i] inside the for-loop (and any related variables derived there)
with candidate_vars[i] so the same symbol is used everywhere (e.g., in the
taskloop and the post-PDLP merge loop) to avoid future mismatches between
unreliable_list and candidate_vars.
- Around line 366-374: The code currently compares sigma to exact values (-1 and
1) and leaves constraint_lower[i] and constraint_upper[i] uninitialized if sigma
is slightly off, so change the branch to use an epsilon check (e.g., fabs(sigma
- 1.0) < eps and fabs(sigma + 1.0) < eps) to detect ±1; compute bounds using the
existing formulas when matched; otherwise set constraint_lower[i] and
constraint_upper[i] to a safe fallback (for example compute using sign = (sigma
>= 0 ? 1 : -1) or clamp sigma to its sign) and still trigger the assert/log for
unexpected values so release builds get initialized while debug builds still
catch the anomaly; reference symbols to update: sigma, constraint_lower,
constraint_upper, slack_lower, slack_upper, lp.rhs in pseudo_costs.cpp.
- Around line 1272-1287: This large commented-out logging block around rb_mode
and log.printf (referencing num_candidates, ds_optimal, ds_infeasible,
ds_failed, ds_skipped, ds_elapsed) should not remain in place as-is: either
remove it entirely, or restore it by uncommenting and fixing formatting so the
if (rb_mode != 2) { ... } branch compiles and uses the existing variables, or
replace the block with a single concise TODO comment that references a
tracker/issue ID; pick one approach and apply it consistently so the file no
longer contains the large commented-out printf block.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1a11857e-6bdc-4a3c-a045-386b504556d7

📥 Commits

Reviewing files that changed from the base of the PR and between b0061e4 and f504a75.

📒 Files selected for processing (15)
  • cpp/include/cuopt/linear_programming/constants.h
  • cpp/include/cuopt/linear_programming/mip/solver_settings.hpp
  • cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp
  • cpp/src/branch_and_bound/pseudo_costs.cpp
  • cpp/src/branch_and_bound/shared_strong_branching_context.hpp
  • cpp/src/dual_simplex/simplex_solver_settings.hpp
  • cpp/src/math_optimization/solver_settings.cu
  • cpp/src/pdlp/pdlp.cu
  • cpp/src/pdlp/pdlp_constants.hpp
  • cpp/src/pdlp/solve.cu
  • cpp/src/pdlp/solver_settings.cu
  • cpp/src/pdlp/termination_strategy/termination_strategy.cu
  • cpp/src/pdlp/utilities/ping_pong_graph.cu
  • cpp/tests/linear_programming/pdlp_test.cu
  • python/cuopt_server/cuopt_server/utils/linear_programming/data_definition.py
💤 Files with no reviewable changes (1)
  • cpp/src/pdlp/pdlp_constants.hpp
✅ Files skipped from review due to trivial changes (1)
  • cpp/src/pdlp/utilities/ping_pong_graph.cu
🚧 Files skipped from review as they are similar to previous changes (6)
  • cpp/include/cuopt/linear_programming/constants.h
  • cpp/src/dual_simplex/simplex_solver_settings.hpp
  • cpp/src/math_optimization/solver_settings.cu
  • python/cuopt_server/cuopt_server/utils/linear_programming/data_definition.py
  • cpp/src/pdlp/solver_settings.cu
  • cpp/src/pdlp/pdlp.cu

Comment on lines +287 to +292
// By default to save memory and speed we don't store and copy each climber's primal and dual
// solutions We only retrieve termination statistics and the objective values
bool generate_batch_primal_dual_solution{false};
// Used to force batch PDLP to solve a subbatch of the problems at a time
// The 0 default value will make the solver use its heuristic to determine the subbatch size
i_t sub_batch_size{0};

⚠️ Potential issue | 🔴 Critical

Keep batch solution generation opt-out, not opt-in.

With this default at false, generic batch LP solves now return empty primal/dual/reduced-cost buffers unless every caller remembers to opt in. Existing batch tests in cpp/tests/linear_programming/pdlp_test.cu still slice those buffers after solve_lp(..., new_bounds ...), so this silently regresses the non-MIP batch path and can drive helpers like extract_subvector() past a zero-length buffer. The strong/reliability-branching call sites can disable collection explicitly, but the default here needs to preserve the old behavior.

Suggested fix
-  bool generate_batch_primal_dual_solution{false};
+  bool generate_batch_primal_dual_solution{true};
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// By default to save memory and speed we don't store and copy each climber's primal and dual
// solutions We only retrieve termination statistics and the objective values
bool generate_batch_primal_dual_solution{false};
// Used to force batch PDLP to solve a subbatch of the problems at a time
// The 0 default value will make the solver use its heuristic to determine the subbatch size
i_t sub_batch_size{0};
// By default to save memory and speed we don't store and copy each climber's primal and dual
// solutions We only retrieve termination statistics and the objective values
bool generate_batch_primal_dual_solution{true};
// Used to force batch PDLP to solve a subbatch of the problems at a time
// The 0 default value will make the solver use its heuristic to determine the subbatch size
i_t sub_batch_size{0};
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp` around lines
287 - 292, The default for generate_batch_primal_dual_solution is currently
false which breaks existing batch LP callers that expect
primal/dual/reduced-cost buffers; change the default for the
generate_batch_primal_dual_solution member in solver_settings.hpp back to true
so batch solves continue to populate those buffers by default, and update the
inline comment to reflect that the default preserves previous behavior (callers
that want to opt-out can still set generate_batch_primal_dual_solution=false or
use the existing strong/reliability-branching call sites to disable collection).

Comment on lines +158 to +172
// Mark the subproblem as solved so that batch PDLP removes it from the batch
if (sb_view.is_valid()) {
// We could not mark as solved nodes hitting iteartion limit in DS
if ((branch == 0 && ds_is_valid_done(ds_status_down[k])) ||
(branch == 1 && ds_is_valid_done(ds_status_up[k]))) {
sb_view.mark_solved(shared_idx);
settings.log.printf(
"[COOP SB] DS thread %d solved variable %d branch %s (shared_idx %d), marking in "
"shared context\n",
thread_id,
j,
branch == 0 ? "down" : "up",
shared_idx);
}
}

⚠️ Potential issue | 🟡 Minor

Minor: Typo in comment.

Line 160 has "iteartion" instead of "iteration".

-      // We could not mark as solved nodes hitting iteartion limit in DS
+      // We could not mark as solved nodes hitting iteration limit in DS
📝 Committable suggestion


Suggested change
// Mark the subproblem as solved so that batch PDLP removes it from the batch
if (sb_view.is_valid()) {
// We could not mark as solved nodes hitting iteartion limit in DS
if ((branch == 0 && ds_is_valid_done(ds_status_down[k])) ||
(branch == 1 && ds_is_valid_done(ds_status_up[k]))) {
sb_view.mark_solved(shared_idx);
settings.log.printf(
"[COOP SB] DS thread %d solved variable %d branch %s (shared_idx %d), marking in "
"shared context\n",
thread_id,
j,
branch == 0 ? "down" : "up",
shared_idx);
}
}
// Mark the subproblem as solved so that batch PDLP removes it from the batch
if (sb_view.is_valid()) {
// We could not mark as solved nodes hitting iteration limit in DS
if ((branch == 0 && ds_is_valid_done(ds_status_down[k])) ||
(branch == 1 && ds_is_valid_done(ds_status_up[k]))) {
sb_view.mark_solved(shared_idx);
settings.log.printf(
"[COOP SB] DS thread %d solved variable %d branch %s (shared_idx %d), marking in "
"shared context\n",
thread_id,
j,
branch == 0 ? "down" : "up",
shared_idx);
}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/branch_and_bound/pseudo_costs.cpp` around lines 158 - 172, Fix the
typo in the comment above the DS check: change "iteartion" to "iteration" in the
comment that precedes the conditional where sb_view.is_valid() and the
ds_is_valid_done checks (the block that calls sb_view.mark_solved and logs via
settings.log.printf), so the comment reads "We could not mark as solved nodes
hitting iteration limit in DS".

Comment on lines +37 to +43
bool is_solved(i_t local_idx) const { return solved[local_idx].load() != 0; }

void mark_solved(i_t local_idx) const { solved[local_idx].store(1); }

shared_strong_branching_context_view_t subview(i_t offset, i_t count) const
{
return {solved.subspan(offset, count)};

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, locate and examine the header file in question
find . -name "shared_strong_branching_context.hpp" -type f

Repository: NVIDIA/cuopt

Length of output: 118


🏁 Script executed:

# Examine the full context of the shared_strong_branching_context.hpp file
cat -n cpp/src/branch_and_bound/shared_strong_branching_context.hpp

Repository: NVIDIA/cuopt

Length of output: 1636


🏁 Script executed:

# Search for usages of subview, is_solved, and mark_solved in solve.cu
rg "subview|is_solved|mark_solved" cpp/src/pdlp/solve.cu -A 3 -B 3

Repository: NVIDIA/cuopt

Length of output: 347


🏁 Script executed:

# Check how the shared_strong_branching_context is initialized and used
rg "shared_strong_branching_context" cpp/src/pdlp/solve.cu -A 2 -B 2

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Get broader context around the subview call in solve.cu
rg "shared_sb_view.subview" cpp/src/pdlp/solve.cu -B 10 -A 5

Repository: NVIDIA/cuopt

Length of output: 744


🏁 Script executed:

# Look for the loop that iterates through batches and understand i and current_batch_size
rg "for.*i.*current_batch_size|while.*current_batch_size" cpp/src/pdlp/solve.cu -B 5 -A 10

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Search for any initialization or setup of shared_sb_view to understand its size
rg "shared_sb_view\s*=" cpp/src/pdlp/solve.cu -B 5 -A 3

Repository: NVIDIA/cuopt

Length of output: 484


🏁 Script executed:

# Check if there's any size validation before the batch loop
rg "solve.cu" -l | head -1 | xargs wc -l

Repository: NVIDIA/cuopt

Length of output: 102


🏁 Script executed:

# Search for where run_batch_pdlp is called and how shared_sb_view is initialized
rg "run_batch_pdlp|shared_strong_branching_context_t" cpp/src/pdlp/ -B 3 -A 3 | head -50

Repository: NVIDIA/cuopt

Length of output: 849


🏁 Script executed:

# Check how settings.shared_sb_view is created/passed in
rg "shared_sb_view\s*=" cpp/src/pdlp/ --type cpp --type cu -B 2 -A 2

Repository: NVIDIA/cuopt

Length of output: 82


🏁 Script executed:

# Look for how num_subproblems and max_batch_size are related
rg "num_subproblems|max_batch_size" cpp/src/pdlp/solve.cu -B 2 -A 2

Repository: NVIDIA/cuopt

Length of output: 2609


🏁 Script executed:

# Search for where shared_strong_branching_context_t or shared_sb_view is created/initialized
rg "shared_strong_branching_context_t|shared_sb_view" cpp/src/pdlp/ --type cpp -B 5 -A 5 | head -80

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Look at the full solve.cu function signature and understand the settings parameter
rg -A 30 "template.*run_batch_pdlp" cpp/src/pdlp/solve.cu | head -40

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Check if there are any asserts or validations about shared_sb_view size before the loop
rg "shared_sb_view" cpp/src/pdlp/solve.cu -B 20 | grep -A 20 "run_batch_pdlp"

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Let's search more broadly in the entire codebase for shared_strong_branching_context
find . -type f \( -name "*.cpp" -o -name "*.hpp" -o -name "*.cu" -o -name "*.cuh" \) -exec grep -l "shared_strong_branching_context" {} \;

Repository: NVIDIA/cuopt

Length of output: 270


🏁 Script executed:

# Check the test file to understand how this structure is used
rg "shared_strong_branching_context" cpp/tests/ -B 3 -A 3

Repository: NVIDIA/cuopt

Length of output: 5269


🏁 Script executed:

# Get the signature and first part of run_batch_pdlp to understand all parameters
sed -n '1,100p' cpp/src/pdlp/solve.cu | rg "run_batch_pdlp" -B 5 -A 25

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Look for any validation of shared_sb_view before it's used
rg "is_valid|shared_sb_view" cpp/src/pdlp/solve.cu -B 2 -A 2

Repository: NVIDIA/cuopt

Length of output: 297


🏁 Script executed:

# Verify the exact loop bounds and how max_batch_size relates to context initialization
sed -n '1,200p' cpp/src/pdlp/solve.cu | grep -A 50 "for (size_t i = 0; i < max_batch_size"

Repository: NVIDIA/cuopt

Length of output: 38


🏁 Script executed:

# Check solver_settings.hpp to see how shared_sb_view is defined and documented
rg "shared_sb_view" cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp -B 3 -A 3

Repository: NVIDIA/cuopt

Length of output: 496


🏁 Script executed:

# Check if there's any documentation or assertion in run_batch_pdlp about shared_sb_view size requirement
sed -n '1,300p' cpp/src/pdlp/solve.cu | tail -100

Repository: NVIDIA/cuopt

Length of output: 7391


🏁 Script executed:

# Search for any size validation of shared_sb_view in the entire codebase
rg "shared_sb_view.*size|solved.*size" cpp/ -B 2 -A 2

Repository: NVIDIA/cuopt

Length of output: 2476


🏁 Script executed:

# Look at how the batch_pdlp_solve function calls run_batch_pdlp to see if there's validation there
rg "batch_pdlp_solve|run_batch_pdlp" cpp/src/pdlp/solve.cu -B 5 -A 15 | head -60

Repository: NVIDIA/cuopt

Length of output: 2350


🏁 Script executed:

# Look for any assertions or documentation about the size relationship
rg "shared_sb_view|max_batch_size" cpp/src/pdlp/solve.cu | grep -i "assert\|check\|expect\|size"

Repository: NVIDIA/cuopt

Length of output: 1388


🏁 Script executed:

# Check pseudo_costs.cpp to see if there are any additional clues about expected usage
rg "shared_strong_branching_context" cpp/src/branch_and_bound/pseudo_costs.cpp -B 5 -A 5

Repository: NVIDIA/cuopt

Length of output: 2294


🏁 Script executed:

# Verify std::span behavior by checking if there's any documentation in the code
rg "span.subspan|span\[" cpp/src/branch_and_bound/shared_strong_branching_context.hpp -B 2 -A 2

Repository: NVIDIA/cuopt

Length of output: 38


Add bounds validation before accessing the shared context view.

is_solved, mark_solved, and subview directly index/subspan the shared coordination state without runtime bounds checks. The is_valid() guard only detects empty spans, not size mismatches. When run_batch_pdlp() iterates with indices derived from max_batch_size = settings.new_bounds.size(), it must validate that settings.shared_sb_view.solved.size() >= max_batch_size before the batch loop begins. Otherwise, mismatched context sizes will cause out-of-bounds access and corrupt coordination state.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/branch_and_bound/shared_strong_branching_context.hpp` around lines 37
- 43, Add runtime bounds validation before any access to the shared coordination
state: ensure that settings.shared_sb_view.solved.size() (or the span size
exposed by shared_strong_branching_context_view_t) is >= max_batch_size (where
max_batch_size = settings.new_bounds.size()) before entering run_batch_pdlp()'s
batch loop; also add guard checks in the methods is_solved(i_t),
mark_solved(i_t) and subview(offset,count) to validate their index/offset+count
against the underlying solved span size and handle/report invalid accesses
(e.g., return false/log/throw) to prevent out-of-bounds access and corrupted
coordination state.

Comment on lines +961 to 965
size_t optimal_batch_size = use_optimal_batch_size
? detail::optimal_batch_size_handler(problem, memory_max_batch_size)
: max_batch_size;
if (settings.sub_batch_size > 0) { optimal_batch_size = settings.sub_batch_size; }
cuopt_assert(optimal_batch_size != 0 && optimal_batch_size <= max_batch_size,

⚠️ Potential issue | 🟠 Major

Revalidate manual sub-batch overrides against the memory-checked ceiling.

settings.sub_batch_size overwrites the memory-checked value unconditionally. If the full batch does not fit and the caller forces a larger sub-batch than memory_max_batch_size, this path can still overcommit GPU memory and undo the protection above.

Suggested fix
   size_t optimal_batch_size = use_optimal_batch_size
                                 ? detail::optimal_batch_size_handler(problem, memory_max_batch_size)
                                 : max_batch_size;
-  if (settings.sub_batch_size > 0) { optimal_batch_size = settings.sub_batch_size; }
+  if (settings.sub_batch_size > 0) {
+    const auto requested_sub_batch_size = static_cast<size_t>(settings.sub_batch_size);
+    cuopt_assert(requested_sub_batch_size <= memory_max_batch_size,
+                 "sub_batch_size exceeds the memory-checked batch size");
+    optimal_batch_size = requested_sub_batch_size;
+  }
As per coding guidelines, "Verify correct problem size checks before expensive GPU/CPU operations; prevent resource exhaustion on oversized problems".
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
   size_t optimal_batch_size = use_optimal_batch_size
                                 ? detail::optimal_batch_size_handler(problem, memory_max_batch_size)
                                 : max_batch_size;
-  if (settings.sub_batch_size > 0) { optimal_batch_size = settings.sub_batch_size; }
+  if (settings.sub_batch_size > 0) {
+    const auto requested_sub_batch_size = static_cast<size_t>(settings.sub_batch_size);
+    cuopt_assert(requested_sub_batch_size <= memory_max_batch_size,
+                 "sub_batch_size exceeds the memory-checked batch size");
+    optimal_batch_size = requested_sub_batch_size;
+  }
   cuopt_assert(optimal_batch_size != 0 && optimal_batch_size <= max_batch_size,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/solve.cu` around lines 961 - 965, The code allows
settings.sub_batch_size to unconditionally overwrite optimal_batch_size and can
exceed the memory-checked ceiling (memory_max_batch_size), risking GPU
overcommit; update the logic around optimal_batch_size (used with
detail::optimal_batch_size_handler and compared to max_batch_size) to revalidate
or clamp settings.sub_batch_size against the computed ceiling
(memory_max_batch_size or max_batch_size) before assigning, and adjust the
subsequent cuopt_assert to reflect the validated/clamped value so an oversized
manual override cannot bypass the memory check.
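The clamping alternative mentioned in the prompt can be sketched as a small helper. The function name and parameters are illustrative, not cuopt's API; the point is the design choice of clamping to the ceiling rather than asserting, so a release build cannot overcommit GPU memory even when the assert is compiled out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: resolve the effective batch size from the computed
// optimum, the memory-checked ceiling, and an optional user override.
std::size_t resolve_batch_size(std::size_t computed_optimal,
                               std::size_t memory_max_batch_size,
                               long user_sub_batch_size)
{
  if (user_sub_batch_size <= 0) return computed_optimal;  // no override requested
  // Clamp the override instead of trusting it, so an oversized manual
  // sub-batch can never bypass the memory check.
  return std::min(static_cast<std::size_t>(user_sub_batch_size),
                  memory_max_batch_size);
}
```

With this shape, `resolve_batch_size(8, 16, 64)` yields 16 rather than the unchecked 64 the current code would accept.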

Comment on lines +2244 to +2255
auto pdlp_thread = std::thread([&]() {
auto sol = new optimization_problem_solution_t<int, double>(
solve_lp(&handle_, op_problem, solver_settings));
result_ptr = sol;
});

// Wait a bit then mark entries 0, 2, 4 as solved (simulating DS)
std::this_thread::sleep_for(std::chrono::milliseconds(200));
for (int i = 0; i < n_fractional; ++i)
ctx.solved[i].store(1);

pdlp_thread.join();

⚠️ Potential issue | 🟠 Major

Replace the fixed 200 ms sleep with deterministic synchronization.

On an afiro-sized batch, PDLP can finish before the delay. That makes shared_sb_view_concurrent_mark pass without ever exercising the ConcurrentLimit path, and it makes shared_sb_view_all_infeasible flaky because it hard-requires the external mark to win the race. Please gate these tests with a latch/barrier or another deterministic hook instead of wall-clock timing.

Also applies to: 2310-2319

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/linear_programming/pdlp_test.cu` around lines 2244 - 2255, The test
uses a fixed sleep (std::this_thread::sleep_for(... 200ms)) to wait for PDLP to
start, which is flaky; replace this timing-based wait with deterministic
synchronization: add a latch/condition (e.g., std::promise/std::future or
std::condition_variable + mutex) or a std::barrier that the PDLP thread signals
when it has reached the point where external marks should be applied, then have
the main test wait on that synchronization primitive before setting
ctx.solved[i].store(1); update the lambda that creates pdlp_thread (pdlp_thread,
solve_lp, and the PDLP entry point) to signal the latch at the proper startup
point so shared_sb_view_concurrent_mark and shared_sb_view_all_infeasible
deterministically exercise the ConcurrentLimit race path.
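One way to realize the latch suggestion is a std::promise/std::future pair. The sketch below is a standalone illustration only (solve_lp and the real test fixture are omitted): the worker signals when it reaches the point where external marks should race in, and the main thread waits on that signal instead of the wall clock.

```cpp
#include <atomic>
#include <cassert>
#include <future>
#include <thread>
#include <vector>

// Sketch (not the actual pdlp_test.cu): gate the external "mark solved"
// step on a signal from the worker thread instead of a 200 ms sleep.
bool demo_deterministic_marking()
{
  constexpr int n_fractional = 4;
  std::vector<std::atomic<int>> solved(n_fractional);

  std::promise<void> started;  // worker signals once it is inside the race window
  auto started_future = started.get_future();

  std::thread worker([&]() {
    started.set_value();  // deterministic hook; the real test would signal
                          // right before solve_lp() enters its batch loop
  });

  started_future.wait();  // wait for the signal, not the clock
  for (int i = 0; i < n_fractional; ++i)
    solved[i].store(1, std::memory_order_release);

  worker.join();
  for (int i = 0; i < n_fractional; ++i)
    if (solved[i].load(std::memory_order_acquire) != 1) return false;
  return true;
}
```

Unlike the fixed sleep, this ordering holds regardless of machine speed, so a fast afiro-sized solve cannot finish before the external marks are applied.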

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
cpp/include/cuopt/linear_programming/mip/solver_settings.hpp (1)

101-104: Consider replacing 0/1/2 mode literals with shared named constants (or enum).

Using a shared enum/constexpr for these mode values would reduce semantic drift between comments, registration bounds, and consumers.

♻️ Example refactor (local to this header)
+  enum class batch_pdlp_mode_t : i_t {
+    dual_simplex_only = 0,
+    cooperative       = 1,
+    batch_pdlp_only   = 2
+  };
+
-  i_t mip_batch_pdlp_strong_branching{
-    1};  // 0 = DS only, 1 = cooperative DS + PDLP, 2 = batch PDLP only
-  i_t mip_batch_pdlp_reliability_branching{
-    1};  // 0 = DS only, 1 = cooperative DS + PDLP, 2 = batch PDLP only
+  i_t mip_batch_pdlp_strong_branching{
+    static_cast<i_t>(batch_pdlp_mode_t::cooperative)};
+  i_t mip_batch_pdlp_reliability_branching{
+    static_cast<i_t>(batch_pdlp_mode_t::cooperative)};
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/include/cuopt/linear_programming/mip/solver_settings.hpp` around lines
101 - 104, Replace the numeric mode literals for mip_batch_pdlp_strong_branching
and mip_batch_pdlp_reliability_branching with a shared named enum or constexpr
constants (e.g., Mode::DS_ONLY, Mode::COOPERATIVE_DS_PDLP,
Mode::BATCH_PDLP_ONLY) declared in this header; update the variable
types/initializers to use those named constants, update the comments to refer to
the enum names, and ensure any registration/validation logic that currently
checks 0/1/2 uses the enum values so bounds and consumers remain consistent with
the new symbols.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ea0863f1-03fa-448e-a445-e8836db6e514

📥 Commits

Reviewing files that changed from the base of the PR and between f504a75 and f456c71.

📒 Files selected for processing (5)
  • cpp/include/cuopt/linear_programming/constants.h
  • cpp/include/cuopt/linear_programming/mip/solver_settings.hpp
  • cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp
  • cpp/src/math_optimization/solver_settings.cu
  • cpp/src/pdlp/solve.cu
✅ Files skipped from review due to trivial changes (1)
  • cpp/src/pdlp/solve.cu
🚧 Files skipped from review as they are similar to previous changes (3)
  • cpp/include/cuopt/linear_programming/constants.h
  • cpp/src/math_optimization/solver_settings.cu
  • cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp


Labels

improvement Improves an existing functionality mip non-breaking Introduces a non-breaking change pdlp

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants