
Incorporate determinism into GPU heuristics + parallel deterministic B&B+GPU exploration #986

Open
aliceb-nv wants to merge 294 commits into NVIDIA:release/26.04 from aliceb-nv:gpudet

Conversation


@aliceb-nv aliceb-nv commented Mar 23, 2026

This PR integrates the GPU heuristics into the solver's deterministic codepath. This enables full end-to-end solves with presolve, parallel B&B, and GPU heuristics, with run-to-run determinism and heterogeneous (CPU+GPU) execution.

For this purpose, the GPU heuristics were modified to eliminate sources of nondeterminism such as unordered floating-point reductions, atomic operations, and scheduling-dependent operations. Extensive testing has been added to the CI runs to ensure that individual components (such as probing, FJ, recombiners, FP, ...) are bitwise deterministic and that their overall orchestration maintains this invariant.
Furthermore, to reduce changes to the codebase and splits in the codepath, wall-clock timers on the heuristics side were replaced with unified termination-checking objects that check against wall time (opportunistic mode) or total work (deterministic mode).
These timers are hierarchical, so a single check can consult multiple termination sources: a local algorithm's work budget, the global wall-clock time limit, or, in future PRs, user-controlled termination signals such as Ctrl-C or callback handlers.
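The hierarchical timer idea can be sketched as follows. This is a minimal host-side illustration; all names (`termination_checker_t`, `record_work`, `should_terminate`) are hypothetical stand-ins, not the actual cuOpt types:

```cpp
#include <cassert>
#include <chrono>

// Illustrative sketch of a hierarchical termination checker: in deterministic
// mode it checks an abstract work budget, in opportunistic mode wall time.
struct termination_checker_t {
  bool deterministic{false};
  double work_limit{0.0};   // budget in abstract work units
  double work_done{0.0};
  std::chrono::steady_clock::time_point start{std::chrono::steady_clock::now()};
  double time_limit_s{0.0};
  const termination_checker_t* parent{nullptr};  // e.g. the global solve limit

  void record_work(double units) { work_done += units; }

  bool should_terminate() const {
    bool local =
      deterministic
        ? work_done >= work_limit
        : std::chrono::duration<double>(std::chrono::steady_clock::now() - start)
              .count() >= time_limit_s;
    // A child terminates if its own budget or any ancestor's is exhausted.
    return local || (parent && parent->should_terminate());
  }
};
```

A local algorithm would carry its own checker whose `parent` points at the global solve-level checker, so one call covers both limits.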

B&B acts as the authority for publishing incumbents to the user. At every horizon sync, the queue of GPU heuristic solutions is checked; solutions whose timestamp is <= the current B&B work total are drained, replayed, and incorporated into the tree exploration.
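The timestamp-gated draining at a horizon sync can be sketched like this; the queue layout and type names are illustrative assumptions, not the PR's actual data structures:

```cpp
#include <queue>
#include <vector>
#include <cassert>

// Hypothetical stand-in for a queued heuristic solution, tagged with the
// work-unit time at which the heuristic produced it.
struct timed_solution_t {
  double work_timestamp;
  int solution_id;
};

struct by_timestamp {
  bool operator()(const timed_solution_t& a, const timed_solution_t& b) const {
    return a.work_timestamp > b.work_timestamp;  // min-heap on timestamp
  }
};

using solution_queue_t =
  std::priority_queue<timed_solution_t, std::vector<timed_solution_t>, by_timestamp>;

// Drain every queued solution whose timestamp is <= the B&B work total; later
// solutions stay queued until a future horizon sync catches up to them.
std::vector<timed_solution_t> drain_ready(solution_queue_t& q, double bb_work_total)
{
  std::vector<timed_solution_t> ready;
  while (!q.empty() && q.top().work_timestamp <= bb_work_total) {
    ready.push_back(q.top());
    q.pop();
  }
  return ready;
}
```

Because the drain is gated on the deterministic work total rather than wall time, the same solutions are replayed at the same point of the search on every run.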

A new get_solution_callback_ext callback has been introduced to avoid ABI breaks while passing more metadata with each published solution. For now, only the solution origin and its work-unit timestamp are exposed, but additional fields can be added later without breaking the ABI.
The benchmark runner now relies on this callback to record incumbents and their timestamps; it is no longer necessary to parse the logs to compute the primal integral.
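One common way such a callback stays ABI-stable is the append-only info struct passed by pointer; the sketch below illustrates that pattern with hypothetical names, not the actual cuOptMIPSolutionCallbackInfo / cuOptMIPGetSolutionCallbackExt definitions:

```cpp
#include <cassert>

// Hypothetical ABI-stable callback payload: new fields are only ever appended,
// and the struct is passed by pointer, so callers compiled against an older,
// shorter layout keep working.
extern "C" {
struct solution_callback_info_t {
  int solution_origin;    // e.g. 0 = B&B, 1 = GPU heuristic (illustrative)
  double work_timestamp;  // work units at which the incumbent was found
  // Future metadata fields are appended here without breaking the ABI.
};

typedef void (*get_solution_callback_ext_t)(const double* assignment,
                                            int n_vars,
                                            const solution_callback_info_t* info,
                                            void* user_data);
}

// Example user callback, as a benchmark runner might install: it records the
// latest origin/timestamp pair instead of parsing solver logs.
static void record_incumbent(const double* /*assignment*/, int /*n_vars*/,
                             const solution_callback_info_t* info,
                             void* user_data)
{
  auto* last = static_cast<solution_callback_info_t*>(user_data);
  *last = *info;
}
```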

A few other bugfixes are included for node queues in the existing deterministic B&B exploration. Other fixes address uninitialized memory accesses and allow running compute-sanitizer initcheck without false positives. Determinism logs, disabled by default, have also been added throughout the code.

Some features remain unsupported in deterministic mode, such as Papilo's probing presolver, the concurrent root solve, RINS, and the sub-MIP recombiner. These will be incorporated in later PRs.
Furthermore, the ML estimators were taken out of this PR to reduce code-review workload; they will be reintroduced in later PRs, along with improved tuning.
No intentional API- or ABI-breaking changes are introduced.

Opportunistic mode performance remains unchanged:
Primal gap 12.5%, 226 feasible, primal integral 19.1%.

Deterministic mode benchmarks pending.

Closes #144
Closes #882

@aliceb-nv aliceb-nv added this to the 26.04 milestone Mar 23, 2026
@aliceb-nv aliceb-nv requested review from a team as code owners March 23, 2026 11:13
@aliceb-nv aliceb-nv added the non-breaking Introduces a non-breaking change label Mar 23, 2026
@aliceb-nv aliceb-nv requested a review from gforsyth March 23, 2026 11:13
@aliceb-nv aliceb-nv added the improvement Improves an existing functionality label Mar 23, 2026

copy-pr-bot bot commented Mar 23, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@aliceb-nv
Contributor Author

/ok to test 1678640

@aliceb-nv
Contributor Author

/ok to test dc8f073

@coderabbitai

coderabbitai bot commented Mar 23, 2026

📝 Walkthrough

Added deterministic MIP execution and work-unit accounting across the solver: new solution-origin metadata and extended incumbent callbacks, a unified termination/work-limit timer, work-context plumbing, deterministic logging hooks, many timer/type conversions to work_limit_timer_t, and accompanying tests and CI suppressions.

Changes

Cohort / File(s) Summary
Public API & Constants
cpp/include/cuopt/linear_programming/constants.h, cpp/include/cuopt/linear_programming/cuopt_c.h
Introduced bitmask determinism flags, new determinism submodes, MIP solution-origin constants, cuOptMIPSolutionCallbackInfo struct, extended C ABI callback type cuOptMIPGetSolutionCallbackExt, and setter cuOptSetMIPGetSolutionCallbackExt.
Solver Settings & Context
cpp/include/.../mip/solver_settings.hpp, cpp/src/mip_heuristics/solver.cuh, cpp/src/mip_heuristics/solver_context.cuh, cpp/src/mip_heuristics/solver.cu
Added per-component work-unit scales and gpu_heur_wait flag; switched determinism default to bitmask; replaced legacy solution callback with meta-aware new_incumbent_callback; added termination-checker pointer and validation.
Termination / Work Infrastructure
cpp/src/utilities/termination_checker.hpp, cpp/src/utilities/work_limit_timer.hpp, cpp/src/utilities/work_limit_context.hpp, cpp/src/utilities/work_unit_scheduler.*
Added termination_checker_t (work-unit + wall-clock parent-aware), alias work_limit_timer_t, producer sync and scaling in work context, and scheduler base-offset support.
Callback / Publication & Injection
cpp/src/mip_heuristics/solution_callbacks.cuh, cpp/src/mip_heuristics/solution/solution.*, cpp/src/mip_heuristics/problem/problem.*
New centralized solution publication/injection utilities that build callback payloads, handle scaling/uncrushing/postprocessing, add solution.get_hash(), and extend problem/postprocess APIs with optional handle override and origin-aware branch-and-bound callback.
Branch-and-Bound & Deterministic Replay
cpp/src/branch_and_bound/branch_and_bound.*, cpp/src/branch_and_bound/deterministic_workers.hpp, cpp/src/branch_and_bound/mip_node.hpp, cpp/src/branch_and_bound/pseudo_costs.*
Major deterministic rework: origin-tagged incumbents, emit_solution_callback helpers, queued external-solution retirement, unified replay stream, work-unit timeline coordination, worker snapshot changes, move-enabled node teardown, and strong-branching work-context plumbing.
Dual Simplex & Relaxed LP
cpp/src/dual_simplex/*, cpp/src/mip_heuristics/relaxed_lp/*
Adjusted work-estimate math edge cases; replaced simple callback with metadata-enabled incumbent callback; relaxed-LP settings now support iteration/work_limit and work_context; determinism-aware LP configuration and work recording.
Heuristics: Diversity, Recombiners, Population
cpp/src/mip_heuristics/diversity/*, cpp/src/mip_heuristics/diversity/recombiners/*, cpp/src/mip_heuristics/diversity/population.*
Propagated solution-origin through population APIs; timers → work_limit_timer_t; recombiners now return (solution,bool,work) and report work; many determinism logs and gating (GPU heuristics, exploration wait).
Feasibility Jump & Local Search
cpp/src/mip_heuristics/feasibility_jump/*, cpp/src/mip_heuristics/local_search/*, cpp/src/mip_heuristics/local_search/rounding/*
Added per-variable deterministic work estimates, batch work accounting, deterministic tie-breaking/sorting, timer→work_limit_timer_t conversions, iteration/work limits, and determinism logging across FJ, FP, and rounding/repair flows.
Presolve & Probing
cpp/src/mip_heuristics/presolve/*, cpp/src/mip_heuristics/presolve/third_party_presolve.*
Converted presolve timer paths to work_limit_timer_t&, added WORK_LIMIT termination enum, made Papilo probing conditional on determinism, and added presolve determinism setter.
Utilities & Support
cpp/src/utilities/*, cpp/src/utilities/determinism_log.hpp, cpp/src/utilities/cuda_helpers.cuh, cpp/src/utilities/copy_helpers.hpp, cpp/src/utilities/seed_generator.*
Added no-op determinism log macro, NVTX memory-initialization helpers, device-scalar span helpers, and thread-local seed generator epoching.
Benchmarks, Tests & Build
benchmarks/linear_programming/cuopt/run_mip.cpp, cpp/tests/*, cpp/CMakeLists.txt, cpp/tests/CMakeLists.txt, cpp/tests/mip/CMakeLists.txt
Added incumbent-trace CSV tracking in benchmark runner; extensive determinism-focused tests and utilities; CMake tweaks (PAPI block uncommented, CUPTI helper); enabled/added several test targets.
CI Suppressions & Datasets
ci/compute-sanitizer-suppressions.xml, datasets/mip/download_miplib_test_dataset.sh
Added ComputeSanitizer suppression rules for known Initcheck findings and added supportcase42 to test dataset list.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (7)
cpp/src/mip_heuristics/presolve/lb_probing_cache.cu (1)

279-286: ⚠️ Potential issue | 🟡 Minor

Missing CUDA error check after kernel launch.

The compute_min_slack_per_var kernel launch lacks error verification. As per coding guidelines, every CUDA kernel launch should have error checking with CUDA_CHECK or equivalent.

🛡️ Proposed fix
   compute_min_slack_per_var<i_t, f_t>
     <<<problem.pb->n_integer_vars, 128, 0, problem.handle_ptr->get_stream()>>>(
       problem.pb->view(),
       make_span_2(bound_presolve.cnst_slack),
       make_span(min_slack_per_var),
       make_span(different_coefficient),
       make_span(max_excess_per_var),
       make_span(max_n_violated_per_constraint));
+  RAFT_CHECK_CUDA(problem.handle_ptr->get_stream());
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/presolve/lb_probing_cache.cu` around lines 279 - 286,
The kernel launch of compute_min_slack_per_var currently has no CUDA error
checking; after the launch (which uses problem.handle_ptr->get_stream()), add a
CUDA error check call (e.g., CUDA_CHECK(cudaGetLastError())) to capture launch
errors and, if the code expects the kernel to be completed before continuing,
also add CUDA_CHECK(cudaDeviceSynchronize()) or the stream-synchronized
equivalent; place these checks immediately after the
compute_min_slack_per_var<<<...>>>(...) invocation to ensure any launch or
runtime errors are reported.
cpp/src/mip_heuristics/solution/solution.cu (1)

108-124: ⚠️ Potential issue | 🟠 Major

Don't suppress initcheck and then copy the same buffers.

Lines 108-115 mark other_sol's device buffers as initialized, but Lines 117-124 still copy those same buffers into *this. If those fields are actually deferred/uninitialized, this turns a sanitizer hit into silent garbage propagation and can leak stale excess/objective state into later computations. Either skip these copies or initialize/recompute the destination buffers first. As per coding guidelines "Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/solution/solution.cu` around lines 108 - 124, The code
currently calls cuopt::mark_span_as_initialized on other_sol.lower_excess /
upper_excess / constraint_value / obj_val / n_feasible_constraints and then
immediately copies from those same buffers into this via expand_device_copy and
raft::copy, which can hide uninitialized-source problems; fix by either (A)
removing the premature cuopt::mark_span_as_initialized calls and only mark the
destination spans after you legitimately initialize/compute or successfully copy
validated data, or (B) ensure the source other_sol fields are actually
initialized/recomputed before marking (e.g., run the routine that populates
other_sol.lower_excess/upper_excess/constraint_value/obj_val/n_feasible_constraints)
and only then call cuopt::mark_span_as_initialized and proceed with
expand_device_copy / raft::copy; locate these changes around the uses of
cuopt::mark_span_as_initialized, expand_device_copy, and raft::copy for the
symbols other_sol.lower_excess, other_sol.upper_excess,
other_sol.constraint_value, other_sol.obj_val, and
other_sol.n_feasible_constraints.
cpp/src/mip_heuristics/local_search/rounding/lb_constraint_prop.cu (1)

704-730: ⚠️ Potential issue | 🔴 Critical

Update function implementations to accept work_limit_timer_t& instead of timer_t&.

The header declarations for find_integer() (line 38-44) and run_repair_procedure() (lines 83-87) in lb_constraint_prop.cuh specify work_limit_timer_t& timer, but their implementations in lb_constraint_prop.cu (lines 753 and 453 respectively) declare the parameter as timer_t& timer. This signature mismatch causes work-based termination semantics to be lost:

  • apply_round() now correctly accepts work_limit_timer_t& (line 704)
  • Calls to find_integer() and run_repair_procedure() pass the work_limit_timer_t reference
  • However, implementations receive only timer_t, which tracks wall-clock time only and lacks the work-limit context
  • In deterministic mode (work-limited execution), this breaks the work budget enforcement end-to-end

Update implementations to match declarations: change parameter type from timer_t& timer to work_limit_timer_t& timer in both methods.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/local_search/rounding/lb_constraint_prop.cu` around
lines 704 - 730, The implementations of find_integer(...) and
run_repair_procedure(...) must use the same timer type as their declarations:
change their parameter types from timer_t& timer to work_limit_timer_t& timer so
the work-limit semantics propagate (e.g., update the function signatures and any
internal uses of timer in the find_integer and run_repair_procedure definitions
to accept and forward a work_limit_timer_t&), ensuring calls from apply_round
and other callers pass the correct work_limit_timer_t reference without altering
call sites.
cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cuh (1)

134-160: ⚠️ Potential issue | 🟠 Major

Reserve the sentinel and clamp score conversions.

fj_staged_score_t now stores scores in int32_t, but the write sites still derive them from rounded floating-point aggregates. Without a checked conversion, large FJ penalties can overflow into the sentinel range and silently corrupt move ordering. Please keep INT32_MIN reserved for invalid() and clamp/assert at the conversion point.

Possible direction
 struct fj_staged_score_t {
   int32_t base{std::numeric_limits<int32_t>::lowest()};
   int32_t bonus{std::numeric_limits<int32_t>::lowest()};
+
+  HDI static int32_t clamp_score(double v)
+  {
+    cuopt_assert(std::isfinite(v), "score should be finite");
+    constexpr double lo =
+      static_cast<double>(std::numeric_limits<int32_t>::lowest()) + 1.0;
+    constexpr double hi = static_cast<double>(std::numeric_limits<int32_t>::max());
+    return static_cast<int32_t>(max(lo, min(hi, std::round(v))));
+  }

Use clamp_score(...) at the score write sites in compute_new_score.

As per coding guidelines, "Check numerical stability: prevent overflow/underflow, precision loss, division by zero/near-zero, and use epsilon comparisons for floating-point equality checks."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cuh` around lines
134 - 160, The fj_staged_score_t type reserves INT32_MIN as the invalid sentinel
but scores are still produced by rounding floating aggregates in
compute_new_score; modify the conversion sites in compute_new_score to use a
clamp_score(...) helper (or add one) that clamps rounded values into
[INT32_MIN+1, INT32_MAX] and asserts or handles any out-of-range/NaN inputs
before constructing fj_staged_score_t, so no valid score can equal INT32_MIN and
invalid() remains unique; update any direct fj_staged_score_t(...) constructions
in compute_new_score to use the clamped value and add a short comment
referencing fj_staged_score_t::invalid().
cpp/src/mip_heuristics/diversity/diversity_manager.cu (1)

126-143: ⚠️ Potential issue | 🔴 Critical

Initialize context.gpu_heur_loop.deterministic before the env-gated returns.

The common path with no CUOPT_CONFIG_ID hits the early return first, so the assignment at Lines 142-143 never runs. Every later work_limit_timer_t built on context.gpu_heur_loop then behaves opportunistically, which silently disables deterministic work accounting for the default case.

💡 Minimal fix
   CUOPT_DETERMINISM_LOG(
     "Deterministic solve start diversity state: seed_state=%lld fp_max=%zu "
     "ls_max=%zu bp_max=%zu sub_mip_max=%zu last_lm=%d last_ls=%d "
     "enabled_recombiners=%zu",
     (long long)cuopt::seed_generator::peek_seed(),
     fp_recombiner_config_t::max_n_of_vars_from_other,
     ls_recombiner_config_t::max_n_of_vars_from_other,
     bp_recombiner_config_t::max_n_of_vars_from_other,
     sub_mip_recombiner_config_t::max_n_of_vars_from_other,
     (int)mab_ls_config_t<i_t, f_t>::last_lm_config,
     (int)mab_ls_config_t<i_t, f_t>::last_ls_mab_option,
     recombiner_t<i_t, f_t>::enabled_recombiners.size());

+  context.gpu_heur_loop.deterministic =
+    (context.settings.determinism_mode & CUOPT_DETERMINISM_GPU_HEURISTICS);
+
   int max_config             = -1;
   int env_config_id          = -1;
   const char* env_max_config = std::getenv("CUOPT_MAX_CONFIG");
   ...
-  context.gpu_heur_loop.deterministic =
-    (context.settings.determinism_mode & CUOPT_DETERMINISM_GPU_HEURISTICS);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/diversity/diversity_manager.cu` around lines 126 -
143, The assignment to context.gpu_heur_loop.deterministic should be moved so it
always runs before any early returns from parsing CUOPT_CONFIG_ID; initialize
context.gpu_heur_loop.deterministic using (context.settings.determinism_mode &
CUOPT_DETERMINISM_GPU_HEURISTICS) immediately after reading/inspecting
env_config_id_raw (or even before parsing), then proceed with parsing
env_config_id_raw, exception handling, and the max_config check so later code
that constructs work_limit_timer_t sees the correct deterministic flag.
cpp/src/mip_heuristics/local_search/rounding/constraint_prop.cu (1)

728-735: ⚠️ Potential issue | 🟠 Major

Don’t hard-sync this hot path just for logging.

update_host_assignment() is called inside the main rounding loop, and the new sync_stream() now serializes every iteration even when determinism logging is compiled out. Please gate the sync/hash behind deterministic logging (or the deterministic mode flag) so opportunistic runs stay asynchronous.

As per coding guidelines, "Eliminate unnecessary host-device synchronization (cudaDeviceSynchronize) in hot paths that blocks GPU pipeline; use streams and events for async execution."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/local_search/rounding/constraint_prop.cu` around lines
728 - 735, The code in update_host_assignment copies device data back to host
then unconditionally calls sol.handle_ptr->sync_stream() and compute_hash, which
forces synchronous GPU execution every iteration; change this so the stream sync
and compute_hash call are only executed when deterministic logging is enabled
(i.e., inside the CUOPT_DETERMINISM_LOG path or guarded by the
deterministic-mode flag). Specifically, keep the raft::copy to host
asynchronous, and move or wrap sol.handle_ptr->sync_stream() and the call to
detail::compute_hash(...) (and the CUOPT_DETERMINISM_LOG invocation) inside a
conditional that checks deterministic logging/flag so normal runs remain
asynchronous. Ensure you reference update_host_assignment,
sol.handle_ptr->sync_stream(), detail::compute_hash(make_span(sol.assignment),
...), and CUOPT_DETERMINISM_LOG when making the change.
cpp/src/branch_and_bound/branch_and_bound.cpp (1)

732-755: ⚠️ Potential issue | 🟠 Major

Publish incumbent callbacks after releasing mutex_upper_.

Both blocks call emit_solution_callback_from_crushed(...) while holding mutex_upper_. new_incumbent_callback is external/user-provided code, so keeping the upper-bound lock across it risks deadlocks and stalls competing incumbent updates. Capture the payload under the lock, unlock, then publish.

Also applies to: 963-989

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/branch_and_bound/branch_and_bound.cpp` around lines 732 - 755, The
incumbent callback (emit_solution_callback_from_crushed) is being invoked while
holding mutex_upper_, risking deadlocks because the callback is external; change
both places (the block around mutex_upper_ here and the similar block at the
other occurrence) to capture all necessary payload under the lock — e.g., read
previous_upper, set upper_bound_ and call incumbent_.set_incumbent_solution,
compute any hashes or report_heuristic inputs, and move
emit_solution_callback_from_crushed invocation out of the critical section;
unlock mutex_upper_ before calling emit_solution_callback_from_crushed so the
lock is not held during the user callback.
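The capture-under-lock, publish-after-unlock pattern this comment recommends can be sketched as follows; the types and member names are simplified illustrations, not the solver's actual ones:

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <optional>

// Illustrative payload captured while the upper-bound lock is held.
struct incumbent_payload_t {
  double objective;
  double bound;
};

struct incumbent_publisher_t {
  std::mutex mutex_upper_;
  double upper_bound_{1e30};
  std::function<void(const incumbent_payload_t&)> new_incumbent_callback;

  void try_publish(double objective, double bound)
  {
    std::optional<incumbent_payload_t> payload;
    {
      std::lock_guard<std::mutex> lock(mutex_upper_);
      if (objective < upper_bound_) {
        upper_bound_ = objective;
        payload = incumbent_payload_t{objective, bound};  // capture under lock
      }
    }  // lock released before running external/user code
    if (payload && new_incumbent_callback) { new_incumbent_callback(*payload); }
  }
};
```

Since the user callback runs outside the critical section, a slow or re-entrant callback can no longer block competing incumbent updates or deadlock on `mutex_upper_`.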
🟠 Major comments (20)
cpp/src/mip_heuristics/diversity/recombiners/recombiner_stats.hpp-78-84 (1)

78-84: ⚠️ Potential issue | 🟠 Major

Reset the new cached recombiner metrics too.

These fields are initialized once, but reset() does not clear them. Reusing all_recombine_stats after a reset will keep reporting the previous attempt’s time/work until another recombiner runs.

🩹 Suggested change outside this hunk
   void reset()
   {
     for (size_t i = 0; i < recombiner_count; ++i) {
       stats[i].reset();
     }
     last_attempt.reset();
+    last_recombiner_time = 0.0;
+    last_recombiner_work = 0.0;
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/diversity/recombiners/recombiner_stats.hpp` around
lines 78 - 84, The reset() method in recombiner_stats.hpp must also clear the
cached recombiner metrics so past values aren't reused: inside reset() set
last_recombiner_time to 0.0, reset last_recombiner_start_time (e.g., to a
default-constructed std::chrono::high_resolution_clock::time_point), and set
last_recombiner_work to 0.0; update any references to all_recombine_stats to
rely on these cleared values so subsequent uses start from zeroed recombiner
metrics.
cpp/src/mip_heuristics/presolve/bounds_update_data.cu-42-77 (1)

42-77: ⚠️ Potential issue | 🟠 Major

Reset bounds_changed with the rest of the presolve scratch state.

resize() now reinitializes every device buffer except bounds_changed. If the same bounds_update_data_t instance is reused, that scalar can carry the previous pass’s “changed” flag into the next problem/iteration and skew the presolve loop.

🩹 Suggested change
   changed_constraints.resize(problem.n_constraints, problem.handle_ptr->get_stream());
   next_changed_constraints.resize(problem.n_constraints, problem.handle_ptr->get_stream());
   changed_variables.resize(problem.n_variables, problem.handle_ptr->get_stream());
 
+  thrust::fill_n(problem.handle_ptr->get_thrust_policy(), bounds_changed.data(), 1, 0);
   thrust::fill(problem.handle_ptr->get_thrust_policy(),
                min_activity.begin(),
                min_activity.end(),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/presolve/bounds_update_data.cu` around lines 42 - 77,
The scalar bounds_changed in bounds_update_data_t is not reinitialized when
buffers are resized, allowing a reused instance to carry over a previous
"changed" flag; reset it during the same resize/initialization sequence (e.g.,
set bounds_changed = false or 0) after the device buffers are resized/filled
(alongside min_activity, max_activity, lb, ub, changed_constraints,
next_changed_constraints, changed_variables) so each new problem/iteration
starts with bounds_changed cleared.
cpp/src/branch_and_bound/mip_node.hpp-75-76 (1)

75-76: ⚠️ Potential issue | 🟠 Major

Defaulted move semantics are semantically incorrect for this type; while currently safe in practice, replace with custom implementations to maintain the child-to-parent invariant.

The defaulted move constructor and assignment operator do not reparent children: moving a node transfers its children unique_ptrs but leaves each child's parent pointer unchanged, violating the type invariant. Although current call sites (lines 1856, 2909, 4236) only move leaf or detached nodes created via detach_copy(), the default semantics permit unsafe moves of branched nodes if the code is later refactored. Replace the defaulted move operations with custom implementations that reparent all immediate children to the moved-to object.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/branch_and_bound/mip_node.hpp` around lines 75 - 76, The defaulted
move constructor and move assignment for mip_node_t break the child->parent
invariant because moved children keep their old parent pointer; implement a
custom mip_node_t(mip_node_t&&) and mip_node_t& operator=(mip_node_t&&) that
transfer/move all members and, for each moved child in the moved-from node's
children container, reset the child's parent pointer to the destination object
(this). In operator= also guard against self-assignment, properly clear or
release any existing children of the destination before moving, and ensure other
relevant members (e.g., node data/ids/ownership) are moved consistently so the
tree invariant is preserved.
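The reparenting move operations this comment asks for look roughly like the sketch below, where `node_t` is a simplified stand-in for `mip_node_t`:

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Simplified tree node: each child's parent pointer must always refer to the
// node that owns it (the invariant the defaulted moves would break).
struct node_t {
  node_t* parent{nullptr};
  std::vector<std::unique_ptr<node_t>> children;

  node_t() = default;

  node_t(node_t&& other) noexcept
    : parent(other.parent), children(std::move(other.children))
  {
    for (auto& child : children) { child->parent = this; }  // reparent
  }

  node_t& operator=(node_t&& other) noexcept
  {
    if (this != &other) {
      parent   = other.parent;
      children = std::move(other.children);  // drops any previous children
      for (auto& child : children) { child->parent = this; }
    }
    return *this;
  }
};
```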
cpp/tests/mip/feasibility_jump_tests.cu-169-197 (1)

169-197: ⚠️ Potential issue | 🟠 Major

Tighten the determinism check and keep its baseline local.

Lines 192-195 use a static baseline that survives repeated test runs, and Line 196 only asserts the objective within ±1.0. Different incumbents can still pass. For a determinism test, capture the first run inside the test loop and compare exact values on every repeat (e.g. objective plus solution.get_hash() / full assignment). As per coding guidelines "Ensure test isolation: prevent GPU state, cached memory, and global variables from leaking between test cases; verify each test independently initializes its environment" and "Write tests validating numerical correctness of optimization results (not just 'runs without error'); test degenerate cases (infeasible, unbounded, empty, singleton problems)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/mip/feasibility_jump_tests.cu` around lines 169 - 197, The
determinism check in run_fj_check_determinism uses a static first_val_map and a
loose EXPECT_NEAR tolerance allowing different incumbents to pass; change it to
keep the baseline local to the test invocation (remove static std::unordered_map
first_val_map), capture the baseline on the first run of this test invocation by
storing solution.get_user_objective() and solution.get_hash() (or full
assignment) into local variables, then on subsequent repeats assert strict
equality (EXPECT_EQ or EXPECT_DOUBLE_EQ for objective plus EXPECT_EQ for
solution.get_hash()) to ensure exact determinism; reference
run_fj_check_determinism, run_fj_instance, solution.get_user_objective(), and
solution.get_hash() when making the change.
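The local-baseline, bitwise-comparison pattern suggested here can be sketched as below; `run_result_t` and `runs_are_bitwise_identical` are hypothetical helpers, with `run_once` standing in for the FJ solve:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>

// Hypothetical per-run summary: objective plus a hash of the full assignment.
struct run_result_t {
  double objective;
  uint64_t hash;
};

template <typename RunFn>
bool runs_are_bitwise_identical(RunFn run_once, int n_repeats)
{
  std::optional<run_result_t> baseline;  // local, not static: no test leakage
  for (int i = 0; i < n_repeats; ++i) {
    run_result_t r = run_once();
    if (!baseline) {
      baseline = r;
    } else if (r.hash != baseline->hash ||
               // compare raw bits, not value equality (catches -0.0 vs +0.0)
               std::memcmp(&r.objective, &baseline->objective, sizeof(double)) != 0) {
      return false;
    }
  }
  return true;
}
```

Keeping the baseline inside the test invocation, rather than in a `static` map, means repeated runs of the same test binary cannot inherit a stale baseline.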
cpp/include/cuopt/linear_programming/constants.h-87-90 (1)

87-90: ⚠️ Potential issue | 🟠 Major

CUOPT_MODE_DETERMINISTIC no longer preserves the legacy numeric value.

After this change, older callers that still send the raw integer 1 will silently get B&B-only determinism instead of full determinism. If that break is intentional, it needs an explicit migration path; otherwise the parameter parser should keep legacy 1 mapped to full determinism.

As per coding guidelines, "Flag API changes that may need corresponding docs/ updates".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/include/cuopt/linear_programming/constants.h` around lines 87 - 90, The
macro CUOPT_MODE_DETERMINISTIC was changed to map to CUOPT_DETERMINISM_FULL but
legacy callers that pass the raw integer 1 will now be interpreted as
CUOPT_DETERMINISM_BB; update the parameter parsing/mapping logic (where integers
are translated to determinism enums) to explicitly map the legacy integer value
1 to CUOPT_DETERMINISM_FULL (or add a clear migration/compatibility path) so
that CUOPT_MODE_DETERMINISTIC preserves legacy behavior; reference the symbols
CUOPT_MODE_DETERMINISTIC, CUOPT_DETERMINISM_FULL, CUOPT_DETERMINISM_BB and
ensure any docs/flags are annotated to indicate this compatibility decision.
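One possible compatibility mapping is sketched below; only the constant names come from the finding, and the bit values are assumptions for illustration:

```cpp
#include <cassert>

// Assumed bit layout: only the names mirror the finding; values are made up.
enum determinism_bits : int {
  CUOPT_DETERMINISM_BB             = 1 << 0,
  CUOPT_DETERMINISM_GPU_HEURISTICS = 1 << 1,
  CUOPT_DETERMINISM_FULL = CUOPT_DETERMINISM_BB | CUOPT_DETERMINISM_GPU_HEURISTICS,
};

// Translate a user-supplied integer, keeping the legacy raw value 1 mapped to
// full determinism (its pre-bitmask meaning) instead of B&B-only.
int parse_determinism_mode(int raw)
{
  if (raw == 1) { return CUOPT_DETERMINISM_FULL; }  // legacy callers
  return raw;                                       // already a bitmask
}
```

The cost of this mapping is that bitmask-aware callers can no longer request B&B-only determinism via the raw value 1, which is why the bit values would need to be chosen so the two meanings don't collide.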
cpp/src/mip_heuristics/solve.cu-243-250 (1)

243-250: ⚠️ Potential issue | 🟠 Major

The new bitmask modes need a narrower flag here than deterministic_run.

CUOPT_MODE_DETERMINISTIC_BB still leaves GPU heuristics opportunistic, but passing deterministic_run = true into problem_t makes cpp/src/mip_heuristics/problem/problem.cu disable related-variable construction and suppress heuristic-side adaptive state. That means B&B-only mode now changes heuristic setup as a side effect. Please split this into a deterministic_gpu_heuristics flag for problem_t and keep the broader boolean only where full-solve orchestration really needs it.

Also applies to: 293-294, 337-338

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/solve.cu` around lines 243 - 250, Create and use a
narrower flag for GPU-heuristic determinism instead of passing the broad
deterministic_run into detail::problem_t; compute a new bool (e.g.,
deterministic_gpu_heuristics) based on settings.determinism_mode that is true
only for the modes that require deterministic GPU heuristics (not for
CUOPT_MODE_DETERMINISTIC_BB), pass that flag into the problem_t constructor
(replace the current deterministic_run param there), and leave deterministic_run
unchanged for orchestration logic; apply the same change at the other
occurrences mentioned (the other problem_t constructions around the noted
locations).
cpp/src/mip_heuristics/diversity/recombiners/bound_prop_recombiner.cuh-193-195 (1)

193-195: 🛠️ Refactor suggestion | 🟠 Major

Replace the placeholder recombiner work estimate before relying on work_limit.

n_vars_from_other / 1e8 is deterministic, but it is not meaningfully correlated with the actual bounds-prop/LP effort in this path. That makes the new work-unit scaling hard to reason about and easy to skew across instances.

cpp/src/mip_heuristics/problem/problem.cu-1493-1513 (1)

1493-1513: ⚠️ Potential issue | 🟠 Major

Use stable_sort_by_key or add a secondary sort key to ensure deterministic floating-point accumulation.

thrust::sort_by_key does not guarantee stability for equal keys—elements with the same key may be permuted. When multiple substitutions target the same variable, their deltas can be grouped in an unspecified order, causing the subsequent reduce_by_key to produce non-deterministic floating-point sums. Replace with stable_sort_by_key to preserve insertion order within equal keys, or sort by (key, original_position) to establish a total order.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/problem/problem.cu` around lines 1493 - 1513, The
sort+reduce sequence uses thrust::sort_by_key on obj_coeff_keys/obj_coeff_deltas
which is not stable and can lead to non-deterministic FP accumulation; change
the sort to a stable sort (use thrust::stable_sort_by_key with the same policy
and ranges) or implement a secondary key using the original index (e.g., create
a position array and sort by (obj_coeff_keys, positions)) so that the grouping
fed into thrust::reduce_by_key is deterministic; update the call site where
thrust::sort_by_key is invoked and ensure the same key/value spans
(obj_coeff_keys, obj_coeff_deltas) are passed to the stable sort so the later
reduce_by_key and the lambda that updates objective_coefficients remain correct.
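The stable-sort idea above can be illustrated host-side (the real fix would use thrust::stable_sort_by_key on the device spans; the helper and names below are illustrative): with a stable sort, deltas sharing a key keep their insertion order, so the per-key floating-point accumulation order is fixed.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Host-side sketch: pair each delta with its key, stable-sort by key so equal
// keys keep insertion order, then sum per key in that fixed order. This is the
// same determinism argument as thrust::stable_sort_by_key + reduce_by_key.
std::vector<double> sum_deltas_by_key(std::vector<std::pair<int, double>> kv, int n_keys)
{
  std::stable_sort(kv.begin(), kv.end(),
                   [](const auto& a, const auto& b) { return a.first < b.first; });
  std::vector<double> sums(n_keys, 0.0);
  for (const auto& [key, delta] : kv) sums[key] += delta;  // deterministic order
  return sums;
}
```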
cpp/tests/mip/determinism_test.cu-50-55 (1)

50-55: ⚠️ Potential issue | 🟠 Major

The new callback determinism test still fails to capture work_timestamp and relies on non-bitwise floating-point comparisons.

work_timestamp is the key field ordering callbacks for deterministic replay, but the test never captures it from callback_info. Also, EXPECT_EQ on double uses operator== (value equality), so exact IEEE-754 differences—like sign bits distinguishing +0.0 from -0.0—slip through undetected. Extend callback_solution_t to include work_timestamp, capture it during callback invocation (lines 88–93), and update expect_callback_prefixes_bitwise_equal() (lines 139–159) to compare the raw bits of objective, solution_bound, work_timestamp, and assignment entries rather than relying on EXPECT_EQ.

Also applies to: 293-346

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/mip/determinism_test.cu` around lines 50 - 55, Add a work_timestamp
field to struct callback_solution_t and populate it from callback_info when the
callback runs (the same place where assignment, objective and solution_bound are
captured), then change expect_callback_prefixes_bitwise_equal to perform bitwise
comparisons of the raw IEEE-754 representation (not EXPECT_EQ) for objective,
solution_bound, work_timestamp and each element of assignment; use e.g.
reinterpret_cast-to-integer or std::bit_cast on the doubles to compare their bit
patterns so +0.0 vs -0.0 and other non-bitwise-equal floats are detected.
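The bit-pattern comparison this finding asks for can be sketched as a small helper (the name bitwise_equal is illustrative, not the test's actual API; std::bit_cast in C++20 replaces the memcpy):

```cpp
#include <cstdint>
#include <cstring>

// Compare the raw IEEE-754 bits of two doubles. Unlike operator==, this
// distinguishes +0.0 from -0.0, which a bit-for-bit determinism test must
// detect.
bool bitwise_equal(double a, double b)
{
  std::uint64_t ua, ub;
  std::memcpy(&ua, &a, sizeof(a));
  std::memcpy(&ub, &b, sizeof(b));
  return ua == ub;
}
```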
cpp/tests/mip/presolve_test.cu-149-155 (1)

149-155: ⚠️ Potential issue | 🟠 Major

Hash both probe directions' cached-bound maps.

The composite hash only includes a.second[0].var_to_cached_bound_map. If a.second[1] becomes nondeterministic, this test still passes, so it won't catch half of the probing-cache regressions it's meant to guard.

Minimal fix
-    auto sorted_map = std::map<int, detail::cached_bound_t<double>>(
-      a.second[0].var_to_cached_bound_map.begin(), a.second[0].var_to_cached_bound_map.end());
-    for (const auto& [var_id, cached_bound] : sorted_map) {
-      var_to_cached_bound_keys.push_back(var_id);
-      var_to_cached_bound_lb.push_back(cached_bound.lb);
-      var_to_cached_bound_ub.push_back(cached_bound.ub);
-    }
+    for (int dir : {0, 1}) {
+      auto sorted_map = std::map<int, detail::cached_bound_t<double>>(
+        a.second[dir].var_to_cached_bound_map.begin(),
+        a.second[dir].var_to_cached_bound_map.end());
+      for (const auto& [var_id, cached_bound] : sorted_map) {
+        var_to_cached_bound_keys.push_back(var_id);
+        var_to_cached_bound_lb.push_back(cached_bound.lb);
+        var_to_cached_bound_ub.push_back(cached_bound.ub);
+      }
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/mip/presolve_test.cu` around lines 149 - 155, The test only hashes
a.second[0].var_to_cached_bound_map, so it misses nondeterminism in a.second[1];
update the extraction to include both probe directions by iterating over both
indices (0 and 1) of a.second and processing each var_to_cached_bound_map: for
each a.second[i].var_to_cached_bound_map, build a sorted map (or otherwise
deterministically order entries) and append var ids and their cached_bound.lb/ub
into var_to_cached_bound_keys, var_to_cached_bound_lb, and
var_to_cached_bound_ub so both probe directions contribute to the composite hash
(refer to a.second[0], a.second[1], and var_to_cached_bound_map to locate the
code).
cpp/src/utilities/work_limit_context.hpp-102-115 (1)

102-115: ⚠️ Potential issue | 🟠 Major

Don't commit NaN/Inf work into the shared context.

set_current_work() warns on non-finite input but still writes it to global_work_units_elapsed and the producer atomic. Once that happens, every later horizon comparison is poisoned. Return or assert before mutating state.

Minimal fix
   void set_current_work(double total_work, bool notify_producer = true)
   {
     if (!deterministic) return;
     if (!std::isfinite(total_work)) {
       CUOPT_LOG_WARN("set_current_work non-finite: %g (prev=%g) ctx=%s",
                      total_work,
                      global_work_units_elapsed,
                      name.c_str());
+      cuopt_assert(false, "Deterministic work progress must be finite");
+      return;
     }
     cuopt_assert(total_work + 1e-12 >= global_work_units_elapsed,
                  "Deterministic work progress must be monotonic");

As per coding guidelines, "Check numerical stability: prevent overflow/underflow, precision loss, division by zero/near-zero, and use epsilon comparisons for floating-point equality checks."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/utilities/work_limit_context.hpp` around lines 102 - 115,
set_current_work is warning on non-finite total_work but still commits it into
global_work_units_elapsed and producer_work_units_elapsed, poisoning later
comparisons; modify set_current_work to return (or abort via assertion)
immediately when total_work is not finite (and deterministic is true) before any
mutation of global_work_units_elapsed or producer_work_units_elapsed, preserving
the existing log/diagnostic (CUOPT_LOG_WARN) and keeping the monotonicity check
(cuopt_assert) only for finite inputs; key symbols: set_current_work,
total_work, global_work_units_elapsed, producer_work_units_elapsed,
producer_sync, deterministic.
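The guard being requested can be sketched as follows; work_state_t and its field are illustrative stand-ins for the real context class, not its actual layout:

```cpp
#include <cmath>

// Reject non-finite or non-monotonic work before any state mutation, so one
// bad measurement cannot poison every later horizon comparison.
struct work_state_t {
  double global_work_units_elapsed = 0.0;

  bool set_current_work(double total_work)
  {
    if (!std::isfinite(total_work)) return false;  // bail out before mutating
    if (total_work + 1e-12 < global_work_units_elapsed) return false;  // monotonic
    global_work_units_elapsed = total_work;
    return true;
  }
};
```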
cpp/src/mip_heuristics/local_search/feasibility_pump/feasibility_pump.cu-389-396 (1)

389-396: ⚠️ Potential issue | 🟠 Major

Populate fj.settings.work_limit in deterministic mode.

This branch only floors fj.settings.time_limit; fj.settings.work_limit stays at +inf. In deterministic solves that lets the fallback FJ run consume unbounded local work even after the FP slice is exhausted.

Minimal fix
   fj.settings.time_limit             = std::min(time_limit, timer.remaining_time());
   if (timer.deterministic) {
     fj.settings.time_limit = std::max((f_t)0.0, fj.settings.time_limit);
+    fj.settings.work_limit = fj.settings.time_limit;
     if (fj.settings.time_limit == 0.0) {
       CUOPT_LOG_DEBUG("Skipping 20%% FJ run due to exhausted deterministic work budget");
       return false;
     }
+  } else {
+    fj.settings.work_limit = std::numeric_limits<double>::infinity();
   }

The longer cycle-escape FJ entry point should mirror the same update.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/local_search/feasibility_pump/feasibility_pump.cu`
around lines 389 - 396, The deterministic branch updates fj.settings.time_limit
but leaves fj.settings.work_limit at +inf; mirror the same logic for work limits
by setting fj.settings.work_limit = std::min(work_limit, timer.remaining_work())
where the time limit is set, then in the timer.deterministic block floor
fj.settings.work_limit = std::max((w_t)0.0, fj.settings.work_limit) and treat
fj.settings.work_limit == 0.0 the same way (log and return); also apply the same
paired update to the longer cycle-escape FJ entry point so both time and work
limits are bounded in deterministic mode.
cpp/src/mip_heuristics/solution_callbacks.cuh-206-207 (1)

206-207: ⚠️ Potential issue | 🟠 Major

Use a relative/tolerance-based objective check here.

A fixed 1e-6 absolute assert is too tight after preprocess/scaling/uncrush, especially for large-magnitude objectives and float instantiations. This can reject valid user injections solely from expected floating-point drift. Compare against solver tolerances or a relative tolerance instead.

As per coding guidelines, "Check numerical stability: prevent overflow/underflow, precision loss, division by zero/near-zero, and use epsilon comparisons for floating-point equality checks."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/solution_callbacks.cuh` around lines 206 - 207, The
absolute 1e-6 check in cuopt_assert comparing outside_sol.get_user_objective()
and outside_sol_objective is too strict; replace it with a
relative/tolerance-based comparison using a scaled epsilon (e.g., tol =
max(min_tol, rel_eps * max(std::abs(outside_sol.get_user_objective()),
std::abs(outside_sol_objective), 1.0))) and assert that the absolute difference
<= tol; update the cuopt_assert call accordingly so it uses this computed
tolerance (referencing outside_sol.get_user_objective(), outside_sol_objective,
and cuopt_assert) or, if available, use the solver's tolerance constant instead
of hardcoding 1e-6.
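The mixed absolute/relative tolerance described in the prompt can be sketched like this (objectives_close and its default constants are illustrative, not solver constants):

```cpp
#include <algorithm>
#include <cmath>

// The allowed gap scales with the operand magnitudes but never drops below a
// floor, so both tiny and very large objectives are compared sensibly.
bool objectives_close(double a, double b, double rel_eps = 1e-9, double min_tol = 1e-6)
{
  const double tol =
    std::max(min_tol, rel_eps * std::max({std::fabs(a), std::fabs(b), 1.0}));
  return std::fabs(a - b) <= tol;
}
```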
cpp/tests/mip/diversity_test.cu-334-335 (1)

334-335: ⚠️ Potential issue | 🟠 Major

Recompute path after honoring CUOPT_INSTANCE.

In both active tests, path is built before the env override is applied, so setting CUOPT_INSTANCE only changes the label/logging; the helper still parses the parametrized file. The disabled full-run test already handles this correctly.

💡 Minimal fix
-  auto path     = make_path_absolute(test_instance);
-  test_instance = std::getenv("CUOPT_INSTANCE") ? std::getenv("CUOPT_INSTANCE") : test_instance;
+  if (const char* instance = std::getenv("CUOPT_INSTANCE")) {
+    test_instance = instance;
+  }
+  auto path = make_path_absolute(test_instance);

Apply the same reorder in both test bodies.

Also applies to: 363-364

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/mip/diversity_test.cu` around lines 334 - 335, The code builds path
from test_instance before applying the CUOPT_INSTANCE override, so the
environment override only changes the label but not the actual file parsed; fix
by moving the make_path_absolute call to after the override: first
evaluate/assign test_instance using std::getenv("CUOPT_INSTANCE") ?:
test_instance, then call auto path = make_path_absolute(test_instance). Apply
the same reorder in both test bodies that use the variables path and
test_instance (the two active tests where path is computed before the getenv
override).
cpp/tests/mip/diversity_test.cu-265-288 (1)

265-288: ⚠️ Potential issue | 🟠 Major

Key the recombiner gold map by seed too.

hash_map is static for the whole process, but the key only uses (path, i, j, recombiner). A later invocation on the same instance with a different seed will compare against stale gold data and fail even if each seed is internally deterministic. Include seed in the key or clear the map at the start of the top-level test.

As per coding guidelines, "Ensure test isolation: prevent GPU state, cached memory, and global variables from leaking between test cases; verify each test independently initializes its environment."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/mip/diversity_test.cu` around lines 265 - 288, The static hash_map
is keyed by (path, i, j, recombiner) but not seed, causing cross-test pollution;
either include the seed in the map key (e.g., use std::make_tuple(path, seed, i,
j, recombiner) wherever hash_map is accessed) or clear hash_map at the start of
the top-level test before any calls into diversity_manager.recombine/pop_vector
so each test run is isolated from prior seeds; update all lookup, insert, and
compare sites that reference hash_map accordingly.
cpp/src/mip_heuristics/diversity/diversity_manager.cu-499-502 (1)

499-502: ⚠️ Potential issue | 🟠 Major

Use the remaining deterministic budget here, not the original child limit.

In deterministic mode timer.get_time_limit() returns this child’s full budget, while timer.remaining_time() is the value that reflects work already consumed on context.gpu_heur_loop. After presolve or earlier heuristics spend work, this code recalculates lp_time_limit from the original budget and can oversubscribe the end-to-end deterministic limit.

💡 Minimal fix
-  const f_t time_limit = timer.deterministic ? timer.get_time_limit() : timer.remaining_time();
+  const f_t time_limit = timer.remaining_time();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/diversity/diversity_manager.cu` around lines 499 -
502, The code currently uses timer.get_time_limit() when timer.deterministic,
which yields the child's original budget and can oversubscribe after prior work;
change the logic that computes time_limit so that when timer.deterministic you
use timer.remaining_time() (the deterministic remaining budget reflecting work
done on context.gpu_heur_loop) instead of timer.get_time_limit(), then recompute
lp_time_limit using diversity_config.max_time_on_lp and
diversity_config.time_ratio_on_init_lp as before; update any comments if present
around population.timer, time_limit, or lp_time_limit to reflect that
remaining_time() is used for deterministic budgeting.
cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cu-994-1019 (1)

994-1019: ⚠️ Potential issue | 🟠 Major

Check the refreshed-LHS kernel launch immediately.

init_lhs_and_violated_constraints<<<...>>> is a new raw kernel launch, but this path consumes its outputs without any launch-status check. If the launch fails, the later reductions report garbage and the failure is much harder to localize.

🛡️ Minimal fix
   data.violated_constraints.clear(stream);
   init_lhs_and_violated_constraints<i_t, f_t><<<4096, 256, 0, stream>>>(v);
+  RAFT_CHECK_CUDA(stream);
   // both transformreduce could be fused; but oh well hardly a bottleneck

As per coding guidelines, "Every CUDA kernel launch and memory operation must have error checking with CUDA_CHECK or equivalent verification."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cu` around lines 994
- 1019, The kernel init_lhs_and_violated_constraints<<<4096,256,0,stream>>>(v)
is launched and its outputs are consumed immediately without any launch-status
check; add an immediate CUDA error check after that launch (e.g., call
CUDA_CHECK(cudaGetLastError())) and then ensure the stream is
synchronized/checked before doing the transform_reduce ops (e.g.,
CUDA_CHECK(cudaStreamSynchronize(stream)) or equivalent project-level check), so
place the checks right after init_lhs_and_violated_constraints and before the
thrust::transform_reduce calls that read v.incumbent_lhs/v.violated_constraints.
cpp/src/branch_and_bound/branch_and_bound.cpp-3722-3748 (1)

3722-3748: ⚠️ Potential issue | 🟠 Major

Retire horizon-boundary solutions in the same sync.

The deterministic contract here is “timestamps <= current horizon”, but Line 3727 uses < deterministic_current_horizon_. Solutions stamped exactly on the boundary are deferred one sync later, which changes replay order and incumbent/pruning behavior.

Suggested fix
-        if (sol.work_timestamp < deterministic_current_horizon_) {
+        if (sol.work_timestamp <= deterministic_current_horizon_) {
           due_solutions.push_back(std::move(sol));
         } else {
           future_solutions.push_back(std::move(sol));
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/branch_and_bound/branch_and_bound.cpp` around lines 3722 - 3748, The
current extraction of due solutions from heuristic_solution_queue_ uses a strict
'< deterministic_current_horizon_' check which defers solutions stamped exactly
at the boundary and breaks the deterministic contract; update the condition in
the loop that builds due_solutions to use '<=' against
deterministic_current_horizon_ so queued_external_solution_t entries with
work_timestamp equal to deterministic_current_horizon_ are retired in the same
sync, leaving the rest logic (future_solutions assignment and subsequent sort)
unchanged.
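The <= contract can be sketched host-side; queued_sol_t and split_by_horizon here are illustrative stand-ins for queued_external_solution_t and the sync loop:

```cpp
#include <vector>

struct queued_sol_t {
  double work_timestamp;
};

// A solution stamped exactly on the horizon is "due" in this sync (<=),
// not deferred to the next one.
void split_by_horizon(const std::vector<queued_sol_t>& queue, double horizon,
                      std::vector<queued_sol_t>& due, std::vector<queued_sol_t>& future)
{
  for (const auto& sol : queue) {
    if (sol.work_timestamp <= horizon) due.push_back(sol);  // boundary retires now
    else future.push_back(sol);
  }
}
```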
cpp/src/mip_heuristics/diversity/population.cu-222-232 (1)

222-232: ⚠️ Potential issue | 🟠 Major

Fix the comparison that tightens new_best_feasible_objective.

Line 231 updates the threshold on a worse non-CPUFJ objective. That widens the CPUFJ gate and can admit scratch solutions that are still worse than the best non-CPUFJ solution already drained.

Suggested fix
-      } else if (h_entry.origin != internals::mip_solution_origin_t::CPU_FEASIBILITY_JUMP &&
-                 h_entry.objective > new_best_feasible_objective) {
-        new_best_feasible_objective = h_entry.objective;
+      } else if (h_entry.origin != internals::mip_solution_origin_t::CPU_FEASIBILITY_JUMP &&
+                 h_entry.objective < new_best_feasible_objective) {
+        new_best_feasible_objective = h_entry.objective;
       }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/diversity/population.cu` around lines 222 - 232, The
code is tightening new_best_feasible_objective in the wrong direction: in the
loop over external_solution_queue and external_solution_queue_cpufj the else-if
that updates new_best_feasible_objective uses ">" which moves the threshold to
worse objectives; change the comparison in that branch to check for a better
objective (use "<" for h_entry.objective) so that new_best_feasible_objective is
only lowered when a strictly better non-CPUFJ solution (h_entry) is found; refer
to symbols external_solution_queue, external_solution_queue_cpufj, h_entry,
internals::mip_solution_origin_t::CPU_FEASIBILITY_JUMP and
new_best_feasible_objective to locate and fix the condition.
cpp/src/mip_heuristics/local_search/local_search.cu-358-365 (1)

358-365: ⚠️ Potential issue | 🟠 Major

Replace timer reassignment with a local child timer to preserve caller's time budget tracking.

Line 364 overwrites the caller-owned timer parameter with a fresh 0.25/0.5-second budget. This breaks time accounting for callers: code at diversity_manager.cu logs timer.elapsed_time() after run_local_search() returns, getting only the sub-operation budget instead of total elapsed time. Similarly, subsequent timer.check_time_limit() checks use the wrong budget.

Create a local child timer instead:

work_limit_timer_t sub_timer = work_limit_timer_t(context.gpu_heur_loop, fj_settings.time_limit, timer);

Then pass sub_timer to run_fj_line_segment() and run_fj_annealing() at lines 382 and 385–386.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/local_search/local_search.cu` around lines 358 - 365,
The function currently overwrites the caller-owned timer variable with a new
work_limit_timer_t, breaking caller time accounting; instead, keep the original
timer and create a local child timer (work_limit_timer_t sub_timer) when setting
fj_settings.time_limit based on solution.get_feasible() and
ls_config.at_least_one_parent_feasible, then pass sub_timer (not timer) into
run_fj_line_segment() and run_fj_annealing() so elapsed_time() and
timer.check_time_limit() on the original timer remain correct.
🟡 Minor comments (8)
ci/compute-sanitizer-suppressions.xml-209-222 (1)

209-222: ⚠️ Potential issue | 🟡 Minor

Typo and broad D2D copy suppression.

Line 209 has a typo: "actualy" → "actually".

The suppression for cuMemcpyDtoDAsync (lines 210-222) is quite broad, matching any device-to-device copy. While the comment notes these are "usually harmless," this could mask legitimate uninitialized memory issues. Consider narrowing with a hostStack constraint if specific call sites are known.

📝 Fix typo
-  <!-- Uninitialized device-to-device copies are usually harmless - if actualy bogus, errors may be caught later on -->
+  <!-- Uninitialized device-to-device copies are usually harmless - if actually bogus, errors may be caught later on -->
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ci/compute-sanitizer-suppressions.xml` around lines 209 - 222, Fix the typo
in the record comment ("actualy" → "actually") and tighten the broad
cuMemcpyDtoDAsync suppression: update the <record> for InitcheckApiError that
matches cuMemcpyDtoDAsync to include more specific hostStack constraints (e.g.,
narrower <func> or additional <frame> entries or module/file patterns) or target
specific call sites instead of a wildcard to avoid masking real D2D
uninitialized accesses; locate the record by the InitcheckApiError <kind> and
the <func> pattern "cuMemcpyDtoDAsync.*" when making these edits.
cpp/src/mip_heuristics/diversity/recombiners/line_segment_recombiner.cuh-69-75 (1)

69-75: ⚠️ Potential issue | 🟡 Minor

Use consistent format specifier for uint32_t hash values.

get_hash() returns uint32_t, but line 74 logs it with %d. This format mismatch diverges from the codebase pattern (other recombiners use 0x%x). Even though CUOPT_DETERMINISM_LOG is currently disabled, use 0x%x for consistency when the macro is enabled.

Suggested change
-    CUOPT_DETERMINISM_LOG("LS rec: a %d b %d", a.get_hash(), b.get_hash());
+    CUOPT_DETERMINISM_LOG("LS rec: a 0x%x b 0x%x", a.get_hash(), b.get_hash());
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/diversity/recombiners/line_segment_recombiner.cuh`
around lines 69 - 75, In recombine (function recombine in
line_segment_recombiner.cuh) change the format specifier used in the
CUOPT_DETERMINISM_LOG call so it matches the uint32_t return type of get_hash();
replace the "%d" placeholders for a.get_hash() and b.get_hash() with the hex
uint32_t specifier "0x%x" (keep the rest of the logging call intact) to be
consistent with other recombiners and avoid format/type mismatch.
cpp/tests/mip/local_search_test.cu-224-226 (1)

224-226: ⚠️ Potential issue | 🟡 Minor

Integer overflow risk when parsing CUOPT_SEED environment variable.

std::stoi returns int, which is then cast to unsigned long. For seeds larger than INT_MAX, this will throw or produce incorrect values. Consider using std::stoul instead:

Suggested fix
-    unsigned long seed = std::getenv("CUOPT_SEED")
-                           ? (unsigned long)std::stoi(std::getenv("CUOPT_SEED"))
+    unsigned long seed = std::getenv("CUOPT_SEED")
+                           ? std::stoul(std::getenv("CUOPT_SEED"))
                            : (unsigned long)std::random_device{}();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/mip/local_search_test.cu` around lines 224 - 226, The code that
sets unsigned long seed reads CUOPT_SEED and uses std::stoi which returns int
and can overflow or throw for values > INT_MAX; update the parsing to use
std::stoul (or std::strtoul) instead of std::stoi so the conversion produces an
unsigned long directly and handle std::invalid_argument/std::out_of_range by
falling back to std::random_device(); ensure this change is applied where seed
is assigned (the getenv("CUOPT_SEED") branch) so casting is no longer needed and
large seed values are handled correctly.
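The suggested parsing can be sketched as a helper (parse_seed is hypothetical, not part of the test file): std::stoul covers the full unsigned long range, and malformed or out-of-range input falls back to a random seed instead of throwing out of the test body.

```cpp
#include <random>
#include <stdexcept>
#include <string>

// Parse a seed string; on bad input, fall back to a hardware-random seed.
unsigned long parse_seed(const char* env_value)
{
  if (env_value != nullptr) {
    try {
      return std::stoul(env_value);
    } catch (const std::invalid_argument&) {
    } catch (const std::out_of_range&) {
    }
  }
  return static_cast<unsigned long>(std::random_device{}());
}
```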
cpp/tests/mip/determinism_utils.cuh-64-69 (1)

64-69: ⚠️ Potential issue | 🟡 Minor

Destructor should handle potential exceptions from thread join.

If spin_thread.join() throws (e.g., if the thread wasn't properly started), the destructor will terminate the program. Consider wrapping in a try-catch or checking joinable() first.

Suggested fix
   ~spin_stream_raii_t()
   {
     int one = 1;
     flag.set_value_async(one, stream);
-    spin_thread.join();
+    if (spin_thread.joinable()) {
+      spin_thread.join();
+    }
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/mip/determinism_utils.cuh` around lines 64 - 69, The destructor
~spin_stream_raii_t() currently calls spin_thread.join() unconditionally which
can throw or call std::terminate if the thread isn't joinable; update the
destructor to first check spin_thread.joinable() before joining (e.g., if
(spin_thread.joinable()) spin_thread.join()) and/or wrap the join in a try-catch
to swallow/log exceptions, while ensuring flag.set_value_async(one, stream)
still runs; modify references in spin_stream_raii_t to use
spin_thread.joinable() and handle exceptions around spin_thread.join().
cpp/tests/mip/determinism_utils.cuh-47-53 (1)

47-53: ⚠️ Potential issue | 🟡 Minor

Thread-safety issue: rand() is not thread-safe.

The rand() function is not thread-safe and can cause data races if multiple spin_stream_raii_t instances are created concurrently. Consider using a thread-local random engine or std::random_device with proper synchronization.

Suggested fix
 static void launch_spin_kernel_stream_thread(rmm::cuda_stream_view stream_view, int* flag)
 {
+  thread_local std::mt19937 rng{std::random_device{}()};
+  std::uniform_int_distribution<int> block_dist(1, 64);
+  std::uniform_int_distribution<int> thread_dist(1, 1024);
+  std::uniform_int_distribution<int> sleep_dist(1, 1000);
   while (true) {
-    int blocks  = rand() % 64 + 1;
-    int threads = rand() % 1024 + 1;
+    int blocks  = block_dist(rng);
+    int threads = thread_dist(rng);
     spin_kernel<<<blocks, threads, 0, stream_view>>>(flag);
     cudaStreamSynchronize(stream_view);
     if (host_copy(flag, 1, stream_view)[0] != 0) { break; }
-    std::this_thread::sleep_for(std::chrono::milliseconds(rand() % 1000 + 1));
+    std::this_thread::sleep_for(std::chrono::milliseconds(sleep_dist(rng)));
   }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/mip/determinism_utils.cuh` around lines 47 - 53, The loop uses the
non-thread-safe C rand(), causing races when multiple spin_stream_raii_t
instances run concurrently; replace rand() with a thread-local C++ random engine
(e.g., thread_local std::mt19937 seeded from std::random_device) and use
std::uniform_int_distribution to produce blocks and threads and the sleep
milliseconds, add necessary <random> includes, and keep the rest of the loop
(spin_kernel, host_copy, stream_view) unchanged so each thread has its own RNG
and no shared global state is used.
cpp/tests/mip/determinism_utils.cuh-51-51 (1)

51-51: ⚠️ Potential issue | 🟡 Minor

Missing CUDA error checking on cudaStreamSynchronize.

Per coding guidelines, CUDA operations should have error checking. Use RAFT_CUDA_TRY or equivalent.

Suggested fix
-    cudaStreamSynchronize(stream_view);
+    RAFT_CUDA_TRY(cudaStreamSynchronize(stream_view));

As per coding guidelines: "Every CUDA kernel launch and memory operation must have error checking with CUDA_CHECK or equivalent verification".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/mip/determinism_utils.cuh` at line 51, Add CUDA error checking
around the cudaStreamSynchronize call in determinism_utils.cuh: replace the raw
cudaStreamSynchronize(stream_view) invocation with the appropriate checked macro
(e.g., RAFT_CUDA_TRY or CUDA_CHECK) so any error returned by
cudaStreamSynchronize is captured and reported; ensure you reference the same
stream_view variable and include any necessary headers or macros already used
elsewhere in the file to compile cleanly.
cpp/tests/mip/mip_utils.cuh-240-248 (1)

240-248: ⚠️ Potential issue | 🟡 Minor

Potential race: reading device scalars before ensuring solve completion.

After fj.solve(solution), the code immediately reads device scalars (local_minimums_reached, incumbent_objective, violation_score) via .value(). While host_copy at line 241 synchronizes, the .value() calls on climber state may access data that's still being written if the solve uses async operations internally.

Consider adding an explicit sync after fj.solve():

Suggested fix
   fj.solve(solution);
+  solution.handle_ptr->sync_stream();
   auto solution_vector = host_copy(solution.assignment, solution.handle_ptr->get_stream());
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/mip/mip_utils.cuh` around lines 240 - 248, After calling
fj.solve(solution) ensure the device work is complete before reading climber
device scalars: insert an explicit synchronization on the stream used by the
solve (use solution.handle_ptr->get_stream() or the same stream that fj.solve
uses) immediately after fj.solve(solution) and before calling
fj.climbers[0]->local_minimums_reached.value(...),
fj.climbers[0]->incumbent_objective.value(...), and
fj.climbers[0]->violation_score.value(...); this guarantees the device writes
are finished (alternatively use the library's provided stream/device sync API)
even if host_copy/assignment uses a different synchronization.
cpp/src/mip_heuristics/diversity/recombiners/recombiner.cuh-248-257 (1)

248-257: ⚠️ Potential issue | 🟡 Minor

Nondeterministic iteration order from unordered_set.

std::unordered_set does not guarantee iteration order, so converting it directly to a vector (line 250) produces a nondeterministic enabled_recombiners order. If recombiner execution order affects results (e.g., through solution selection or early termination), this could break determinism.

Consider sorting enabled_recombiners after construction:

🔧 Proposed fix
     recombiner_t::enabled_recombiners =
       std::vector<recombiner_enum_t>(enabled_recombiners.begin(), enabled_recombiners.end());
+    std::sort(recombiner_t::enabled_recombiners.begin(),
+              recombiner_t::enabled_recombiners.end());
     cuopt_assert(!recombiner_t::enabled_recombiners.empty(), "No recombiners enabled after init");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/mip_heuristics/diversity/recombiners/recombiner.cuh` around lines 248
- 257, The enabled_recombiners vector is built from an unordered_set so its
iteration order is nondeterministic; after creating
recombiner_t::enabled_recombiners from enabled_recombiners, sort that vector
deterministically (e.g., by the enum numeric value or by recombiner_name) so
subsequent logic (recombiner_t::enabled_recombiners, the loop that builds
order_str, and CUOPT_DETERMINISM_LOG) sees a stable order; ensure the sort
happens immediately after the vector construction and before any use.
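A minimal sketch of the deterministic-ordering fix, using int in place of recombiner_enum_t and an illustrative helper name:

```cpp
#include <algorithm>
#include <unordered_set>
#include <vector>

// Materialize the unordered_set into a vector, then sort by numeric value so
// the execution order is identical on every run, regardless of hash layout.
std::vector<int> stable_order(const std::unordered_set<int>& enabled)
{
  std::vector<int> out(enabled.begin(), enabled.end());
  std::sort(out.begin(), out.end());
  return out;
}
```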

@aliceb-nv
Contributor Author

/ok to test c3f8fc3

@chris-maes chris-maes added the P2 label Mar 25, 2026

Labels: improvement (Improves an existing functionality), non-breaking (Introduces a non-breaking change), P2
