Incorporate determinism into GPU heuristics + parallel deterministic B&B+GPU exploration #986
aliceb-nv wants to merge 294 commits into NVIDIA:release/26.04 from
Conversation
/ok to test 1678640
/ok to test dc8f073
📝 Walkthrough
Added deterministic MIP execution and work-unit accounting across the solver: new solution-origin metadata and extended incumbent callbacks, a unified termination/work-limit timer, work-context plumbing, deterministic logging hooks, many timer/type conversions to work_limit_timer_t, and accompanying tests and CI suppressions.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Actionable comments posted: 4
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (7)
cpp/src/mip_heuristics/presolve/lb_probing_cache.cu (1)
279-286: ⚠️ Potential issue | 🟡 Minor
Missing CUDA error check after kernel launch.
The compute_min_slack_per_var kernel launch lacks error verification. As per coding guidelines, every CUDA kernel launch should have error checking with CUDA_CHECK or equivalent.
🛡️ Proposed fix
  compute_min_slack_per_var<i_t, f_t>
    <<<problem.pb->n_integer_vars, 128, 0, problem.handle_ptr->get_stream()>>>(
      problem.pb->view(),
      make_span_2(bound_presolve.cnst_slack),
      make_span(min_slack_per_var),
      make_span(different_coefficient),
      make_span(max_excess_per_var),
      make_span(max_n_violated_per_constraint));
+ RAFT_CHECK_CUDA(problem.handle_ptr->get_stream());
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/presolve/lb_probing_cache.cu` around lines 279 - 286, The kernel launch of compute_min_slack_per_var currently has no CUDA error checking; after the launch (which uses problem.handle_ptr->get_stream()), add a CUDA error check call (e.g., CUDA_CHECK(cudaGetLastError())) to capture launch errors and, if the code expects the kernel to be completed before continuing, also add CUDA_CHECK(cudaDeviceSynchronize()) or the stream-synchronized equivalent; place these checks immediately after the compute_min_slack_per_var<<<...>>>(...) invocation to ensure any launch or runtime errors are reported.
cpp/src/mip_heuristics/solution/solution.cu (1)
108-124: ⚠️ Potential issue | 🟠 Major
Don't suppress initcheck and then copy the same buffers.
Lines 108-115 mark other_sol's device buffers as initialized, but Lines 117-124 still copy those same buffers into *this. If those fields are actually deferred/uninitialized, this turns a sanitizer hit into silent garbage propagation and can leak stale excess/objective state into later computations. Either skip these copies or initialize/recompute the destination buffers first. As per coding guidelines "Validate correct initialization of variable bounds, constraint coefficients, and algorithm state before solving; ensure reset when transitioning between algorithm phases (presolve, simplex, diving, crossover)".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/solution/solution.cu` around lines 108 - 124, The code currently calls cuopt::mark_span_as_initialized on other_sol.lower_excess / upper_excess / constraint_value / obj_val / n_feasible_constraints and then immediately copies from those same buffers into this via expand_device_copy and raft::copy, which can hide uninitialized-source problems; fix by either (A) removing the premature cuopt::mark_span_as_initialized calls and only mark the destination spans after you legitimately initialize/compute or successfully copy validated data, or (B) ensure the source other_sol fields are actually initialized/recomputed before marking (e.g., run the routine that populates other_sol.lower_excess/upper_excess/constraint_value/obj_val/n_feasible_constraints) and only then call cuopt::mark_span_as_initialized and proceed with expand_device_copy / raft::copy; locate these changes around the uses of cuopt::mark_span_as_initialized, expand_device_copy, and raft::copy for the symbols other_sol.lower_excess, other_sol.upper_excess, other_sol.constraint_value, other_sol.obj_val, and other_sol.n_feasible_constraints.
cpp/src/mip_heuristics/local_search/rounding/lb_constraint_prop.cu (1)
704-730: ⚠️ Potential issue | 🔴 Critical
Update function implementations to accept work_limit_timer_t& instead of timer_t&.
The header declarations for find_integer() (lines 38-44) and run_repair_procedure() (lines 83-87) in lb_constraint_prop.cuh specify work_limit_timer_t& timer, but their implementations in lb_constraint_prop.cu (lines 753 and 453 respectively) declare the parameter as timer_t& timer. This signature mismatch causes work-based termination semantics to be lost:
- apply_round() now correctly accepts work_limit_timer_t& (line 704)
- Calls to find_integer() and run_repair_procedure() pass the work_limit_timer_t reference
- However, implementations receive only timer_t, which tracks wall-clock time only and lacks the work-limit context
- In deterministic mode (work-limited execution), this breaks the work budget enforcement end-to-end
Update implementations to match declarations: change parameter type from timer_t& timer to work_limit_timer_t& timer in both methods.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/local_search/rounding/lb_constraint_prop.cu` around lines 704 - 730, The implementations of find_integer(...) and run_repair_procedure(...) must use the same timer type as their declarations: change their parameter types from timer_t& timer to work_limit_timer_t& timer so the work-limit semantics propagate (e.g., update the function signatures and any internal uses of timer in the find_integer and run_repair_procedure definitions to accept and forward a work_limit_timer_t&), ensuring calls from apply_round and other callers pass the correct work_limit_timer_t reference without altering call sites.
cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cuh (1)
134-160: ⚠️ Potential issue | 🟠 Major
Reserve the sentinel and clamp score conversions.
fj_staged_score_t now stores scores in int32_t, but the write sites still derive them from rounded floating-point aggregates. Without a checked conversion, large FJ penalties can overflow into the sentinel range and silently corrupt move ordering. Please keep INT32_MIN reserved for invalid() and clamp/assert at the conversion point.
Possible direction
  struct fj_staged_score_t {
    int32_t base{std::numeric_limits<int32_t>::lowest()};
    int32_t bonus{std::numeric_limits<int32_t>::lowest()};
+
+   HDI static int32_t clamp_score(double v)
+   {
+     cuopt_assert(std::isfinite(v), "score should be finite");
+     constexpr double lo =
+       static_cast<double>(std::numeric_limits<int32_t>::lowest()) + 1.0;
+     constexpr double hi = static_cast<double>(std::numeric_limits<int32_t>::max());
+     return static_cast<int32_t>(max(lo, min(hi, std::round(v))));
+   }
Use clamp_score(...) at the score write sites in compute_new_score.
As per coding guidelines, "Check numerical stability: prevent overflow/underflow, precision loss, division by zero/near-zero, and use epsilon comparisons for floating-point equality checks."
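The proposed clamp is easy to exercise host-side. The sketch below is a standalone free function (not the actual cuOpt HDI member), assuming only that INT32_MIN must stay reserved for the invalid() sentinel:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Standalone sketch of the proposed helper: round a floating-point score
// into [INT32_MIN + 1, INT32_MAX], leaving INT32_MIN free as the
// invalid() sentinel value.
int32_t clamp_score(double v)
{
  assert(std::isfinite(v) && "score should be finite");
  constexpr double lo = static_cast<double>(std::numeric_limits<int32_t>::lowest()) + 1.0;
  constexpr double hi = static_cast<double>(std::numeric_limits<int32_t>::max());
  return static_cast<int32_t>(std::max(lo, std::min(hi, std::round(v))));
}
```

An over-range penalty such as 1e12 then saturates at INT32_MAX instead of wrapping into the sentinel range.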
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cuh` around lines 134 - 160, The fj_staged_score_t type reserves INT32_MIN as the invalid sentinel but scores are still produced by rounding floating aggregates in compute_new_score; modify the conversion sites in compute_new_score to use a clamp_score(...) helper (or add one) that clamps rounded values into [INT32_MIN+1, INT32_MAX] and asserts or handles any out-of-range/NaN inputs before constructing fj_staged_score_t, so no valid score can equal INT32_MIN and invalid() remains unique; update any direct fj_staged_score_t(...) constructions in compute_new_score to use the clamped value and add a short comment referencing fj_staged_score_t::invalid().
cpp/src/mip_heuristics/diversity/diversity_manager.cu (1)
126-143: ⚠️ Potential issue | 🔴 Critical
Initialize context.gpu_heur_loop.deterministic before the env-gated returns.
The common path with no CUOPT_CONFIG_ID hits the early return first, so the assignment at Lines 142-143 never runs. Every later work_limit_timer_t built on context.gpu_heur_loop then behaves opportunistically, which silently disables deterministic work accounting for the default case.
💡 Minimal fix
  CUOPT_DETERMINISM_LOG(
    "Deterministic solve start diversity state: seed_state=%lld fp_max=%zu "
    "ls_max=%zu bp_max=%zu sub_mip_max=%zu last_lm=%d last_ls=%d "
    "enabled_recombiners=%zu",
    (long long)cuopt::seed_generator::peek_seed(),
    fp_recombiner_config_t::max_n_of_vars_from_other,
    ls_recombiner_config_t::max_n_of_vars_from_other,
    bp_recombiner_config_t::max_n_of_vars_from_other,
    sub_mip_recombiner_config_t::max_n_of_vars_from_other,
    (int)mab_ls_config_t<i_t, f_t>::last_lm_config,
    (int)mab_ls_config_t<i_t, f_t>::last_ls_mab_option,
    recombiner_t<i_t, f_t>::enabled_recombiners.size());
+ context.gpu_heur_loop.deterministic =
+   (context.settings.determinism_mode & CUOPT_DETERMINISM_GPU_HEURISTICS);
+
  int max_config    = -1;
  int env_config_id = -1;
  const char* env_max_config = std::getenv("CUOPT_MAX_CONFIG");
  ...
- context.gpu_heur_loop.deterministic =
-   (context.settings.determinism_mode & CUOPT_DETERMINISM_GPU_HEURISTICS);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/diversity/diversity_manager.cu` around lines 126 - 143, The assignment to context.gpu_heur_loop.deterministic should be moved so it always runs before any early returns from parsing CUOPT_CONFIG_ID; initialize context.gpu_heur_loop.deterministic using (context.settings.determinism_mode & CUOPT_DETERMINISM_GPU_HEURISTICS) immediately after reading/inspecting env_config_id_raw (or even before parsing), then proceed with parsing env_config_id_raw, exception handling, and the max_config check so later code that constructs work_limit_timer_t sees the correct deterministic flag.
cpp/src/mip_heuristics/local_search/rounding/constraint_prop.cu (1)
728-735: ⚠️ Potential issue | 🟠 Major
Don't hard-sync this hot path just for logging.
update_host_assignment() is called inside the main rounding loop, and the new sync_stream() now serializes every iteration even when determinism logging is compiled out. Please gate the sync/hash behind deterministic logging (or the deterministic mode flag) so opportunistic runs stay asynchronous.
As per coding guidelines, "Eliminate unnecessary host-device synchronization (cudaDeviceSynchronize) in hot paths that blocks GPU pipeline; use streams and events for async execution."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/local_search/rounding/constraint_prop.cu` around lines 728 - 735, The code in update_host_assignment copies device data back to host then unconditionally calls sol.handle_ptr->sync_stream() and compute_hash, which forces synchronous GPU execution every iteration; change this so the stream sync and compute_hash call are only executed when deterministic logging is enabled (i.e., inside the CUOPT_DETERMINISM_LOG path or guarded by the deterministic-mode flag). Specifically, keep the raft::copy to host asynchronous, and move or wrap sol.handle_ptr->sync_stream() and the call to detail::compute_hash(...) (and the CUOPT_DETERMINISM_LOG invocation) inside a conditional that checks deterministic logging/flag so normal runs remain asynchronous. Ensure you reference update_host_assignment, sol.handle_ptr->sync_stream(), detail::compute_hash(make_span(sol.assignment), ...), and CUOPT_DETERMINISM_LOG when making the change.
cpp/src/branch_and_bound/branch_and_bound.cpp (1)
732-755: ⚠️ Potential issue | 🟠 Major
Publish incumbent callbacks after releasing mutex_upper_.
Both blocks call emit_solution_callback_from_crushed(...) while holding mutex_upper_. new_incumbent_callback is external/user-provided code, so keeping the upper-bound lock across it risks deadlocks and stalls competing incumbent updates. Capture the payload under the lock, unlock, then publish.
Also applies to: 963-989
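The capture-then-publish pattern this asks for can be sketched in plain C++; every name below is illustrative, not the actual branch_and_bound.cpp member:

```cpp
#include <cassert>
#include <functional>
#include <limits>
#include <mutex>
#include <vector>

// Illustrative sketch: update the incumbent under the lock, copy the
// payload, then invoke the (potentially user-provided) callback only
// after the lock_guard has gone out of scope.
struct incumbent_publisher_t {
  std::mutex mutex_upper_;
  double upper_bound_ = std::numeric_limits<double>::infinity();

  void try_publish(double objective,
                   const std::vector<double>& assignment,
                   const std::function<void(double, const std::vector<double>&)>& callback)
  {
    std::vector<double> payload;
    bool improved = false;
    {
      std::lock_guard<std::mutex> lock(mutex_upper_);
      if (objective < upper_bound_) {
        upper_bound_ = objective;
        payload      = assignment;  // capture while protected
        improved     = true;
      }
    }  // mutex_upper_ released here
    if (improved && callback) { callback(objective, payload); }  // publish unlocked
  }
};
```

Because the callback runs unlocked, user code that re-enters the solver (or blocks) can no longer deadlock on the upper-bound mutex.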
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/branch_and_bound/branch_and_bound.cpp` around lines 732 - 755, The incumbent callback (emit_solution_callback_from_crushed) is being invoked while holding mutex_upper_, risking deadlocks because the callback is external; change both places (the block around mutex_upper_ here and the similar block at the other occurrence) to capture all necessary payload under the lock — e.g., read previous_upper, set upper_bound_ and call incumbent_.set_incumbent_solution, compute any hashes or report_heuristic inputs, and move emit_solution_callback_from_crushed invocation out of the critical section; unlock mutex_upper_ before calling emit_solution_callback_from_crushed so the lock is not held during the user callback.
🟠 Major comments (20)
cpp/src/mip_heuristics/diversity/recombiners/recombiner_stats.hpp-78-84 (1)
78-84: ⚠️ Potential issue | 🟠 Major
Reset the new cached recombiner metrics too.
These fields are initialized once, but reset() does not clear them. Reusing all_recombine_stats after a reset will keep reporting the previous attempt's time/work until another recombiner runs.
🩹 Suggested change outside this hunk
  void reset()
  {
    for (size_t i = 0; i < recombiner_count; ++i) {
      stats[i].reset();
    }
    last_attempt.reset();
+   last_recombiner_time = 0.0;
+   last_recombiner_work = 0.0;
  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/diversity/recombiners/recombiner_stats.hpp` around lines 78 - 84, The reset() method in recombiner_stats.hpp must also clear the cached recombiner metrics so past values aren't reused: inside reset() set last_recombiner_time to 0.0, reset last_recombiner_start_time (e.g., to a default-constructed std::chrono::high_resolution_clock::time_point), and set last_recombiner_work to 0.0; update any references to all_recombine_stats to rely on these cleared values so subsequent uses start from zeroed recombiner metrics.
cpp/src/mip_heuristics/presolve/bounds_update_data.cu-42-77 (1)
42-77: ⚠️ Potential issue | 🟠 Major
Reset bounds_changed with the rest of the presolve scratch state.
resize() now reinitializes every device buffer except bounds_changed. If the same bounds_update_data_t instance is reused, that scalar can carry the previous pass's "changed" flag into the next problem/iteration and skew the presolve loop.
🩹 Suggested change
  changed_constraints.resize(problem.n_constraints, problem.handle_ptr->get_stream());
  next_changed_constraints.resize(problem.n_constraints, problem.handle_ptr->get_stream());
  changed_variables.resize(problem.n_variables, problem.handle_ptr->get_stream());
+ thrust::fill_n(problem.handle_ptr->get_thrust_policy(), bounds_changed.data(), 1, 0);
  thrust::fill(problem.handle_ptr->get_thrust_policy(),
               min_activity.begin(),
               min_activity.end(),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/presolve/bounds_update_data.cu` around lines 42 - 77, The scalar bounds_changed in bounds_update_data_t is not reinitialized when buffers are resized, allowing a reused instance to carry over a previous "changed" flag; reset it during the same resize/initialization sequence (e.g., set bounds_changed = false or 0) after the device buffers are resized/filled (alongside min_activity, max_activity, lb, ub, changed_constraints, next_changed_constraints, changed_variables) so each new problem/iteration starts with bounds_changed cleared.
cpp/src/branch_and_bound/mip_node.hpp-75-76 (1)
75-76: ⚠️ Potential issue | 🟠 Major
Defaulted move semantics are semantically incorrect for this type; while currently safe in practice, replace with custom implementations to maintain the child-to-parent invariant.
The defaulted move constructor and assignment operator do not reparent children: moving a node transfers its children unique_ptrs but leaves each child's parent pointer unchanged, violating the type invariant. Although current call sites (lines 1856, 2909, 4236) only move leaf or detached nodes created via detach_copy(), the default semantics permit unsafe moves of branched nodes if the code is later refactored. Replace the defaulted move operations with custom implementations that reparent all immediate children to the moved-to object.
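A minimal host-only sketch of the requested fix, with node_t as a hypothetical stand-in for mip_node_t: move operations that transfer the children and immediately repoint each child's parent at the destination.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Stand-in for mip_node_t: custom move operations keep the
// child->parent invariant by reparenting every moved child.
struct node_t {
  node_t* parent = nullptr;
  std::vector<std::unique_ptr<node_t>> children;

  node_t() = default;
  node_t(node_t&& other) noexcept
    : parent(other.parent), children(std::move(other.children))
  {
    for (auto& child : children) { child->parent = this; }  // reparent
  }
  node_t& operator=(node_t&& other) noexcept
  {
    if (this != &other) {
      parent   = other.parent;
      children = std::move(other.children);  // drops any existing children first
      for (auto& child : children) { child->parent = this; }
    }
    return *this;
  }
};
```

With = default instead, the assertion in the test below would fail: the child would still point at the moved-from node.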
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/branch_and_bound/mip_node.hpp` around lines 75 - 76, The defaulted move constructor and move assignment for mip_node_t break the child->parent invariant because moved children keep their old parent pointer; implement a custom mip_node_t(mip_node_t&&) and mip_node_t& operator=(mip_node_t&&) that transfer/move all members and, for each moved child in the moved-from node's children container, reset the child's parent pointer to the destination object (this). In operator= also guard against self-assignment, properly clear or release any existing children of the destination before moving, and ensure other relevant members (e.g., node data/ids/ownership) are moved consistently so the tree invariant is preserved.
cpp/tests/mip/feasibility_jump_tests.cu-169-197 (1)
169-197: ⚠️ Potential issue | 🟠 Major
Tighten the determinism check and keep its baseline local.
Lines 192-195 use a static baseline that survives repeated test runs, and Line 196 only asserts the objective within ±1.0. Different incumbents can still pass. For a determinism test, capture the first run inside the test loop and compare exact values on every repeat (e.g. objective plus solution.get_hash() / full assignment). As per coding guidelines "Ensure test isolation: prevent GPU state, cached memory, and global variables from leaking between test cases; verify each test independently initializes its environment" and "Write tests validating numerical correctness of optimization results (not just 'runs without error'); test degenerate cases (infeasible, unbounded, empty, singleton problems)".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/mip/feasibility_jump_tests.cu` around lines 169 - 197, The determinism check in run_fj_check_determinism uses a static first_val_map and a loose EXPECT_NEAR tolerance allowing different incumbents to pass; change it to keep the baseline local to the test invocation (remove static std::unordered_map first_val_map), capture the baseline on the first run of this test invocation by storing solution.get_user_objective() and solution.get_hash() (or full assignment) into local variables, then on subsequent repeats assert strict equality (EXPECT_EQ or EXPECT_DOUBLE_EQ for objective plus EXPECT_EQ for solution.get_hash()) to ensure exact determinism; reference run_fj_check_determinism, run_fj_instance, solution.get_user_objective(), and solution.get_hash() when making the change.
cpp/include/cuopt/linear_programming/constants.h-87-90 (1)
87-90: ⚠️ Potential issue | 🟠 Major
CUOPT_MODE_DETERMINISTIC no longer preserves the legacy numeric value.
After this change, older callers that still send the raw integer 1 will silently get B&B-only determinism instead of full determinism. If that break is intentional, it needs an explicit migration path; otherwise the parameter parser should keep legacy 1 mapped to full determinism.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/include/cuopt/linear_programming/constants.h` around lines 87 - 90, The macro CUOPT_MODE_DETERMINISTIC was changed to map to CUOPT_DETERMINISM_FULL but legacy callers that pass the raw integer 1 will now be interpreted as CUOPT_DETERMINISM_BB; update the parameter parsing/mapping logic (where integers are translated to determinism enums) to explicitly map the legacy integer value 1 to CUOPT_DETERMINISM_FULL (or add a clear migration/compatibility path) so that CUOPT_MODE_DETERMINISTIC preserves legacy behavior; reference the symbols CUOPT_MODE_DETERMINISTIC, CUOPT_DETERMINISM_FULL, CUOPT_DETERMINISM_BB and ensure any docs/flags are annotated to indicate this compatibility decision.
cpp/src/mip_heuristics/solve.cu-243-250 (1)
243-250: ⚠️ Potential issue | 🟠 Major
The new bitmask modes need a narrower flag here than deterministic_run.
CUOPT_MODE_DETERMINISTIC_BB still leaves GPU heuristics opportunistic, but passing deterministic_run = true into problem_t makes cpp/src/mip_heuristics/problem/problem.cu disable related-variable construction and suppress heuristic-side adaptive state. That means B&B-only mode now changes heuristic setup as a side effect. Please split this into a deterministic_gpu_heuristics flag for problem_t and keep the broader boolean only where full-solve orchestration really needs it.
Also applies to: 293-294, 337-338
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/solve.cu` around lines 243 - 250, Create and use a narrower flag for GPU-heuristic determinism instead of passing the broad deterministic_run into detail::problem_t; compute a new bool (e.g., deterministic_gpu_heuristics) based on settings.determinism_mode that is true only for the modes that require deterministic GPU heuristics (not for CUOPT_MODE_DETERMINISTIC_BB), pass that flag into the problem_t constructor (replace the current deterministic_run param there), and leave deterministic_run unchanged for orchestration logic; apply the same change at the other occurrences mentioned (the other problem_t constructions around the noted locations).
cpp/src/mip_heuristics/diversity/recombiners/bound_prop_recombiner.cuh-193-195 (1)
193-195: 🛠️ Refactor suggestion | 🟠 Major
Replace the placeholder recombiner work estimate before relying on work_limit.
n_vars_from_other / 1e8 is deterministic, but it is not meaningfully correlated with the actual bounds-prop/LP effort in this path. That makes the new work-unit scaling hard to reason about and easy to skew across instances.
cpp/src/mip_heuristics/problem/problem.cu-1493-1513 (1)
1493-1513: ⚠️ Potential issue | 🟠 Major
Use stable_sort_by_key or add a secondary sort key to ensure deterministic floating-point accumulation.
thrust::sort_by_key does not guarantee stability for equal keys: elements with the same key may be permuted. When multiple substitutions target the same variable, their deltas can be grouped in an unspecified order, causing the subsequent reduce_by_key to produce non-deterministic floating-point sums. Replace with stable_sort_by_key to preserve insertion order within equal keys, or sort by (key, original_position) to establish a total order.
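The stability point is easy to demonstrate host-side. This sketch uses std::stable_sort as the analogue of thrust::stable_sort_by_key; the key/delta pairs are made up for illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Analogue of the fix: a stable sort keeps equal keys in insertion
// order, so a reduce-by-key style accumulation sums deltas in a fixed,
// reproducible order.
std::vector<double> order_deltas(std::vector<std::pair<int, double>> kv)
{
  std::stable_sort(kv.begin(), kv.end(),
                   [](const auto& a, const auto& b) { return a.first < b.first; });
  std::vector<double> ordered;
  ordered.reserve(kv.size());
  for (const auto& [key, delta] : kv) { ordered.push_back(delta); }
  return ordered;
}
```

With an unstable sort, the two deltas sharing key 1 could land in either order, and summing them in different orders can change the rounded floating-point result.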
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/problem/problem.cu` around lines 1493 - 1513, The sort+reduce sequence uses thrust::sort_by_key on obj_coeff_keys/obj_coeff_deltas which is not stable and can lead to non-deterministic FP accumulation; change the sort to a stable sort (use thrust::stable_sort_by_key with the same policy and ranges) or implement a secondary key using the original index (e.g., create a position array and sort by (obj_coeff_keys, positions)) so that the grouping fed into thrust::reduce_by_key is deterministic; update the call site where thrust::sort_by_key is invoked and ensure the same key/value spans (obj_coeff_keys, obj_coeff_deltas) are passed to the stable sort so the later reduce_by_key and the lambda that updates objective_coefficients remain correct.
cpp/tests/mip/determinism_test.cu-50-55 (1)
50-55: ⚠️ Potential issue | 🟠 Major
The new callback determinism test still misses work_timestamp capture and non-bitwise floating-point comparisons.
work_timestamp is the key field ordering callbacks for deterministic replay, but the test never captures it from callback_info. Also, EXPECT_EQ on double uses operator== (value equality), so exact IEEE-754 differences, like sign bits distinguishing +0.0 from -0.0, slip through undetected. Extend callback_solution_t to include work_timestamp, capture it during callback invocation (lines 88-93), and update expect_callback_prefixes_bitwise_equal() (lines 139-159) to compare the raw bits of objective, solution_bound, work_timestamp, and assignment entries rather than relying on EXPECT_EQ.
Also applies to: 293-346
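Raw-bit comparison of doubles can be done portably with memcpy (the pre-C++20 equivalent of std::bit_cast). A sketch of the kind of helper meant here:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Compare the raw IEEE-754 bit patterns instead of operator==, so
// +0.0 vs -0.0 (different sign bits) no longer compare as equal.
bool bitwise_equal(double a, double b)
{
  std::uint64_t ua = 0, ub = 0;
  std::memcpy(&ua, &a, sizeof(double));
  std::memcpy(&ub, &b, sizeof(double));
  return ua == ub;
}
```

Note this also makes two NaNs with identical payloads compare equal, which is usually what a replay/determinism check wants.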
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/mip/determinism_test.cu` around lines 50 - 55, Add a work_timestamp field to struct callback_solution_t and populate it from callback_info when the callback runs (the same place where assignment, objective and solution_bound are captured), then change expect_callback_prefixes_bitwise_equal to perform bitwise comparisons of the raw IEEE-754 representation (not EXPECT_EQ) for objective, solution_bound, work_timestamp and each element of assignment; use e.g. reinterpret_cast-to-integer or std::bit_cast on the doubles to compare their bit patterns so +0.0 vs -0.0 and other non-bitwise-equal floats are detected.
cpp/tests/mip/presolve_test.cu-149-155 (1)
149-155: ⚠️ Potential issue | 🟠 Major
Hash both probe directions' cached-bound maps.
The composite hash only includes a.second[0].var_to_cached_bound_map. If a.second[1] becomes nondeterministic, this test still passes, so it won't catch half of the probing-cache regressions it's meant to guard.
Minimal fix
- auto sorted_map = std::map<int, detail::cached_bound_t<double>>(
-   a.second[0].var_to_cached_bound_map.begin(), a.second[0].var_to_cached_bound_map.end());
- for (const auto& [var_id, cached_bound] : sorted_map) {
-   var_to_cached_bound_keys.push_back(var_id);
-   var_to_cached_bound_lb.push_back(cached_bound.lb);
-   var_to_cached_bound_ub.push_back(cached_bound.ub);
- }
+ for (int dir : {0, 1}) {
+   auto sorted_map = std::map<int, detail::cached_bound_t<double>>(
+     a.second[dir].var_to_cached_bound_map.begin(),
+     a.second[dir].var_to_cached_bound_map.end());
+   for (const auto& [var_id, cached_bound] : sorted_map) {
+     var_to_cached_bound_keys.push_back(var_id);
+     var_to_cached_bound_lb.push_back(cached_bound.lb);
+     var_to_cached_bound_ub.push_back(cached_bound.ub);
+   }
+ }
Verify each finding against the current code and only fix it if needed. In `@cpp/tests/mip/presolve_test.cu` around lines 149 - 155, The test only hashes a.second[0].var_to_cached_bound_map, so it misses nondeterminism in a.second[1]; update the extraction to include both probe directions by iterating over both indices (0 and 1) of a.second and processing each var_to_cached_bound_map: for each a.second[i].var_to_cached_bound_map, build a sorted map (or otherwise deterministically order entries) and append var ids and their cached_bound.lb/ub into var_to_cached_bound_keys, var_to_cached_bound_lb, and var_to_cached_bound_ub so both probe directions contribute to the composite hash (refer to a.second[0], a.second[1], and var_to_cached_bound_map to locate the code).
cpp/src/utilities/work_limit_context.hpp-102-115 (1)
102-115: ⚠️ Potential issue | 🟠 Major
Don't commit NaN/Inf work into the shared context.
set_current_work() warns on non-finite input but still writes it to global_work_units_elapsed and the producer atomic. Once that happens, every later horizon comparison is poisoned. Return or assert before mutating state.
Minimal fix
  void set_current_work(double total_work, bool notify_producer = true)
  {
    if (!deterministic) return;
    if (!std::isfinite(total_work)) {
      CUOPT_LOG_WARN("set_current_work non-finite: %g (prev=%g) ctx=%s",
                     total_work,
                     global_work_units_elapsed,
                     name.c_str());
+     cuopt_assert(false, "Deterministic work progress must be finite");
+     return;
    }
    cuopt_assert(total_work + 1e-12 >= global_work_units_elapsed,
                 "Deterministic work progress must be monotonic");
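Stripped of the CUDA and logging plumbing, the guard boils down to the following; work_tracker_t is a hypothetical stand-in for the context, not the cuOpt type:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the work-limit context: reject non-finite
// or regressing values before mutating the shared counter.
struct work_tracker_t {
  double global_work_units_elapsed = 0.0;

  bool set_current_work(double total_work)
  {
    if (!std::isfinite(total_work)) { return false; }  // never commit NaN/Inf
    if (total_work + 1e-12 < global_work_units_elapsed) { return false; }  // monotonic only
    global_work_units_elapsed = total_work;
    return true;
  }
};
```

The early returns matter: once a NaN is written, every subsequent horizon comparison against the counter is false, so rejection must happen before any mutation.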
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/utilities/work_limit_context.hpp` around lines 102 - 115, set_current_work is warning on non-finite total_work but still commits it into global_work_units_elapsed and producer_work_units_elapsed, poisoning later comparisons; modify set_current_work to return (or abort via assertion) immediately when total_work is not finite (and deterministic is true) before any mutation of global_work_units_elapsed or producer_work_units_elapsed, preserving the existing log/diagnostic (CUOPT_LOG_WARN) and keeping the monotonicity check (cuopt_assert) only for finite inputs; key symbols: set_current_work, total_work, global_work_units_elapsed, producer_work_units_elapsed, producer_sync, deterministic.
cpp/src/mip_heuristics/local_search/feasibility_pump/feasibility_pump.cu-389-396 (1)
389-396: ⚠️ Potential issue | 🟠 Major
Populate fj.settings.work_limit in deterministic mode.
This branch only floors fj.settings.time_limit; fj.settings.work_limit stays at +inf. In deterministic solves that lets the fallback FJ run consume unbounded local work even after the FP slice is exhausted.
Minimal fix
  fj.settings.time_limit = std::min(time_limit, timer.remaining_time());
  if (timer.deterministic) {
    fj.settings.time_limit = std::max((f_t)0.0, fj.settings.time_limit);
+   fj.settings.work_limit = fj.settings.time_limit;
    if (fj.settings.time_limit == 0.0) {
      CUOPT_LOG_DEBUG("Skipping 20%% FJ run due to exhausted deterministic work budget");
      return false;
    }
+ } else {
+   fj.settings.work_limit = std::numeric_limits<double>::infinity();
  }
The longer cycle-escape FJ entry point should mirror the same update.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/local_search/feasibility_pump/feasibility_pump.cu` around lines 389 - 396, The deterministic branch updates fj.settings.time_limit but leaves fj.settings.work_limit at +inf; mirror the same logic for work limits by setting fj.settings.work_limit = std::min(work_limit, timer.remaining_work()) where the time limit is set, then in the timer.deterministic block floor fj.settings.work_limit = std::max((w_t)0.0, fj.settings.work_limit) and treat fj.settings.work_limit == 0.0 the same way (log and return); also apply the same paired update to the longer cycle-escape FJ entry point so both time and work limits are bounded in deterministic mode.
cpp/src/mip_heuristics/solution_callbacks.cuh-206-207 (1)
206-207: ⚠️ Potential issue | 🟠 Major
Use a relative/tolerance-based objective check here.
A fixed 1e-6 absolute assert is too tight after preprocess/scaling/uncrush, especially for large-magnitude objectives and float instantiations. This can reject valid user injections solely from expected floating-point drift. Compare against solver tolerances or a relative tolerance instead.
As per coding guidelines, "Check numerical stability: prevent overflow/underflow, precision loss, division by zero/near-zero, and use epsilon comparisons for floating-point equality checks."
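A relative-plus-absolute tolerance check of the kind suggested can look like this; the thresholds below are illustrative defaults, not cuOpt's actual solver tolerances:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Scale the tolerance with the operand magnitudes instead of using a
// fixed 1e-6 absolute bound; rel_eps/abs_eps are illustrative defaults.
bool objectives_match(double a, double b, double rel_eps = 1e-9, double abs_eps = 1e-6)
{
  const double tol = std::max(abs_eps, rel_eps * std::max({std::fabs(a), std::fabs(b), 1.0}));
  return std::fabs(a - b) <= tol;
}
```

For an objective near 1e12, this yields a tolerance of about 1e3 rather than 1e-6, so expected drift from scaling/uncrush no longer trips the assert.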
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/solution_callbacks.cuh` around lines 206 - 207, The absolute 1e-6 check in cuopt_assert comparing outside_sol.get_user_objective() and outside_sol_objective is too strict; replace it with a relative/tolerance-based comparison using a scaled epsilon (e.g., tol = max(min_tol, rel_eps * max(std::abs(outside_sol.get_user_objective()), std::abs(outside_sol_objective), 1.0))) and assert that the absolute difference <= tol; update the cuopt_assert call accordingly so it uses this computed tolerance (referencing outside_sol.get_user_objective(), outside_sol_objective, and cuopt_assert) or, if available, use the solver's tolerance constant instead of hardcoding 1e-6.
cpp/tests/mip/diversity_test.cu-334-335 (1)
334-335: ⚠️ Potential issue | 🟠 Major: Recompute `path` after honoring `CUOPT_INSTANCE`.
In both active tests, `path` is built before the env override is applied, so setting `CUOPT_INSTANCE` only changes the label/logging; the helper still parses the parametrized file. The disabled full-run test already handles this correctly.
💡 Minimal fix
- auto path = make_path_absolute(test_instance);
- test_instance = std::getenv("CUOPT_INSTANCE") ? std::getenv("CUOPT_INSTANCE") : test_instance;
+ if (const char* instance = std::getenv("CUOPT_INSTANCE")) {
+   test_instance = instance;
+ }
+ auto path = make_path_absolute(test_instance);
Apply the same reorder in both test bodies.
Also applies to: 363-364
cpp/tests/mip/diversity_test.cu-265-288 (1)
265-288: ⚠️ Potential issue | 🟠 Major: Key the recombiner gold map by `seed` too.
`hash_map` is static for the whole process, but the key only uses `(path, i, j, recombiner)`. A later invocation on the same instance with a different seed will compare against stale gold data and fail even if each seed is internally deterministic. Include `seed` in the key or clear the map at the start of the top-level test.
As per coding guidelines, "Ensure test isolation: prevent GPU state, cached memory, and global variables from leaking between test cases; verify each test independently initializes its environment."
cpp/src/mip_heuristics/diversity/diversity_manager.cu-499-502 (1)
499-502: ⚠️ Potential issue | 🟠 Major: Use the remaining deterministic budget here, not the original child limit.
In deterministic mode, `timer.get_time_limit()` returns this child's full budget, while `timer.remaining_time()` is the value that reflects work already consumed on `context.gpu_heur_loop`. After presolve or earlier heuristics spend work, this code recalculates `lp_time_limit` from the original budget and can oversubscribe the end-to-end deterministic limit.
💡 Minimal fix
- const f_t time_limit = timer.deterministic ? timer.get_time_limit() : timer.remaining_time();
+ const f_t time_limit = timer.remaining_time();
cpp/src/mip_heuristics/feasibility_jump/feasibility_jump.cu-994-1019 (1)
994-1019: ⚠️ Potential issue | 🟠 Major: Check the refreshed-LHS kernel launch immediately.
`init_lhs_and_violated_constraints<<<...>>>` is a new raw kernel launch, but this path consumes its outputs without any launch-status check. If the launch fails, the later reductions report garbage and the failure is much harder to localize.
🛡️ Minimal fix
  data.violated_constraints.clear(stream);
  init_lhs_and_violated_constraints<i_t, f_t><<<4096, 256, 0, stream>>>(v);
+ RAFT_CHECK_CUDA(stream);
  // both transformreduce could be fused; but oh well hardly a bottleneck
As per coding guidelines, "Every CUDA kernel launch and memory operation must have error checking with CUDA_CHECK or equivalent verification."
cpp/src/branch_and_bound/branch_and_bound.cpp-3722-3748 (1)
3722-3748: ⚠️ Potential issue | 🟠 Major: Retire horizon-boundary solutions in the same sync.
The deterministic contract here is "timestamps <= current horizon", but Line 3727 uses `< deterministic_current_horizon_`. Solutions stamped exactly on the boundary are deferred one sync later, which changes replay order and incumbent/pruning behavior.
Suggested fix
- if (sol.work_timestamp < deterministic_current_horizon_) {
+ if (sol.work_timestamp <= deterministic_current_horizon_) {
    due_solutions.push_back(std::move(sol));
  } else {
    future_solutions.push_back(std::move(sol));
  }
cpp/src/mip_heuristics/diversity/population.cu-222-232 (1)
222-232: ⚠️ Potential issue | 🟠 Major: Fix the comparison that tightens `new_best_feasible_objective`.
Line 231 updates the threshold on a worse non-CPUFJ objective. That widens the CPUFJ gate and can admit scratch solutions that are still worse than the best non-CPUFJ solution already drained.
Suggested fix
- } else if (h_entry.origin != internals::mip_solution_origin_t::CPU_FEASIBILITY_JUMP &&
-            h_entry.objective > new_best_feasible_objective) {
-   new_best_feasible_objective = h_entry.objective;
+ } else if (h_entry.origin != internals::mip_solution_origin_t::CPU_FEASIBILITY_JUMP &&
+            h_entry.objective < new_best_feasible_objective) {
+   new_best_feasible_objective = h_entry.objective;
  }
cpp/src/mip_heuristics/local_search/local_search.cu-358-365 (1)
358-365: ⚠️ Potential issue | 🟠 Major: Replace timer reassignment with a local child timer to preserve the caller's time budget tracking.
Line 364 overwrites the caller-owned `timer` parameter with a fresh 0.25/0.5-second budget. This breaks time accounting for callers: code at diversity_manager.cu logs `timer.elapsed_time()` after `run_local_search()` returns, getting only the sub-operation budget instead of total elapsed time. Similarly, subsequent `timer.check_time_limit()` checks use the wrong budget.
Create a local child timer instead:
work_limit_timer_t sub_timer = work_limit_timer_t(context.gpu_heur_loop, fj_settings.time_limit, timer);
Then pass `sub_timer` to `run_fj_line_segment()` and `run_fj_annealing()` at lines 382 and 385–386.
🟡 Minor comments (8)
ci/compute-sanitizer-suppressions.xml-209-222 (1)
209-222: ⚠️ Potential issue | 🟡 Minor: Typo and broad D2D copy suppression.
Line 209 has a typo: "actualy" → "actually".
The suppression for `cuMemcpyDtoDAsync` (lines 210-222) is quite broad, matching any device-to-device copy. While the comment notes these are "usually harmless," this could mask legitimate uninitialized memory issues. Consider narrowing with a `hostStack` constraint if specific call sites are known.
📝 Fix typo
- <!-- Uninitialized device-to-device copies are usually harmless - if actualy bogus, errors may be caught later on -->
+ <!-- Uninitialized device-to-device copies are usually harmless - if actually bogus, errors may be caught later on -->
cpp/src/mip_heuristics/diversity/recombiners/line_segment_recombiner.cuh-69-75 (1)
69-75: ⚠️ Potential issue | 🟡 Minor: Use a consistent format specifier for `uint32_t` hash values.
`get_hash()` returns `uint32_t`, but line 74 logs it with `%d`. This format mismatch diverges from the codebase pattern (other recombiners use `0x%x`). Even though `CUOPT_DETERMINISM_LOG` is currently disabled, use `0x%x` for consistency when the macro is enabled.
Suggested change
- CUOPT_DETERMINISM_LOG("LS rec: a %d b %d", a.get_hash(), b.get_hash());
+ CUOPT_DETERMINISM_LOG("LS rec: a 0x%x b 0x%x", a.get_hash(), b.get_hash());
cpp/tests/mip/local_search_test.cu-224-226 (1)
224-226: ⚠️ Potential issue | 🟡 Minor: Integer overflow risk when parsing the `CUOPT_SEED` environment variable.
`std::stoi` returns `int`, which is then cast to `unsigned long`. For seeds larger than `INT_MAX`, this will throw or produce incorrect values. Consider using `std::stoul` instead:
Suggested fix
  unsigned long seed = std::getenv("CUOPT_SEED")
-   ? (unsigned long)std::stoi(std::getenv("CUOPT_SEED"))
+   ? std::stoul(std::getenv("CUOPT_SEED"))
    : (unsigned long)std::random_device{}();
cpp/tests/mip/determinism_utils.cuh-64-69 (1)
64-69: ⚠️ Potential issue | 🟡 Minor: Destructor should handle potential exceptions from thread join.
If `spin_thread.join()` throws (e.g., if the thread wasn't properly started), the destructor will terminate the program. Consider wrapping in a try-catch or checking `joinable()` first.
Suggested fix
  ~spin_stream_raii_t()
  {
    int one = 1;
    flag.set_value_async(one, stream);
-   spin_thread.join();
+   if (spin_thread.joinable()) {
+     spin_thread.join();
+   }
  }
cpp/tests/mip/determinism_utils.cuh-47-53 (1)
47-53: ⚠️ Potential issue | 🟡 Minor: Thread-safety issue: `rand()` is not thread-safe.
The `rand()` function is not thread-safe and can cause data races if multiple `spin_stream_raii_t` instances are created concurrently. Consider using a thread-local random engine or `std::random_device` with proper synchronization.
Suggested fix
  static void launch_spin_kernel_stream_thread(rmm::cuda_stream_view stream_view, int* flag)
  {
+   thread_local std::mt19937 rng{std::random_device{}()};
+   std::uniform_int_distribution<int> block_dist(1, 64);
+   std::uniform_int_distribution<int> thread_dist(1, 1024);
+   std::uniform_int_distribution<int> sleep_dist(1, 1000);
    while (true) {
-     int blocks  = rand() % 64 + 1;
-     int threads = rand() % 1024 + 1;
+     int blocks  = block_dist(rng);
+     int threads = thread_dist(rng);
      spin_kernel<<<blocks, threads, 0, stream_view>>>(flag);
      cudaStreamSynchronize(stream_view);
      if (host_copy(flag, 1, stream_view)[0] != 0) { break; }
-     std::this_thread::sleep_for(std::chrono::milliseconds(rand() % 1000 + 1));
+     std::this_thread::sleep_for(std::chrono::milliseconds(sleep_dist(rng)));
    }
  }
cpp/tests/mip/determinism_utils.cuh-51-51 (1)
51-51: ⚠️ Potential issue | 🟡 Minor: Missing CUDA error checking on `cudaStreamSynchronize`.
Per coding guidelines, CUDA operations should have error checking. Use `RAFT_CUDA_TRY` or equivalent.
Suggested fix
- cudaStreamSynchronize(stream_view);
+ RAFT_CUDA_TRY(cudaStreamSynchronize(stream_view));
As per coding guidelines: "Every CUDA kernel launch and memory operation must have error checking with CUDA_CHECK or equivalent verification".
cpp/tests/mip/mip_utils.cuh-240-248 (1)
240-248: ⚠️ Potential issue | 🟡 Minor: Potential race: reading device scalars before ensuring solve completion.
After `fj.solve(solution)`, the code immediately reads device scalars (`local_minimums_reached`, `incumbent_objective`, `violation_score`) via `.value()`. While `host_copy` at line 241 synchronizes, the `.value()` calls on climber state may access data that's still being written if the solve uses async operations internally.
Consider adding an explicit sync after `fj.solve()`:
Suggested fix
  fj.solve(solution);
+ solution.handle_ptr->sync_stream();
  auto solution_vector = host_copy(solution.assignment, solution.handle_ptr->get_stream());
cpp/src/mip_heuristics/diversity/recombiners/recombiner.cuh-248-257 (1)
248-257: ⚠️ Potential issue | 🟡 Minor: Nondeterministic iteration order from `unordered_set`.
`std::unordered_set` does not guarantee iteration order, so converting it directly to a vector (line 250) produces a nondeterministic `enabled_recombiners` order. If recombiner execution order affects results (e.g., through solution selection or early termination), this could break determinism.
Consider sorting `enabled_recombiners` after construction:
🔧 Proposed fix
  recombiner_t::enabled_recombiners =
    std::vector<recombiner_enum_t>(enabled_recombiners.begin(), enabled_recombiners.end());
+ std::sort(recombiner_t::enabled_recombiners.begin(), recombiner_t::enabled_recombiners.end());
  cuopt_assert(!recombiner_t::enabled_recombiners.empty(), "No recombiners enabled after init");
/ok to test c3f8fc3
This PR implements the integration of GPU heuristics into the deterministic codepath of the solver. This enables running full end-to-end solves with presolve, parallel B&B, and GPU heuristics, with full run-to-run determinism and heterogeneous (CPU+GPU) execution.
For this purpose, the GPU heuristics were modified to eliminate sources of nondeterminism such as unordered floating-point reductions, atomic operations, scheduling-dependent operations... Extensive testing has been added to the CI runs to ensure that individual components (such as probing, FJ, recombiners, FP...) are bitwise deterministic and that their overall orchestration maintains this invariant.
Furthermore, to reduce changes to the codebase and splits in the codepath, wall-clock timers on the heuristics side were replaced with unified termination-checking objects that check against wall time (opportunistic mode) or work total (deterministic mode).
These timers are hierarchical, and thus allow checking against multiple termination sources, for example a local algorithm's work budget, the global wall-clock time limit, or, in future PRs, user-controlled termination signals like Ctrl-C or callback handlers.
B&B acts as the authority for incumbent publishing to the user. At every horizon sync, the queue of GPU heuristic solutions is checked, and solutions whose timestamp is <= the current B&B work total are drained, replayed, and incorporated into the tree exploration.
A new get_solution_callback_ext callback has been introduced to prevent ABI breaks and pass more metadata with each published solution. For now, only the solution origin and its work unit timestamp are exposed, but additional fields may be later added without breaking the ABI.
The benchmark runner now relies on such callbacks to record the incumbents and their timestamps - it is no longer necessary to parse the logs to compute the primal integral.
A few other bugfixes have been included for the existing B&B deterministic exploration regarding node queues. Other fixes address uninitialized memory accesses and allow running compute-sanitizer initcheck without false positives. Determinism logs have also been added throughout the code, disabled by default.
Some features remain unsupported in deterministic mode, such as Papilo's Probing presolver, concurrent root solve, RINS, or the sub-MIP recombiner. These will be incorporated in later PRs.
Furthermore, the PR has been modified to take the ML estimators out for the sake of reducing code review workload. They will be incorporated again in later PRs, along with improved tuning.
No intentional API or ABI breaking changes introduced.
Opportunistic mode performance remains unchanged:
Primal gap 12.5%, 226 feasible, primal integral 19.1%.
Deterministic mode benchmarks pending.
Closes #144
Closes #882