[benchmarking] Add configurable GPU memory warning threshold #1966

Draft
rlratzel wants to merge 1 commit into NVIDIA-NeMo:main from rlratzel:gpu_warning_threshold

@rlratzel
Contributor

Summary

  • Adds max_allowed_gpu_mem_use_warning_threshold, a fraction of total GPU memory in the range 0.0-1.0, at the session and per-entry level so pre-run GPU warnings only fire when memory usage exceeds that fraction
  • Collects warnings from both pre- and post-run GPU checks into result_data["warnings"] so sinks (e.g. SlackSink) read them directly instead of re-deriving them from raw GPU stats
  • Decouples SlackSink from the GPU warning logic
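
For reviewers, the threshold behavior described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the helper names `resolve_threshold` and `gpu_mem_warning` are hypothetical, and the rule that a per-entry value overrides the session value is an assumption. The fallback for an unset field (any usage > 0 warns) follows the test plan.

```python
from typing import Optional


def resolve_threshold(session_value: Optional[float],
                      entry_value: Optional[float]) -> Optional[float]:
    """Pick the effective threshold.

    Assumption: a per-entry value, when set, overrides the session value.
    """
    return entry_value if entry_value is not None else session_value


def gpu_mem_warning(used_bytes: int, total_bytes: int,
                    threshold: Optional[float]) -> Optional[str]:
    """Return a warning string if pre-run GPU memory use exceeds the threshold.

    With no threshold configured, any usage above zero produces a warning.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    if threshold is not None and not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be a fraction between 0.0 and 1.0")

    fraction = used_bytes / total_bytes
    limit = 0.0 if threshold is None else threshold
    if fraction > limit:
        return (f"GPU memory in use before run: {fraction:.1%} of total "
                f"(threshold {limit:.1%})")
    return None
```

Under these assumptions, a session value of 0.5 combined with an entry value of 0.8 yields an effective threshold of 0.8, so a GPU at 60% usage would not warn for that entry.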

Test plan

  • Run benchmarks with max_allowed_gpu_mem_use_warning_threshold set in the session config and confirm warnings only fire when GPU usage exceeds the threshold
  • Confirm result_data["warnings"] is populated and appears correctly in the Slack sink message
  • Confirm no warnings are emitted when GPU memory usage is below the threshold
  • Run without the field set and confirm behavior is unchanged (any usage > 0 triggers a warning)
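
The `result_data["warnings"]` check in the plan above could be exercised with something like the sketch below. It is a sketch under stated assumptions: `collect_warnings` and `format_sink_message` are hypothetical names, the `name` key and the message layout are invented for illustration, and the real SlackSink formatting may differ. The point it shows is that the sink only reads the stored warning strings and never looks at raw GPU stats.

```python
from typing import Any, Dict, List, Optional


def collect_warnings(result_data: Dict[str, Any],
                     *check_results: Optional[str]) -> None:
    """Append non-empty warnings from pre- and post-run checks to result_data."""
    warnings: List[str] = result_data.setdefault("warnings", [])
    warnings.extend(w for w in check_results if w)


def format_sink_message(result_data: Dict[str, Any]) -> str:
    """Hypothetical sink-side formatter that only reads result_data["warnings"]."""
    lines = [f"Benchmark: {result_data.get('name', 'unknown')}"]
    for warning in result_data.get("warnings", []):
        lines.append(f":warning: {warning}")
    return "\n".join(lines)


if __name__ == "__main__":
    result = {"name": "demo"}  # hypothetical result record
    # A None entry stands for a check that produced no warning.
    collect_warnings(result, "pre-run: GPU 0 above threshold", None)
    print(format_sink_message(result))
```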

🤖 Generated with Claude Code

Adds `max_allowed_gpu_mem_use_warning_threshold` at session and entry
level so pre-run GPU warnings only fire when memory usage exceeds a
configured fraction of total GPU memory. Warnings are collected from
both pre- and post-run checks and stored in `result_data["warnings"]`
so sinks (e.g. SlackSink) can read them directly instead of
re-deriving them from raw GPU stats.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: rlratzel <rratzel@nvidia.com>
@copy-pr-bot

copy-pr-bot Bot commented May 11, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.
