
Feature/hw health machine metadata #1671

Merged: mkoci merged 7 commits into NVIDIA:main from mkoci:feature/hw-health-machine-metadata on May 14, 2026

@mkoci mkoci commented May 14, 2026

Adds machine placement and NVLink metadata to the hardware health pipeline.

  • Carries machine metadata from inventory/API and static endpoint config into health event context
  • Exposes metadata as Prometheus labels and OTLP resource attributes
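For readers skimming the description, the two bullets above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `MachineMetadata`, its fields, `merged_with`, and the `machine_` / `machine.` key prefixes are all hypothetical names, and the rule that inventory/API values take precedence over static endpoint config is an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical placement and NVLink metadata for one machine.
/// Field names are illustrative only, not taken from the PR.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct MachineMetadata {
    pub rack: Option<String>,
    pub slot: Option<String>,
    pub nvlink_domain: Option<String>,
}

impl MachineMetadata {
    /// Combine two sources. ASSUMPTION: values in `self` (for example
    /// inventory/API) win, and `fallback` (for example static endpoint
    /// config) only fills the gaps.
    pub fn merged_with(self, fallback: MachineMetadata) -> MachineMetadata {
        MachineMetadata {
            rack: self.rack.or(fallback.rack),
            slot: self.slot.or(fallback.slot),
            nvlink_domain: self.nvlink_domain.or(fallback.nvlink_domain),
        }
    }

    /// Field names paired with their optional values.
    fn fields(&self) -> [(&'static str, Option<&str>); 3] {
        [
            ("rack", self.rack.as_deref()),
            ("slot", self.slot.as_deref()),
            ("nvlink_domain", self.nvlink_domain.as_deref()),
        ]
    }

    /// Prometheus-style labels: every label is always present and missing
    /// values become empty strings, so the label set stays stable.
    pub fn prometheus_labels(&self) -> BTreeMap<String, String> {
        self.fields()
            .iter()
            .map(|(k, v)| (format!("machine_{}", k), v.unwrap_or("").to_string()))
            .collect()
    }

    /// OTLP-style resource attributes: dotted keys, missing values omitted.
    pub fn otlp_attributes(&self) -> Vec<(String, String)> {
        self.fields()
            .iter()
            .filter_map(|(k, v)| v.map(|val| (format!("machine.{}", k), val.to_string())))
            .collect()
    }
}
```

In this sketch, a health event context would hold one merged `MachineMetadata` and call `prometheus_labels()` or `otlp_attributes()` at export time. Keeping the Prometheus label set fixed while omitting absent OTLP attributes is one possible design choice, not necessarily the one taken here.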

Type of Change

  • Add - New feature or capability
  • Change - Changes in existing functionality
  • Fix - Bug fixes
  • Remove - Removed features or deprecated functionality
  • Internal - Internal changes (refactoring, tests, docs, etc.)

Related Issues (Optional)

Breaking Changes

  • This PR contains breaking changes

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

@mkoci mkoci requested review from a team and Coco-Ben as code owners May 14, 2026 03:53

copy-pr-bot Bot commented May 14, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@mkoci mkoci requested a review from yoks May 14, 2026 03:53

yoks commented May 14, 2026

Can you squash/re-sign the commits so that DCO will pass and we can run tests properly?

Comment thread crates/health/benches/collector_pipeline.rs
@mkoci mkoci force-pushed the feature/hw-health-machine-metadata branch from 06ffd6f to 9d07d4c Compare May 14, 2026 11:04

yoks commented May 14, 2026

I'm fine with the changes, but before we can merge, all your commits need to have verified signatures.

@mkoci mkoci force-pushed the feature/hw-health-machine-metadata branch 2 times, most recently from 7283f8c to 8db0690 Compare May 14, 2026 17:36
mkoci added 6 commits May 14, 2026 14:20
Signed-off-by: mkoci <mkoci@nvidia.com>
Signed-off-by: mkoci <mkoci@nvidia.com>
Signed-off-by: mkoci <mkoci@nvidia.com>
Signed-off-by: mkoci <mkoci@nvidia.com>
Signed-off-by: mkoci <mkoci@nvidia.com>
Signed-off-by: mkoci <mkoci@nvidia.com>
@mkoci mkoci force-pushed the feature/hw-health-machine-metadata branch from 8db0690 to 5300426 Compare May 14, 2026 18:20

mkoci commented May 14, 2026

I have fixed the signing key in my new dev environment.

I also documented this in an issue: #1692

@mkoci mkoci linked an issue May 14, 2026 that may be closed by this pull request
2 tasks
@mkoci mkoci requested a review from yoks May 14, 2026 18:38
@mkoci mkoci merged commit 1c95a1d into NVIDIA:main May 14, 2026
45 checks passed
jayzhudev pushed a commit to jayzhudev/ncx-infra-controller-core that referenced this pull request May 15, 2026

Development

Successfully merging this pull request may close these issues.

feature(health): Missing placement data for Machine and Switch

3 participants