NodeAgent onboarding: send register sets to coordinator at startup#104

Open
1ntEgr8 wants to merge 1 commit into users/elton/telemetry-infrastructure from users/elton/nodeagent-onboarding

@1ntEgr8
Collaborator

@1ntEgr8 1ntEgr8 commented Feb 27, 2026

Summary

  • NodeAgents send per-device register set info to Coordinator during onboarding
  • New OnboardNodeAgentRequest/Response messaging types
  • Coordinator forwards register sets to Planner for register-aware planning
  • Add RegisterSet::Uniform constructor and serialization support
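A minimal sketch of what the new types might look like, for readers who have not seen the diff. Only the names `RegisterSet`, `RegisterSet::Uniform`, and `OnboardNodeAgentRequest` come from this PR. The field names, the `node_id` field, the little-endian wire format, and the `Serialize`/`Deserialize` helpers are assumptions made for illustration and may not match the actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Assumption: a register is described only by its size in bytes.
struct RegisterSet {
  std::vector<std::uint64_t> register_sizes;

  // Builds a set of `count` registers that all share the same size.
  static RegisterSet Uniform(std::size_t count, std::uint64_t size_bytes) {
    assert(size_bytes > 0 && "assumed precondition: registers are non-empty");
    return RegisterSet{std::vector<std::uint64_t>(count, size_bytes)};
  }

  bool operator==(const RegisterSet& other) const {
    return register_sizes == other.register_sizes;
  }
};

// Hypothetical layout: one RegisterSet per device on the node.
struct OnboardNodeAgentRequest {
  std::uint64_t node_id;
  std::vector<RegisterSet> device_register_sets;
};

// Appends `v` as 8 little-endian bytes.
void PutU64(std::vector<std::uint8_t>& out, std::uint64_t v) {
  for (int i = 0; i < 8; ++i) out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

// Reads 8 little-endian bytes at `pos`; returns false if the input is too short.
bool GetU64(const std::vector<std::uint8_t>& in, std::size_t& pos, std::uint64_t& v) {
  if (in.size() - pos < 8) return false;
  v = 0;
  for (int i = 0; i < 8; ++i) v |= static_cast<std::uint64_t>(in[pos + i]) << (8 * i);
  pos += 8;
  return true;
}

std::vector<std::uint8_t> Serialize(const OnboardNodeAgentRequest& req) {
  std::vector<std::uint8_t> out;
  PutU64(out, req.node_id);
  PutU64(out, req.device_register_sets.size());
  for (const RegisterSet& set : req.device_register_sets) {
    PutU64(out, set.register_sizes.size());
    for (std::uint64_t size : set.register_sizes) PutU64(out, size);
  }
  return out;
}

// Returns std::nullopt on truncated input or trailing bytes.
std::optional<OnboardNodeAgentRequest> Deserialize(const std::vector<std::uint8_t>& in) {
  std::size_t pos = 0;
  std::uint64_t node_id = 0, num_sets = 0;
  if (!GetU64(in, pos, node_id) || !GetU64(in, pos, num_sets)) return std::nullopt;
  OnboardNodeAgentRequest req{node_id, {}};
  for (std::uint64_t d = 0; d < num_sets; ++d) {
    std::uint64_t num_registers = 0;
    if (!GetU64(in, pos, num_registers)) return std::nullopt;
    RegisterSet set;
    for (std::uint64_t r = 0; r < num_registers; ++r) {
      std::uint64_t size = 0;
      if (!GetU64(in, pos, size)) return std::nullopt;
      set.register_sizes.push_back(size);
    }
    req.device_register_sets.push_back(std::move(set));
  }
  if (pos != in.size()) return std::nullopt;
  return req;
}

// Usage: a node with two devices round-trips through the wire format.
bool RoundTripExample() {
  OnboardNodeAgentRequest req{7, {RegisterSet::Uniform(4, 1 << 20),
                                  RegisterSet::Uniform(2, 4096)}};
  std::optional<OnboardNodeAgentRequest> decoded = Deserialize(Serialize(req));
  return decoded && decoded->node_id == req.node_id &&
         decoded->device_register_sets == req.device_register_sets;
}
```

The decoder in this sketch never pre-allocates from a length field it has not yet validated, so a corrupted count fails on the first short read instead of triggering a large allocation.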

@1ntEgr8 1ntEgr8 force-pushed the users/elton/telemetry-infrastructure branch from 59cf112 to 8afcfbe Compare February 27, 2026 02:34
@1ntEgr8 1ntEgr8 force-pushed the users/elton/nodeagent-onboarding branch from 4f14857 to 566631e Compare February 27, 2026 02:34
@1ntEgr8 1ntEgr8 force-pushed the users/elton/telemetry-infrastructure branch from 8afcfbe to c2fc1b8 Compare February 27, 2026 15:20
@1ntEgr8 1ntEgr8 force-pushed the users/elton/nodeagent-onboarding branch from 566631e to c767906 Compare February 27, 2026 15:20
@1ntEgr8 1ntEgr8 force-pushed the users/elton/telemetry-infrastructure branch from c2fc1b8 to 24089c6 Compare February 27, 2026 15:23
@1ntEgr8 1ntEgr8 force-pushed the users/elton/nodeagent-onboarding branch from c767906 to 5ce2cbf Compare February 27, 2026 15:23
@1ntEgr8 1ntEgr8 force-pushed the users/elton/telemetry-infrastructure branch from 24089c6 to 253705b Compare February 27, 2026 15:26
@1ntEgr8 1ntEgr8 force-pushed the users/elton/nodeagent-onboarding branch from 5ce2cbf to d854f01 Compare February 27, 2026 15:26
@1ntEgr8 1ntEgr8 force-pushed the users/elton/telemetry-infrastructure branch from 253705b to 75615e9 Compare February 27, 2026 15:49
@1ntEgr8 1ntEgr8 force-pushed the users/elton/nodeagent-onboarding branch from d854f01 to baa3a30 Compare February 27, 2026 15:49
@1ntEgr8 1ntEgr8 force-pushed the users/elton/telemetry-infrastructure branch from 75615e9 to c9b9866 Compare February 27, 2026 15:55
@1ntEgr8 1ntEgr8 force-pushed the users/elton/nodeagent-onboarding branch from baa3a30 to 79fb5d8 Compare February 27, 2026 15:55
@1ntEgr8 1ntEgr8 force-pushed the users/elton/telemetry-infrastructure branch 3 times, most recently from c9b9866 to 513c2e4 Compare February 27, 2026 16:10
@1ntEgr8 1ntEgr8 force-pushed the users/elton/nodeagent-onboarding branch from 79fb5d8 to cb2b937 Compare February 27, 2026 16:12
@1ntEgr8 1ntEgr8 force-pushed the users/elton/telemetry-infrastructure branch from 513c2e4 to 785abed Compare February 27, 2026 16:20
@1ntEgr8 1ntEgr8 force-pushed the users/elton/nodeagent-onboarding branch from cb2b937 to ced7715 Compare February 27, 2026 16:20
@1ntEgr8 1ntEgr8 force-pushed the users/elton/telemetry-infrastructure branch from 785abed to 934ddf6 Compare February 27, 2026 17:26
@1ntEgr8 1ntEgr8 force-pushed the users/elton/nodeagent-onboarding branch from ced7715 to d0a76a6 Compare February 27, 2026 17:26
@1ntEgr8 1ntEgr8 force-pushed the users/elton/telemetry-infrastructure branch from 934ddf6 to 6891e02 Compare February 27, 2026 17:27
@1ntEgr8 1ntEgr8 force-pushed the users/elton/nodeagent-onboarding branch from d0a76a6 to 60fc949 Compare February 27, 2026 17:29
@1ntEgr8 1ntEgr8 force-pushed the users/elton/telemetry-infrastructure branch from 6891e02 to a9ca420 Compare February 27, 2026 17:32
@1ntEgr8 1ntEgr8 force-pushed the users/elton/nodeagent-onboarding branch 2 times, most recently from ce50a07 to 0638f2d Compare February 27, 2026 18:44
@1ntEgr8 1ntEgr8 force-pushed the users/elton/telemetry-infrastructure branch 2 times, most recently from a9ca420 to 1e612fe Compare February 27, 2026 18:51
@1ntEgr8 1ntEgr8 force-pushed the users/elton/nodeagent-onboarding branch from 0638f2d to 158e273 Compare February 27, 2026 18:51
@1ntEgr8 1ntEgr8 force-pushed the users/elton/telemetry-infrastructure branch from 1e612fe to fb5e8ec Compare February 27, 2026 19:08
@1ntEgr8 1ntEgr8 force-pushed the users/elton/nodeagent-onboarding branch from 158e273 to 1acf837 Compare February 27, 2026 19:08
@1ntEgr8 1ntEgr8 force-pushed the users/elton/telemetry-infrastructure branch from fb5e8ec to a4d12f2 Compare February 27, 2026 20:32
@1ntEgr8 1ntEgr8 force-pushed the users/elton/nodeagent-onboarding branch from 1acf837 to 398c69c Compare February 27, 2026 20:35
@1ntEgr8 1ntEgr8 force-pushed the users/elton/telemetry-infrastructure branch from a4d12f2 to 4efae8a Compare February 27, 2026 20:57
@1ntEgr8 1ntEgr8 force-pushed the users/elton/nodeagent-onboarding branch 2 times, most recently from d3bdad3 to 8727bd7 Compare February 27, 2026 20:57
@1ntEgr8 1ntEgr8 force-pushed the users/elton/telemetry-infrastructure branch from 4efae8a to 3994c0b Compare February 27, 2026 20:59
@1ntEgr8 1ntEgr8 force-pushed the users/elton/nodeagent-onboarding branch from 8727bd7 to e0f801b Compare February 27, 2026 20:59
@1ntEgr8 1ntEgr8 force-pushed the users/elton/telemetry-infrastructure branch from 3994c0b to 3aeb805 Compare February 27, 2026 21:06
@1ntEgr8 1ntEgr8 force-pushed the users/elton/nodeagent-onboarding branch 2 times, most recently from 661f17d to e0f801b Compare February 27, 2026 21:06
@1ntEgr8 1ntEgr8 force-pushed the users/elton/telemetry-infrastructure branch from 3aeb805 to 3994c0b Compare February 27, 2026 21:06
NodeAgents now send their per-device register set information to the
Coordinator during onboarding. This enables the Planner to allocate
intermediate registers at the correct sizes for multi-hop routing.

Key changes:
- Add OnboardNodeAgentRequest/Response messaging
- Add RegisterSet with uniform factory and per-device tracking
- NodeAgent constructs register sets from worker devices and sends
  them during onboarding handshake
- Coordinator Handler receives and passes sets to Planner
- Planner stores per-participant register sets for backend use
- Update NCCL backend to use RegisterSet for buffer allocation
- Fix data race: route onboarding through Executor thread via
  ExecutorTask variant so Planner is only accessed from one thread
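The data race fix in the last item can be illustrated with a small sketch: handler threads never touch the Planner directly; they enqueue a task, and a single Executor thread applies it. Only the names `Planner`, `Executor`, and `ExecutorTask` come from this PR. The `OnboardTask` and `StopTask` alternatives, the use of `std::variant`, the queue, and all method names are assumptions for illustration, not the code in this change.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-in for one participant's register sizes.
using RegisterSizes = std::vector<std::uint64_t>;

// Deliberately not thread-safe: only the executor thread may call into it.
class Planner {
 public:
  void AddParticipant(std::uint64_t node_id, RegisterSizes sizes) {
    sets_[node_id] = std::move(sizes);
  }
  std::size_t NumParticipants() const { return sets_.size(); }

 private:
  std::map<std::uint64_t, RegisterSizes> sets_;
};

struct OnboardTask {
  std::uint64_t node_id;
  RegisterSizes sizes;
};
struct StopTask {};
using ExecutorTask = std::variant<OnboardTask, StopTask>;

class Executor {
 public:
  Executor() : worker_([this] { Run(); }) {}
  ~Executor() { Stop(); }

  // Safe to call from any thread, for example a network handler.
  void Submit(ExecutorTask task) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      tasks_.push(std::move(task));
    }
    cv_.notify_one();
  }

  // Drains queued tasks, joins the worker, then reads the Planner safely.
  std::size_t StopAndCount() {
    Stop();
    return planner_.NumParticipants();
  }

 private:
  void Stop() {
    if (worker_.joinable()) {
      Submit(StopTask{});
      worker_.join();
    }
  }

  // Runs on the executor thread; the only code path that mutates planner_.
  void Run() {
    for (;;) {
      std::unique_lock<std::mutex> lock(mu_);
      cv_.wait(lock, [this] { return !tasks_.empty(); });
      ExecutorTask task = std::move(tasks_.front());
      tasks_.pop();
      lock.unlock();
      if (std::holds_alternative<StopTask>(task)) return;
      assert(std::holds_alternative<OnboardTask>(task));
      OnboardTask& onboard = std::get<OnboardTask>(task);
      planner_.AddParticipant(onboard.node_id, std::move(onboard.sizes));
    }
  }

  Planner planner_;
  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<ExecutorTask> tasks_;
  std::thread worker_;  // Declared last so the other members exist before Run() starts.
};

// Usage: several handler threads onboard concurrently without touching the Planner.
std::size_t OnboardFromManyThreads(int num_handlers) {
  Executor executor;
  std::vector<std::thread> handlers;
  for (int i = 0; i < num_handlers; ++i) {
    handlers.emplace_back([&executor, i] {
      executor.Submit(OnboardTask{static_cast<std::uint64_t>(i), RegisterSizes(4, 1024)});
    });
  }
  for (std::thread& h : handlers) h.join();
  return executor.StopAndCount();
}
```

In this sketch the mutex guards only the queue. The Planner itself needs no locking because every mutation happens on the executor thread, which is the property the commit message describes.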
@1ntEgr8 1ntEgr8 force-pushed the users/elton/telemetry-infrastructure branch from 3994c0b to 031bcf9 Compare February 27, 2026 21:11
@1ntEgr8 1ntEgr8 force-pushed the users/elton/nodeagent-onboarding branch from e0f801b to 5520148 Compare February 27, 2026 21:11