feat: add HITL pause primitives to workflow engine #829

Open
wolo-lab wants to merge 1 commit into wolo/workflows from wolo/workflow_hitl_pause

wolo-lab commented May 13, 2026

Adds the engine-side scaffolding for human-in-the-loop pauses. A node can now
emit a session.RequestInput event; the scheduler parks the node in
NodeWaiting, persists the request on NodeState.PendingRequest, and stops
scheduling its successors instead of finalising the run. Pause-only; resume
(Workflow.Resume, schema validation, handoff/re-entry modes) lands in
follow-up PRs.

API additions: session.RequestInput, session.Event.RequestedInput,
workflow.NewRequestInputEvent, workflow.NodeState.PendingRequest,
workflow.ErrMultipleInputRequests (one request per activation).

Mirrors the existing Routes/setRoutingEvent dispatch pattern in the
scheduler. The waiting branch in handleCompletion runs after error/cancel,
so a node that requested input and then errored lands in NodeFailed, not
NodeWaiting.

wolo-lab force-pushed the wolo/workflow_hitl_pause branch 4 times, most recently from 652b345 to 6b8ac3d on May 13, 2026 21:06
wolo-lab changed the base branch from wolo/workflow_engine to wolo/workflows on May 13, 2026 21:16
wolo-lab force-pushed the wolo/workflow_hitl_pause branch from 6b8ac3d to c499ec0 on May 13, 2026 22:05

Adds the engine-side scaffolding for human-in-the-loop pauses: a
workflow node can now emit a session.RequestInput, and the
scheduler will park the node in NodeWaiting and stop scheduling its
successors instead of finalising the run. This is the pause half of
HITL only; the resume half (Workflow.Resume, the agent.Agent
wrapper, schema validation, handoff vs. re-entry modes) is left for
follow-up PRs.

Type names and field names are aligned with adk-python's
RequestInput in src/google/adk/events/request_input.py:
interrupt_id / message / response_schema / payload.

API additions:
* session.RequestInput carries the prompt: InterruptID (stable
  correlation key), Message (UI text), ResponseSchema (reserved for
  the future validator), Payload (opaque UI context).
* session.Event.RequestedInput field, parallel to Routes; populated
  by the node, consumed by the scheduler and forwarded to the UI
  surface unchanged.
* workflow.NewRequestInputEvent(ctx, req) constructor, including
  UUID auto-generation when InterruptID is empty.
* workflow.NodeState.PendingRequest, persisted on the per-node
  state when the waiting branch fires.
* workflow.ErrMultipleInputRequests sentinel for the
  single-request-per-activation invariant.
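
For readers who want a concrete picture of the API additions listed above, here is a minimal, self-contained sketch. It is not the code in this PR: the field types, the `Event` stand-in, and the helper names `newInterruptID` and `newRequestInputEvent` are assumptions made for illustration, the `ctx` argument of the real constructor is omitted, and the ID generator is a stdlib-only UUIDv4 that the actual implementation may not use.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// RequestInput carries the four fields named in the description.
// The Go types chosen here are assumptions.
type RequestInput struct {
	InterruptID    string         // stable correlation key
	Message        string         // UI text
	ResponseSchema map[string]any // reserved for a future validator
	Payload        any            // opaque UI context
}

// Event is a hypothetical stand-in for session.Event, reduced to the
// two fields the description compares.
type Event struct {
	Routes         []string
	RequestedInput *RequestInput
}

// newInterruptID returns a random RFC 4122 version 4 UUID string built
// from crypto/rand (hypothetical helper).
func newInterruptID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4
	b[8] = (b[8] & 0x3f) | 0x80 // RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x",
		b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// newRequestInputEvent wraps req in an event, generating an
// InterruptID only when the caller left it empty.
func newRequestInputEvent(req RequestInput) (*Event, error) {
	if req.InterruptID == "" {
		id, err := newInterruptID()
		if err != nil {
			return nil, err
		}
		req.InterruptID = id
	}
	return &Event{RequestedInput: &req}, nil
}

func main() {
	ev, err := newRequestInputEvent(RequestInput{Message: "Approve deployment?"})
	if err != nil {
		panic(err)
	}
	fmt.Println("generated InterruptID:", ev.RequestedInput.InterruptID)
}
```

An explicit InterruptID passes through untouched, which is the contract the two InterruptID tests below are described as locking in.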

Scheduler changes:
* nodeRun gains an inputRequest field with a setInputRequest
  method that mirrors the existing setRoutingEvent / setOutput
  pattern.
* handleEvent dispatches on ev.RequestedInput exactly parallel to
  the existing dispatch on ev.Routes.
* handleCompletion gains a waiting branch checked AFTER the
  error/cancel branches: a clean activation that recorded a
  request transitions to NodeWaiting, persists the request on
  NodeState, and skips successor scheduling. Failures take
  precedence so a node that recorded a request and then errored
  out lands in NodeFailed, not NodeWaiting.
* The scheduler.run loop is unchanged: it terminates when the
  runsByName map empties, which now happens when every live node
  has either completed or moved into NodeWaiting.
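
The branch ordering in handleCompletion described above can be sketched as follows. This is an illustrative reduction, not the scheduler code from this PR: the struct layouts, the `NodeCompleted` status, the error text, and the boolean "schedule successors" return value are assumptions. Only the precedence (failure first, then waiting, then normal completion) and the one-request-per-activation rule come from the description.

```go
package main

import (
	"errors"
	"fmt"
)

type NodeStatus int

const (
	NodeCompleted NodeStatus = iota // hypothetical name for the normal outcome
	NodeFailed
	NodeWaiting
)

// ErrMultipleInputRequests mirrors the sentinel named in the PR; the
// message text is an assumption.
var ErrMultipleInputRequests = errors.New("workflow: multiple input requests in one activation")

type RequestInput struct{ InterruptID, Message string }

type NodeState struct {
	Status         NodeStatus
	PendingRequest *RequestInput
}

// nodeRun holds per-activation bookkeeping (reduced stand-in).
type nodeRun struct {
	inputRequest *RequestInput
	err          error
}

// setInputRequest records the first request; a second one is remembered
// as an error that surfaces at completion.
func (r *nodeRun) setInputRequest(req *RequestInput) {
	if r.inputRequest != nil {
		if r.err == nil {
			r.err = ErrMultipleInputRequests
		}
		return
	}
	r.inputRequest = req
}

// handleCompletion returns the node's new state and whether successors
// should be scheduled. The failure branch is checked before the waiting
// branch, so an errored node never parks.
func handleCompletion(r *nodeRun, runErr error) (NodeState, bool) {
	if runErr == nil {
		runErr = r.err
	}
	switch {
	case runErr != nil:
		return NodeState{Status: NodeFailed}, false
	case r.inputRequest != nil:
		return NodeState{Status: NodeWaiting, PendingRequest: r.inputRequest}, false
	default:
		return NodeState{Status: NodeCompleted}, true
	}
}

func main() {
	r := &nodeRun{}
	r.setInputRequest(&RequestInput{InterruptID: "approve-1", Message: "Approve deploy?"})
	state, scheduleNext := handleCompletion(r, nil)
	fmt.Println(state.Status == NodeWaiting, scheduleNext) // prints: true false
}
```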

Tests:
* TestScheduler_HitlNode_PausesAndForwardsRequest pins the
  happy-path single-waiting-node behaviour.
* TestScheduler_HitlNode_AutoGeneratesInterruptID and
  TestScheduler_HitlNode_PreservesExplicitInterruptID lock in the
  InterruptID contract.
* TestScheduler_HitlNode_MultipleRequestsFails surfaces
  ErrMultipleInputRequests at completion.
* TestScheduler_HitlNode_ErrorAfterRequestFails pins the
  fail-over-park precedence in handleCompletion.
* TestScheduler_HitlNode_ConcurrentBranches_PausesOnlyWhenAllNonRunning
  exercises a parallel-branch graph: the non-HITL branch finishes
  normally while the HITL branch parks; the workflow ends only
  when both reach a terminal state.
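
The parallel-branch scenario in the last test can be pictured with a toy loop like the one below. It is a sketch under stated assumptions rather than the scheduler from this PR: the `completion` type, the `run` signature, and the goroutine-per-branch layout are invented for illustration. The one property taken from the description is that the loop exits once the set of live runs is empty, whether a branch completed or parked in NodeWaiting.

```go
package main

import "fmt"

type NodeStatus string

const (
	NodeCompleted NodeStatus = "completed" // hypothetical name
	NodeWaiting   NodeStatus = "waiting"
)

// completion is what a branch reports when it stops running.
type completion struct {
	name   string
	status NodeStatus
}

// run starts every branch concurrently and loops until no branch is
// live. A parked branch leaves the live set just like a finished one.
func run(branches map[string]func() NodeStatus) map[string]NodeStatus {
	done := make(chan completion)
	runsByName := make(map[string]struct{}, len(branches))
	for name, fn := range branches {
		runsByName[name] = struct{}{}
		go func(name string, fn func() NodeStatus) {
			done <- completion{name: name, status: fn()}
		}(name, fn)
	}

	final := make(map[string]NodeStatus, len(branches))
	for len(runsByName) > 0 {
		c := <-done
		delete(runsByName, c.name)
		final[c.name] = c.status
	}
	return final
}

func main() {
	final := run(map[string]func() NodeStatus{
		"auto":   func() NodeStatus { return NodeCompleted }, // non-HITL branch
		"review": func() NodeStatus { return NodeWaiting },   // HITL branch parks
	})
	fmt.Println(final["auto"], final["review"]) // prints: completed waiting
}
```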

All existing workflow tests still pass; the suite is race-free
under go test -race.

wolo-lab force-pushed the wolo/workflow_hitl_pause branch from c499ec0 to a1543d7 on May 14, 2026 08:27
wolo-lab changed the title from "workflow: add HITL pause primitives (NodeInputRequest, waiting branch)" to "feat: add HITL pause primitives to workflow engine" on May 14, 2026
wolo-lab requested review from anFatum and hanorik on May 14, 2026 11:41
wolo-lab marked this pull request as ready for review on May 14, 2026 11:41