Skip to content

Add Nexus attach-callbacks action to throughput stress#350

Open
prathyushpv wants to merge 14 commits into
mainfrom
ppv/nexusAttachCallback
Open

Add Nexus attach-callbacks action to throughput stress#350
prathyushpv wants to merge 14 commits into
mainfrom
ppv/nexusAttachCallback

Conversation

@prathyushpv
Copy link
Copy Markdown
Contributor

@prathyushpv prathyushpv commented May 12, 2026

What was changed

Add Nexus attach-callbacks action to throughputstress similar to bench-go.

Why?

To exercise this feature in load test and make omes throughtput stress on par with bench-go.

@prathyushpv prathyushpv requested review from a team as code owners May 12, 2026 17:10
@stephanos stephanos self-requested a review May 12, 2026 17:35
Restructure along the lines of canary-go #332: instead of overloading the
echo path with attach-callbacks concerns, split the responsibilities.

- Drop wait_for_signal and workflow_id_override from NexusHandlerInput;
  revert NexusHandlerWorkflow and EchoAsyncOperation to their pre-PR shape
- Add a dedicated attach-to-workflow Nexus operation with its own backing
  workflow (NexusAttachHandlerWorkflow) that blocks on unblock and returns
  its RunID
- Add NexusAttachHandlerInput / NexusAttachHandlerOutput proto messages;
  drop the unused input field from ExecuteNexusOperationAttachCallbacks
- Orchestrator now asserts RunID equality across all N completed ops to
  verify callbacks coalesced onto a single backing workflow run
- Tighten Python signal-not-found guard to anchor on the literal core
  message prefix and include the underlying error in the warning log
# Conflicts:
#	loadgen/kitchensink/kitchen_sink.pb.go
#	scenarios/throughput_stress.go
#	workers/dotnet/Temporalio.Omes/protos/KitchenSink.cs
#	workers/go/worker/worker.go
#	workers/java/io/temporal/omes/KitchenSink.java
#	workers/python/protos/kitchen_sink_pb2.py
#	workers/ruby/protos/kitchen_sink/kitchen_sink_pb.rb
@prathyushpv prathyushpv force-pushed the ppv/nexusAttachCallback branch from ba2b5e1 to 8257931 Compare May 15, 2026 18:03
Replace the dedicated NexusOperationAttachCallbacks action variant + handler workflow
with a composition of existing primitives. The throughput-stress attach-callbacks
test is now expressed as a concurrent ActionSet of N echo-async ExecuteNexusOperation
actions, all targeting the same handler_workflow_id with USE_EXISTING conflict policy.
Each handler workflow runs a short timer in before_actions so callbacks have time to
attach before completion.

Proto changes:
- Add handler_workflow_id + handler_workflow_id_conflict_policy to ExecuteNexusOperation
  and NexusHandlerInput
- Remove ExecuteNexusOperationAttachCallbacks, NexusAttachHandlerInput,
  NexusAttachHandlerOutput

Worker changes (Go and Python, the only languages currently wiring Nexus):
- echo-async start handler honors the new fields when set
- Remove dedicated attach-to-workflow operation and NexusAttachHandler workflow
- Remove bespoke handleNexusOperationAttachCallbacks coordination function

Tradeoff: this drops the RunID coalescing assertion the previous version had. The
test now matches bench-go's coverage (a smoke test that exercises the USE_EXISTING
path without verifying coalescing).
Proto-generated WorkflowIdConflictPolicy is an int subclass at runtime but mypy
treats it as the proto enum class (no SupportsInt). Add an explicit typing.cast
so constructing temporalio.common.WorkflowIDConflictPolicy from the proto value
type-checks.
…rimitives

Two narrow additions to the kitchen-sink DSL let the throughputstress
attach-callbacks test express bench-go-equivalent signal coordination as pure
composition rather than a bespoke per-test worker function:

- AwaitableChoice.wait_started: await the command's STARTED event, then
  return without awaiting completion or cancelling.
- Action.await_workflow_completion: wait for an external workflow to complete
  by ID, backed by a wait_for_workflow activity that calls
  client.GetWorkflow().Get(). No parent-child relationship with the target.

NexusHandlerInput / ExecuteNexusOperation also gain a wait_for_signal flag so
the handler workflow holds open (on the 'unblock' signal) while concurrent
callbacks attach via USE_EXISTING.

The throughputstress attach-callbacks action now composes as:
  sequential {
    concurrent { 3x ExecuteNexusOperation(wait_started, USE_EXISTING X) }
    SendSignal X unblock
    AwaitWorkflowCompletion X
  }

No timer, no bespoke per-test worker logic. Both primitives are generic and
reusable for future tests.
The kitchen_sink proto has had SendSignalAction since the original design, but
neither Go nor Python's handleAction had a dispatch branch for it (no existing
scenario used SendSignal in a workflow action). The new attach-callbacks
composition emits SendSignal as part of its action set, so handleAction now
needs to handle it.

Go uses workflow.SignalExternalWorkflow with AwaitableChoice; Python uses
workflow.get_external_workflow_handle().signal().
When an action uses AwaitableChoice.wait_started, the dispatcher now retains
the future in workflow state instead of abandoning it. A new AwaitPendingActions
action drains the state, awaiting each pending future.

Works for any awaitable (timer, activity, child workflow, signal, Nexus op).
Removes the wait_for_workflow activity and the AwaitWorkflowCompletion proto
message — AwaitPendingActions covers the use case without the activity hop.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant