Skip to content

Standalone Activity: preserve server-generated request IDs across restarts#9724

Open
dandavison wants to merge 2 commits intomainfrom
saa-retain-request-ids
Open

Standalone Activity: preserve server-generated request IDs across restarts#9724
dandavison wants to merge 2 commits intomainfrom
saa-retain-request-ids

Conversation

@dandavison
Copy link
Copy Markdown
Contributor

@dandavison dandavison commented Mar 27, 2026

What changed?

When generating a request ID server-side, set it on the request struct before any cloning so that the mutation is re-used by all retries.

Why?

Without this, there is a bug, although I have not attempted to repro it:

  1. Request arrives at Frontend without requestID
  2. Inside retry interceptor loop, requestID is generated and set on a cloned copy
  3. Request proceeds to history, starts the execution, but then some networking condition in the cell causes RetryableInterceptor not to receive the Ack (it sees a context expiry)
  4. Frontend retries, generating a new request ID. But meanwhile the activity has completed. This would be rare, but technically possible.
  5. The default reuse policy permits a second execution to be started. This would be a bug: the second start should have been prevented by the request ID. If the user's activity lacks idempotency protection it will lead to incorrectness in the user's systems.

How did you test it?

It would be hard to repro that bug.

  • built
  • added weak new unit test(s) for the Start case.

Potential risks

Could introduce incorrectness into Standalone Activity


Note

Medium Risk
Touches standalone activity request-id generation/dedup behavior; mistakes could affect idempotency (duplicate starts/cancels/terminations) but the change is small and covered by a unit test for the start path.

Overview
Ensures standalone activity frontend operations re-use a single RequestId across retry attempts by generating the ID on the original request before cloning/mutation in validateAndPopulateStartRequest, and by no longer cloning requests when auto-populating RequestId for TerminateActivityExecution and RequestCancelActivityExecution.

Adds a unit test (TestRequestIdStableAcrossRetries) that calls validateAndPopulateStartRequest twice with the same request pointer and asserts the RequestId stays stable for both server-generated and client-provided IDs.

Written by Cursor Bugbot for commit 38bd190. This will update automatically on new commits. Configure here.

@dandavison dandavison requested a review from a team as a code owner March 27, 2026 16:44
@dandavison dandavison requested review from fretz12 and stephanos March 27, 2026 16:50
Calls validateAndPopulateStartRequest twice with the same request
pointer (simulating RetryableInterceptor retry behavior) and asserts
both clones carry the same server-generated RequestId.

Fails without the fix in the preceding commit: each clone got a fresh
UUID because the RequestId was generated after the clone.
Move RequestId generation before the proto clone so that the
RetryableInterceptor, which retries with the same request pointer,
carries the same server-generated RequestId on every attempt. Previously
the RequestId was generated on the clone, so each retry got a fresh UUID
that could not match the stored RequestId, breaking dedup and potentially
causing a double-create if the activity completed between attempts.

Fix Terminate/CancelActivity request-ID dedup across server-side retries

Same issue as StartActivityExecution: the RequestId was generated on a
clone, so RetryableInterceptor retries got a fresh UUID each time. For
Terminate, this caused a spurious FailedPrecondition error on retry
("already terminated with request ID ..."). For RequestCancel, same
pattern ("cancellation already requested with request ID ...").

Fix: set RequestId on the original request before any clone, matching
the OSS pattern used by workflow handlers.

These handlers had no other mutations requiring a clone, so the clone is
simply removed.
@dandavison dandavison force-pushed the saa-retain-request-ids branch from f39f920 to 38bd190 Compare March 27, 2026 16:55

if req.GetRequestId() == "" {
// Since this mutates the request, we clone it first so that any retries use the original request.
req = common.CloneProto(req)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw, in my PR now I've moved this into the validation and normalization methods like this:

    if req.GetRequestId() == "" {
		req.RequestId = uuid.NewString()
	} else if len(req.GetRequestId()) > config.MaxIDLengthLimit() {
		return serviceerror.NewInvalidArgumentf("request_id exceeds length limit. Length=%d Limit=%d",
			len(req.GetRequestId()), config.MaxIDLengthLimit())
	}

Copy link
Copy Markdown
Member

@bergundy bergundy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just FTR, the duplicates can easily happen if a client retries and doesn't specify a request ID. This makes it a bit worse.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants