fix(proxy): suppress duplicate side-effect tool calls #586
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6a337e6aa9
Force-pushed 6a337e6 to 5911be5
Thanks @Komzpa -- the side-effect tool-call dedupe is exactly the right shape (per-response id cache, side-effect allowlist, P1 --
Force-pushed 40ac23e to 85666c1
Fixes Soju06#565.

Long-running Codex agent sessions stop mid-task when the upstream replies with "Our servers are currently overloaded. Please try again later" because classify_upstream_failure returns non_retryable for that envelope. OpenAI surfaces the overload condition with error.code = "overloaded_error". That code is missing from _TRANSIENT_CODES, and the envelope can arrive without an accompanying 5xx HTTP status (streamed Responses API traffic typically returns HTTP 200 before the error envelope hits the wire). With both signals absent, the classifier falls through to non_retryable, and failover_decision only retries or fails over for retryable_transient, rate_limit, and quota. The request is failed back to the client, and the agent stops without exercising the multi-account failover path that already exists in the load balancer.

Add "overloaded_error" to _TRANSIENT_CODES so the classifier returns retryable_transient regardless of the HTTP status, and pin the behavior with a unit regression that uses the no-status shape. Document the requirement under the responses-api-compat capability, since /v1/responses is the surface where this overload shape is most commonly observed.
(cherry picked from commit dc1bcb7)
(cherry picked from commit 5c74ede)
(cherry picked from commit 5f323ef)
(cherry picked from commit 994f2a0)
(cherry picked from commit b7f7a64)
(cherry picked from commit 858d0d7)
(cherry picked from commit 0dcd7c7)
(cherry picked from commit f9503fe)
(cherry picked from commit 99119a0)
(cherry picked from commit ec174a1)
…fix/suppress-duplicate-side-effect-tool-calls-rebuild
Force-pushed 7beee93 to df2d161
Summary
- Dedupe side-effect tool calls by `call_id` within a response
- Handle `multi_tool_use.parallel` batches before forwarding the batch to the client

Validation
- `uv run ruff check app/modules/proxy/service.py tests/unit/test_proxy_utils.py`
- `uv run ty check app/modules/proxy/service.py tests/unit/test_proxy_utils.py`
- `uv run pytest tests/unit/test_proxy_utils.py -q -k 'tool_call or stream_responses_keeps_distinct_http_tool_calls_across_response_ids or stream_responses_suppresses_same_response_http_tool_call_replay or parallel_tool'`

Split from fix(proxy): trim Codex websocket full-replay continuations #555.
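The dedupe described in the PR title and the review comment ("per-response id cache, side-effect allowlist") could look roughly like this. Everything here is an assumption for illustration: the allowlist contents, the call dict shape, and the `filter_tool_calls` helper are hypothetical; the real logic lives in `app/modules/proxy/service.py`.

```python
# Minimal sketch of per-response side-effect tool-call dedupe. The allowlist,
# call shape, and helper name are hypothetical, not the repo's actual API.
SIDE_EFFECT_TOOLS = {"http", "shell"}  # assumed allowlist of side-effecting tools

def filter_tool_calls(
    response_id: str,
    calls: list[dict],
    seen: dict[str, set[str]],
) -> list[dict]:
    """Drop replays of side-effect tool calls already seen for this response id."""
    seen_ids = seen.setdefault(response_id, set())
    forwarded = []
    for call in calls:
        call_id = call["call_id"]
        if call["name"] in SIDE_EFFECT_TOOLS and call_id in seen_ids:
            continue  # same-response replay of a side-effect call: suppress it
        seen_ids.add(call_id)
        forwarded.append(call)
    return forwarded
```

Because the cache is keyed by response id, a replay of the same `call_id` within one response is suppressed, while distinct responses may legitimately reuse an id — matching the `stream_responses_keeps_distinct_http_tool_calls_across_response_ids` test name above.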