
fix(proxy): slim oversized response.create history #560

Merged
Soju06 merged 6 commits into Soju06:main from Komzpa:fix/response-create-history-slimming
May 9, 2026

Conversation

@Komzpa
Contributor

@Komzpa Komzpa commented May 6, 2026

Summary

  • Continue response.create slimming by omitting the oldest historical input items when image/tool-output slimming is still over the websocket budget.
  • Preserve the recent user-turn suffix and add a single assistant notice with the omitted item count.
  • Move oversized response.create debug dumps under the default app home directory so local user installs use ~/.codex-lb while containers keep /var/lib/codex-lb.

Fixes #556.
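A minimal sketch of the omission pass described in the summary, assuming illustrative names and notice wording (`slim_history`, the notice text, and the compact-JSON size measure are not the project's actual helpers):

```python
import json


def _size(payload: dict) -> int:
    # Byte length of the JSON-encoded payload, i.e. what counts against
    # the websocket frame budget.
    return len(json.dumps(payload, separators=(",", ":")).encode("utf-8"))


def slim_history(payload: dict, keep_recent: int, max_bytes: int) -> tuple[dict, int]:
    """Drop the oldest historical input items until the payload fits,
    preserving the recent user-turn suffix and prepending a single
    assistant notice with the omitted item count."""
    items = list(payload.get("input", []))
    split = max(0, len(items) - keep_recent)
    historical, recent = items[:split], items[split:]
    omitted = 0
    while True:
        candidate = dict(payload)
        notice = [{"type": "message", "role": "assistant",
                   "content": f"[codex-lb omitted {omitted} items]"}] if omitted else []
        candidate["input"] = notice + historical + recent
        if _size(candidate) <= max_bytes or not historical:
            # Either the candidate fits, or nothing historical is left to drop.
            return candidate, omitted
        historical.pop(0)
        omitted += 1
```

Note this sketch shares the edge Codex review later flags as P2: once every historical item is dropped, notice + recent can still exceed `max_bytes` and is returned oversized anyway.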

Validation

  • openspec validate --specs
  • uv run pytest tests/unit/test_proxy_utils.py -k 'slim_response_create or oversized_response_create_dump_dir' -q
  • uv run ruff check app/modules/proxy/service.py tests/unit/test_proxy_utils.py
  • uv run ty check app/modules/proxy/service.py tests/unit/test_proxy_utils.py

@Komzpa Komzpa force-pushed the fix/response-create-history-slimming branch from 3c311bf to 2b3479d on May 6, 2026 at 20:45
@Soju06
Owner

Soju06 commented May 9, 2026

@codex review


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2b3479d4f3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +9147 to +9151
candidate_payload["input"] = [
    _response_create_history_omission_notice_item(omitted_count),
    *remaining_historical,
    *recent,
]


P2: Retry fit without notice when all history is dropped

When _omit_historical_response_input_items_to_fit removes every historical item, it still always prepends _response_create_history_omission_notice_item, and if that final payload is over max_bytes the function returns it anyway. This creates avoidable 413 payload_too_large failures for near-limit requests where recent alone would fit but notice + recent does not (for example, tiny historical items that are smaller than the notice text). Please add a final fallback that checks the recent-only payload before returning an oversized result.


@Soju06
Owner

Soju06 commented May 9, 2026

Thanks @Komzpa -- the historical-item omission and DEFAULT_HOME_DIR move both look right and #556 will close cleanly. Codex review landed one P2 finding I think is worth a small fix:

P2 -- recent-only fallback when notice + recent still over budget

In _omit_historical_response_input_items_to_fit, after every historical item has been removed the loop still prepends _response_create_history_omission_notice_item. If notice + recent is itself over max_bytes (small history items, large notice text, or already-large recent), the function returns the oversized payload anyway and the caller still 413s -- even though dropping the notice would have fit.

Suggested addition right before the function returns the final candidate_payload:

if _serialized_json_size(candidate_payload) > max_bytes:
    # Final fallback: notice + recent itself is oversized; try recent
    # alone. Drops the omission disclosure but lets the request go
    # through instead of hard-failing the user-visible turn.
    recent_only = dict(payload)
    recent_only["input"] = list(recent)
    if _serialized_json_size(recent_only) <= max_bytes:
        return recent_only, omitted_count

The trade-off is that the user no longer sees the "omitted N items" notice, but they also do not get a 413 in a case where the request could have succeeded. If you'd rather keep the notice always and just log a warning + still 413, that's a defensible call too -- the current behavior is silently returning oversized.

Quick regression to lock the contract:

def test_slim_response_create_drops_notice_when_notice_plus_recent_oversized():
    # tiny historical items; recent + notice text combined > max_bytes,
    # but recent alone <= max_bytes
    # assert summary['historical_items_omitted'] == len(history)
    # assert _serialized_json_size(slimmed_payload) <= max_bytes
    # assert no notice item in slimmed_payload['input']

Not a hard blocker -- if you'd rather keep this PR scoped to the original ask and ship as-is, fine, the current behavior is no worse than today's pre-PR 413. Up to you. The dump-dir move and the omission-pass core both look right.

If you push the fallback I'll re-run @codex review and we can land it. Otherwise let me know and I'll go ahead and merge with the P2 as a known soft edge.

@chatgpt-codex-connector

To use Codex here, create an environment for this repo.

@Soju06
Owner

Soju06 commented May 9, 2026

@codex review

1 similar comment
@Soju06
Owner

Soju06 commented May 9, 2026

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Hooray!


@Soju06 Soju06 merged commit e42af5e into Soju06:main May 9, 2026
18 checks passed
@Soju06
Owner

Soju06 commented May 9, 2026

@all-contributors please add @Komzpa for code, test

@allcontributors
Contributor

@Soju06

@Komzpa already contributed before to code, test

@Soju06
Owner

Soju06 commented May 9, 2026

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. 🎉


Soju06 added a commit that referenced this pull request May 9, 2026

This reverts merge commit e42af5e.

The slimming approach rewrites conversation history on the codex-lb side (a stateless forwarding proxy) and conflicts with several stateful surfaces that have to stay consistent with the upstream view:

- Prompt-cache invalidation: every slimmed turn changes the input prefix and forces a cache miss against the upstream prompt cache, silently re-billing input tokens. This intersects with #515's pin / affinity work and is invisible to the user.
- Assistant-role omission notice ('[codex-lb omitted N items]'): the model treats this as its own prior speech and starts apologising for missing context, hallucinating that earlier history is known, or contradicting the trimmed turns.
- WebSocket turn_state mismatch: the Codex WS path is stateful through previous_response_id. After slimming, codex-lb's request view diverges from the upstream session view; replays of the slimmed payload can reference items the upstream copy expected but we already trimmed.

These are not knobs we can tune: history compaction belongs in the Codex client (or upstream Responses API), not in a forwarding proxy. The 413 -> slim transition is strictly better for the immediate symptom, but it shifts the failure mode from a clear hard fail into silent degradation of cache cost, model trust, and WS continuity.

Reverting before this lands in a release. Tracking the broader follow-up in #568. The original 413 reporter (#556) reopens for now while we route this through Codex CLI / file-upload paths instead.
Soju06 added a commit that referenced this pull request May 10, 2026
…575)

The oversized response.create payload dump dir was hardcoded to
`/var/lib/codex-lb/debug/response-create-dumps`. That path is correct
inside the container image (where DEFAULT_HOME_DIR resolves to
/var/lib/codex-lb), but on macOS `uv tool` / LaunchAgent installs the
process has no writable access to /var/lib/codex-lb and the dump path
silently fails when the proxy tries to write a guarded oversized
payload (originally reported in #556).

Resolve the dump dir from DEFAULT_HOME_DIR so the path tracks the
deploy's data directory:
- inside the container image: `/var/lib/codex-lb/debug/response-create-dumps` (unchanged)
- macOS `uv tool` install: `~/.codex-lb/debug/response-create-dumps`

This is the dump-path slice of the reverted #560 (#569). It does not
include any of the history-slimming changes from #560 -- only the
portable dump path, which has no transport-layer interaction and only
fires when the proxy is *guarding* against an oversized request that
already exceeds the upstream WebSocket frame budget.

Verified against the dev container: DEFAULT_HOME_DIR resolves to
`/var/lib/codex-lb`, dump_dir resolves to
`/var/lib/codex-lb/debug/response-create-dumps`, smoke test on
/v1/responses returns 'OK', and the existing test suite passes
(1493 passed, 3 skipped, 4 warnings).
Soju06 added a commit that referenced this pull request May 10, 2026
Soju06 added a commit that referenced this pull request May 10, 2026
* chore(main): release 1.16.0

* docs(changelog): mark #560 reverted, fold #574 narrowing into #571 entry, add #569/#575


Development

Successfully merging this pull request may close these issues.

Large response.create payloads still 413 after slimming; dump path unwritable on macOS uv install

2 participants