fix(proxy): slim oversized response.create history #560
Conversation
@codex review |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2b3479d4f3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```python
candidate_payload["input"] = [
    _response_create_history_omission_notice_item(omitted_count),
    *remaining_historical,
    *recent,
]
```
Retry fit without notice when all history is dropped
When _omit_historical_response_input_items_to_fit removes every historical item, it still always prepends _response_create_history_omission_notice_item, and if that final payload is over max_bytes the function returns it anyway. This creates avoidable 413 payload_too_large failures for near-limit requests where recent alone would fit but notice + recent does not (for example, tiny historical items that are smaller than the notice text). Please add a final fallback that checks the recent-only payload before returning an oversized result.
Useful? React with 👍 / 👎.
|
Thanks @Komzpa -- the historical-item omission pass looks right. P2: add a recent-only fallback for when notice + recent is still over budget. Suggested addition right before the function returns the final payload:

```python
if _serialized_json_size(candidate_payload) > max_bytes:
    # Final fallback: notice + recent itself is oversized; try recent
    # alone. Drops the omission disclosure but lets the request go
    # through instead of hard-failing the user-visible turn.
    recent_only = dict(payload)
    recent_only["input"] = list(recent)
    if _serialized_json_size(recent_only) <= max_bytes:
        return recent_only, omitted_count
```

The trade-off is that the user no longer sees the "omitted N items" notice, but they also do not get a 413 in a case where the request could have succeeded. If you'd rather keep the notice always and just log a warning + still 413, that's a defensible call too -- the current behavior is silently returning an oversized payload. Quick regression to lock the contract:

```python
def test_slim_response_create_drops_notice_when_notice_plus_recent_oversized():
    # tiny historical items; recent + notice text combined > max_bytes,
    # but recent alone <= max_bytes
    # assert summary['historical_items_omitted'] == len(history)
    # assert _serialized_json_size(slimmed_payload) <= max_bytes
    # assert no notice item in slimmed_payload['input']
```

Not a hard blocker -- if you'd rather keep this PR scoped to the original ask and ship as-is, fine; the current behavior is no worse than today's pre-PR 413. Up to you. The dump-dir move and the omission-pass core both look right. If you push the fallback I'll re-run. |
|
@codex review |
|
Codex Review: Didn't find any major issues. Hooray! |
|
@all-contributors please add @Komzpa for code, test |
|
@codex review |
|
Codex Review: Didn't find any major issues. 🎉 |
This reverts merge commit e42af5e. The slimming approach rewrites conversation history on the codex-lb side (a stateless forwarding proxy) and conflicts with several stateful surfaces that have to stay consistent with the upstream view:

- Prompt-cache invalidation: every slimmed turn changes the input prefix and forces a cache miss against the upstream prompt cache, silently re-billing input tokens. This intersects with #515's pin / affinity work and is invisible to the user.
- Assistant-role omission notice (`[codex-lb omitted N items]`): the model treats this as its own prior speech and starts apologising for missing context, hallucinating that earlier history is known, or contradicting the trimmed turns.
- WebSocket turn_state mismatch: the Codex WS path is stateful through `previous_response_id`. After slimming, codex-lb's request view diverges from the upstream session view; replays of the slimmed payload can reference items the upstream copy expected but we already trimmed.

These are not knobs we can tune: history compaction belongs in the Codex client (or the upstream Responses API), not in a forwarding proxy. The 413 -> slim transition is strictly better for the immediate symptom, but it shifts the failure mode from a clear hard fail into silent degradation of cache cost, model trust, and WS continuity. Reverting before this lands in a release. Tracking the broader follow-up in #568. The original 413 report (#556) reopens for now while we route this through Codex CLI / file-upload paths instead.
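To illustrate the prompt-cache point: prefix-keyed caches are invalidated by any change to the earliest input items, so dropping even one old item re-keys every later turn. A hypothetical sketch (the hashing scheme is illustrative only, not the upstream cache's actual implementation):

```python
import hashlib
import json


def prompt_cache_key(input_items: list[dict]) -> str:
    # Illustrative only: a prefix-keyed cache keys on the exact
    # serialized input, so any edit to early items yields a new key.
    blob = json.dumps(input_items, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


history = [
    {"role": "user", "content": "hello"},
    {"role": "assistant", "content": "hi"},
    {"role": "user", "content": "next question"},
]
slimmed = history[1:]  # proxy drops the oldest item

# The slimmed turn no longer shares a key with the original request,
# so upstream treats it as a cache miss and re-bills the input tokens.
assert prompt_cache_key(history) != prompt_cache_key(slimmed)
```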
…575) The oversized response.create payload dump dir was hardcoded to `/var/lib/codex-lb/debug/response-create-dumps`. That path is correct inside the container image (where DEFAULT_HOME_DIR resolves to /var/lib/codex-lb), but on macOS `uv tool` / LaunchAgent installs the process has no writable access to /var/lib/codex-lb, and the dump path silently fails when the proxy tries to write a guarded oversized payload (originally reported in #556).

Resolve the dump dir from DEFAULT_HOME_DIR so the path tracks the deploy's data directory:

- inside the container image: `/var/lib/codex-lb/debug/response-create-dumps` (unchanged)
- macOS `uv tool` install: `~/.codex-lb/debug/response-create-dumps`

This is the dump-path slice of the reverted #560 (#569). It does not include any of the history-slimming changes from #560 -- only the portable dump path, which has no transport-layer interaction and only fires when the proxy is *guarding* against an oversized request that already exceeds the upstream WebSocket frame budget.

Verified against the dev container: DEFAULT_HOME_DIR resolves to `/var/lib/codex-lb`, dump_dir resolves to `/var/lib/codex-lb/debug/response-create-dumps`, smoke test on /v1/responses returns 'OK', and the existing test suite passes (1493 passed, 3 skipped, 4 warnings).
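The shape of the fix can be sketched as below; `DEFAULT_HOME_DIR` and the helper name are assumptions for illustration (the actual constant lives in the proxy module):

```python
from pathlib import Path

# Assumption: DEFAULT_HOME_DIR is the deploy's data directory. The
# container image resolves it to /var/lib/codex-lb; a local `uv tool`
# install resolves it to ~/.codex-lb.
DEFAULT_HOME_DIR = Path.home() / ".codex-lb"


def response_create_dump_dir(home_dir: Path = DEFAULT_HOME_DIR) -> Path:
    # Derive the dump dir from the home dir instead of hardcoding
    # /var/lib/codex-lb/..., so both deploy styles get a writable path.
    return home_dir / "debug" / "response-create-dumps"
```

With this, the container path is unchanged while user installs land under a directory the process can actually write to.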
Summary
- Adds `response.create` slimming by omitting the oldest historical input items when image/tool-output slimming is still over the websocket budget.
- Moves `response.create` debug dumps under the default app home directory so local user installs use `~/.codex-lb` while containers keep `/var/lib/codex-lb`.

Fixes #556.
Validation
- `openspec validate --specs`
- `uv run pytest tests/unit/test_proxy_utils.py -k 'slim_response_create or oversized_response_create_dump_dir' -q`
- `uv run ruff check app/modules/proxy/service.py tests/unit/test_proxy_utils.py`
- `uv run ty check app/modules/proxy/service.py tests/unit/test_proxy_utils.py`