
feat(observability): add conversation archive viewer#519

Open
Komzpa wants to merge 4 commits into Soju06:main from Komzpa:codex/conversation-archive-ui

Conversation


@Komzpa Komzpa commented Apr 29, 2026

Summary

  • add opt-in gzip JSONL archive for upstream Codex request/response, compact, SSE, and websocket traffic
  • redact credential-bearing and token-like headers while keeping payloads available for audit/export
  • write gzip records via a bounded background writer; if the queue saturates, apply synchronous write backpressure instead of growing memory without limit
  • store archive output in hourly YYYY-MM-DDTHH.jsonl.gz files to limit live blast radius and keep recovery/scans bounded
  • append each record as a complete gzip member and recover a readable prefix from a corrupt tail once per active file before appending
  • treat ENOSPC, disk quota, and sqlite-style disk-full failures as archive disk pressure: pause archive writes briefly, drop archive records during the pause, and emit a rate-limited warning without traceback spam
  • preserve non-ASCII payload text as readable UTF-8 in JSONL, not Unicode escape soup
  • read archives through a threadpool so archive scans do not block the request event loop
  • expose archived payloads inside the existing Dashboard request log detail flow, keyed by exact request id across gzip and legacy JSONL archive files
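The storage scheme described above — hourly file naming, one complete gzip member per record, readable UTF-8, header redaction — can be sketched roughly as below. The directory argument, header set, and record shape are illustrative assumptions, not the PR's actual code:

```python
import gzip
import json
from datetime import datetime, timezone
from pathlib import Path

# Illustrative set; the PR redacts "credential-bearing and token-like" headers.
REDACTED_HEADERS = {"authorization", "cookie", "x-api-key"}


def redact_headers(headers: dict[str, str]) -> dict[str, str]:
    """Blank out credential-bearing header values, keeping the keys visible."""
    return {
        k: "[redacted]" if k.lower() in REDACTED_HEADERS else v
        for k, v in headers.items()
    }


def append_record(record: dict, directory: Path) -> Path:
    """Append one record as a complete gzip member to the current hourly file."""
    directory.mkdir(parents=True, exist_ok=True)
    hour = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H")
    path = directory / f"{hour}.jsonl.gz"  # e.g. 2026-04-29T13.jsonl.gz
    # ensure_ascii=False keeps non-ASCII payload text readable, not \uXXXX soup.
    line = json.dumps(record, ensure_ascii=False) + "\n"
    # Opening in "ab" mode writes an independent gzip member per append, so a
    # crash can only corrupt the final member, and readers still see one stream.
    with gzip.open(path, "ab") as f:
        f.write(line.encode("utf-8"))
    return path
```

Multi-member files like this read back as a single stream via `gzip.open(path, "rt")`, which keeps per-record appends cheap while leaving the file usable with `zcat`.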

Review notes

  • this PR is a single clean commit on top of upstream main; neighboring PRs #516 (fix(proxy): mask single previous response misses) and #517 (fix(types): clear existing ty diagnostics) are only included in local custom runtime images, not in this PR diff
  • the request detail archive query does not poll; it loads when the details dialog asks for a request id
  • the writer queue is bounded at 4096 records; under sustained disk stalls the proxy backpressures on synchronous archive writes rather than leaking memory or silently dropping records
  • gzip readability/recovery is intentionally not re-run before every append; it is checked once per process/file to avoid repeatedly scanning a hot archive during live traffic
  • when the filesystem is full, archive writes pause for a cooldown instead of repeatedly queueing work and logging tracebacks; after the cooldown the next archive write retries normally
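Because each record is appended as a complete gzip member, the once-per-file recovery mentioned above reduces to truncating the file at its longest cleanly decodable prefix. A hypothetical helper (not the PR's code) using `zlib`'s member-at-a-time decoding:

```python
import zlib
from pathlib import Path


def recover_readable_prefix(path: Path) -> int:
    """Truncate a multi-member gzip file to its longest readable prefix.

    Walks the file one gzip member at a time; a crash mid-append can only
    corrupt the final member, so everything before it is kept. Returns the
    number of bytes retained.
    """
    data = path.read_bytes()
    good_end = 0
    pos = 0
    while pos < len(data):
        # wbits=MAX_WBITS|16 makes zlib expect gzip framing for one member.
        d = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
        try:
            d.decompress(data[pos:])
        except zlib.error:
            break  # corrupt member header or body
        if not d.eof:
            break  # final member is truncated mid-stream
        pos = len(data) - len(d.unused_data)  # absolute end of this member
        good_end = pos
    if good_end < len(data):
        path.write_bytes(data[:good_end])
    return good_end
```

Running this once per process/file before the first append, as the note above describes, avoids rescanning a hot archive on every write.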

Verification

  • /home/kom/proj/codex-lb/.venv/bin/python -m pytest tests/unit/test_conversation_archive.py -q (14 passed)
  • /home/kom/proj/codex-lb/.venv/bin/ruff check app/core/conversation_archive.py tests/unit/test_conversation_archive.py
  • /home/kom/proj/codex-lb/.venv/bin/ruff format --check app/core/conversation_archive.py tests/unit/test_conversation_archive.py
  • git diff --check
  • combined release tree with #516 (fix(proxy): mask single previous response misses), #517 (fix(types): clear existing ty diagnostics), and #519 (feat(observability): add conversation archive viewer): /home/kom/proj/codex-lb/.venv/bin/python -m pytest tests/unit/test_conversation_archive.py tests/unit/test_proxy_api_websocket_auth.py tests/integration/test_http_responses_bridge.py::test_v1_responses_http_bridge_rebinds_after_upstream_previous_response_not_found tests/integration/test_http_responses_bridge.py::test_v1_responses_http_bridge_masks_anonymous_previous_response_not_found_with_inflight_request -q (27 passed)
  • combined release tree: ruff check/format and git diff --check on touched backend/proxy test files
  • npx --yes @fission-ai/openspec validate --specs (19 passed)
  • Docker image build: codex-lb:release-diskpressure-46789ab
  • staging smoke on ports 11455/12455: stable /health, hourly archive write smoke, and simulated ENOSPC disk-pressure pause smoke
  • live smoke after deploy: stable /health, hourly archive write smoke, and simulated ENOSPC disk-pressure pause smoke
  • previous broader PR verification: .venv/bin/python -m ty check app/core/conversation_archive.py app/modules/conversation_archive tests/unit/test_conversation_archive.py; bun run test -- src/features/conversation-archive/schemas.test.ts src/features/dashboard/components/recent-requests-table.test.tsx; bun run typecheck; bun run lint -- src/features/conversation-archive src/features/dashboard/components/recent-requests-table.tsx (passes with pre-existing warnings in account-multi-select.tsx); bun run build
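The ENOSPC disk-pressure behavior the smoke tests above simulate — pause briefly, drop archive records during the pause, retry after the cooldown — can be sketched as follows; the errno set, cooldown length, and function names are assumptions, not the PR's code:

```python
import errno
import time

# Treat classic disk-full and quota errors as archive disk pressure.
DISK_FULL_ERRNOS = {errno.ENOSPC, getattr(errno, "EDQUOT", errno.ENOSPC)}
PAUSE_SECONDS = 30.0  # hypothetical cooldown length

_paused_until = 0.0


def try_archive_write(write_fn, record) -> bool:
    """Attempt one archive write; returns False when the record is dropped."""
    global _paused_until
    if time.monotonic() < _paused_until:
        return False  # inside the cooldown: drop silently, no traceback spam
    try:
        write_fn(record)
        return True
    except OSError as exc:
        if exc.errno in DISK_FULL_ERRNOS:
            _paused_until = time.monotonic() + PAUSE_SECONDS
            # a rate-limited warning (not a traceback) would be logged here
            return False
        raise  # unrelated I/O errors still propagate
```

After `PAUSE_SECONDS` elapses, the next write attempt goes through normally, matching the "retries normally after the cooldown" note above.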

Related issues

@Komzpa Komzpa force-pushed the codex/conversation-archive-ui branch 7 times, most recently from 71482a7 to b997df7 on April 30, 2026 13:09
@Komzpa Komzpa marked this pull request as ready for review April 30, 2026 13:39

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b997df7201


Comment thread app/core/clients/proxy.py
@@ -1853,7 +1876,20 @@ async def _stream_via_http(
raise ProxyResponseError(resp.status, error_payload)

P1 Badge Archive error payloads before raising on HTTP stream failures

In _stream_via_http, the resp.status >= 400 + raise_for_status branch raises ProxyResponseError immediately after parsing error_payload, so no archive record is written for the upstream failure body. Because the main proxy streaming path calls core_stream_responses(..., raise_for_status=True), common upstream 4xx/5xx failures end up with only the outbound request archived, which defeats request-level failure inspection in the new archive viewer.
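A sketch of the fix this comment suggests — record the parsed error body before raising, so 4xx/5xx failures are inspectable in the viewer. `ProxyResponseError` below is a stand-in for the class in app/core/clients/proxy.py, and the helper and record shape are hypothetical:

```python
class ProxyResponseError(Exception):
    """Stand-in for the proxy's error type (the real one lives in proxy.py)."""

    def __init__(self, status: int, payload: dict) -> None:
        super().__init__(status, payload)
        self.status = status
        self.payload = payload


def raise_with_archive(status: int, error_payload: dict, archive_fn) -> None:
    """Archive the upstream failure body first, then raise.

    Ordering matters: raising before archiving (the behavior flagged above)
    leaves only the outbound request in the archive for failed streams.
    """
    archive_fn({"kind": "error_response", "status": status, "payload": error_payload})
    raise ProxyResponseError(status, error_payload)
```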


Comment thread app/core/conversation_archive.py Outdated

def _archive_path() -> Path:
settings = get_settings()
directory = Path(getattr(settings, "conversation_archive_dir"))

P2 Badge Expand user-home prefix for conversation archive directory

The archive path uses conversation_archive_dir directly without expanduser(). If operators follow the provided env example (~/.codex-lb/conversation-archive), Python treats ~ literally here, so archives are written under a relative ~/... directory instead of the user’s home. That can silently store sensitive archive files in an unexpected location and break retrieval assumptions.
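The difference is easy to see in isolation; `raw_value` below mirrors the env example from the comment:

```python
from pathlib import Path

raw_value = "~/.codex-lb/conversation-archive"  # value from the env example

literal = Path(raw_value)                # "~" stays a literal relative segment
expanded = Path(raw_value).expanduser()  # resolved under the real home directory
```

Writing through `literal` creates a directory actually named `~` relative to the process working directory, while `expanded` is an absolute path under the user's home — hence the suggested `expanduser()` call.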


@Komzpa Komzpa force-pushed the codex/conversation-archive-ui branch from 69c84a1 to ca6e96c on May 3, 2026 13:46
