Skip to content

fix(proxy): narrow budget-safe gate to primary usage#561

Merged
Soju06 merged 5 commits into
Soju06:mainfrom
Komzpa:fix/primary-budget-threshold
May 10, 2026
Merged

fix(proxy): narrow budget-safe gate to primary usage#561
Soju06 merged 5 commits into
Soju06:mainfrom
Komzpa:fix/primary-budget-threshold

Conversation

@Komzpa
Copy link
Copy Markdown
Contributor

@Komzpa Komzpa commented May 6, 2026

Summary

  • narrows the budget-safe Responses routing hard gate to primary-window usage only
  • keeps secondary/weekly usage as a routing strategy signal instead of a hard exclusion signal
  • makes the degraded all-primary-pressured usage_weighted fallback prefer lower primary pressure before lower weekly usage
  • adds OpenSpec coverage and regression tests for the [follow-up] Narrow _state_above_budget_threshold primary-only gating #446 edge cases

Closes #446.

Verification

  • openspec validate narrow-budget-safe-primary-gate --strict
  • UV_PROJECT_ENVIRONMENT=/home/kom/.venvs/codex-lb-fix-446 uv run pytest tests/unit/test_proxy_load_balancer_refresh.py -k 'budget_safe or reads_cached_usage_once' -q\n- UV_PROJECT_ENVIRONMENT=/home/kom/.venvs/codex-lb-fix-446 uv run ruff check app/modules/proxy/load_balancer.py tests/unit/test_proxy_load_balancer_refresh.py\n- UV_PROJECT_ENVIRONMENT=/home/kom/.venvs/codex-lb-fix-446 uv run ruff format --check app/modules/proxy/load_balancer.py tests/unit/test_proxy_load_balancer_refresh.py\n- UV_PROJECT_ENVIRONMENT=/home/kom/.venvs/codex-lb-fix-446 uv run ty check app/modules/proxy/load_balancer.py tests/unit/test_proxy_load_balancer_refresh.py

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 963e61a60f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread app/modules/proxy/load_balancer.py Outdated
and state_list
and all(_state_above_budget_threshold(state, budget_threshold_pct) for state in state_list)
):
return SelectionResult(min(state_list, key=_primary_budget_pressure_sort_key), None)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Route all-pressured fallback through availability checks

When every candidate is above the primary threshold, this branch returns min(state_list, ...) directly instead of delegating to select_account. That bypasses the availability filtering in app/core/balancer/logic.py (RATE_LIMITED/QUOTA_EXCEEDED reset checks, cooldown suppression, and error-backoff handling), so under usage_weighted the balancer can pick an account that is currently unavailable just because it has lower primary usage. This regression appears whenever all primary usages are > threshold and at least one candidate is blocked/unrecoverable at selection time.

Useful? React with 👍 / 👎.

@Komzpa Komzpa force-pushed the fix/primary-budget-threshold branch from 963e61a to b10cabe Compare May 6, 2026 21:32
@Soju06
Copy link
Copy Markdown
Owner

Soju06 commented May 9, 2026

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b10cabe0f9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread app/modules/proxy/load_balancer.py Outdated
Comment on lines +1342 to +1344
for state in sorted(state_list, key=_primary_budget_pressure_sort_key):
candidate = select_account(
[state],
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve health-tier preference in all-pressured fallback

This loop evaluates each account in isolation, which bypasses select_account’s pool-level health-tier logic (healthy > probing > draining). In the usage_weighted + all-primary-pressured path, a draining account with slightly lower primary usage can now be selected before a healthy account, even though the normal selector intentionally prefers healthy capacity first. That is a behavioral regression in failover safety and can route traffic to degraded accounts when healthy ones are available.

Useful? React with 👍 / 👎.

@Soju06
Copy link
Copy Markdown
Owner

Soju06 commented May 9, 2026

Thanks @Komzpa -- the #446 narrowing is exactly right. Codex review just landed two P1 findings on b10cabe that I think are real and worth addressing before merge:

P1 -- bypassing availability filtering

The new all-primary-pressured branch iterates sorted(state_list, key=_primary_budget_pressure_sort_key) and calls select_account([state], ...) per single-state list. select_account's availability filtering (RATE_LIMITED / QUOTA_EXCEEDED reset, cooldown suppression, error-backoff) does run on each single-state call, but the ordering of which states get tried first comes from the outer sort, not from the selector. So under usage_weighted with all candidates primary-pressured, an account that is currently unavailable can win the iteration just because its primary usage is the lowest.

P1 -- bypassing health-tier preference

Same shape: select_account([state]) cannot reason about healthy vs probing vs draining when there's only one state in the list, so the pool-level healthy > probing > draining preference that select_account would normally apply is silently lost. A draining account with slightly lower primary usage can now win over a healthy account in the all-pressured fallback.

Suggested shape

Keep the gate-narrowing change as-is (that's the right fix for #446), but rework the all-pressured fallback so it stays inside one select_account call:

  • Either: include health tier and availability flags in _primary_budget_pressure_sort_key (e.g. (health_tier_rank, availability_blocked, primary_used, secondary_used, last_selected, account_id)) and let min(states, key=...) produce the final ordering before calling select_account([sorted_state_list], ...) once.
  • Or, push the primary-first rule into the existing usage_weighted ranking inside select_account so it applies naturally instead of being a separate fallback path. That keeps health/availability/cooldown logic in one place.

The second option is cleaner architecturally, the first is less invasive. Either is fine.

Could you also add a regression test for each P1:

def test_all_primary_pressured_fallback_skips_unavailable_account()
def test_all_primary_pressured_fallback_prefers_healthy_over_draining()

Tests can stub AccountState.cooldown_until / error_backoff_until / status RATE_LIMITED for the unavailability case, and health_tier=draining vs healthy for the second.

The OpenSpec proposal copy is correct -- both scenarios (secondary-only pressure stays eligible and primary-first ordering in the fallback) hold under either fix shape, so no spec change required.

Ping me with the new commit and I'll re-run @codex review.

@chatgpt-codex-connector
Copy link
Copy Markdown

To use Codex here, create an environment for this repo.

@Komzpa Komzpa force-pushed the fix/primary-budget-threshold branch from 2b22cf6 to f3f12d9 Compare May 9, 2026 15:30
@Soju06
Copy link
Copy Markdown
Owner

Soju06 commented May 10, 2026

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Another round soon, please!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@Soju06 Soju06 merged commit 3ed7834 into Soju06:main May 10, 2026
18 checks passed
@Soju06
Copy link
Copy Markdown
Owner

Soju06 commented May 10, 2026

@all-contributors please add @Komzpa for code, test

@allcontributors
Copy link
Copy Markdown
Contributor

@Soju06

@Komzpa already contributed before to code, test

xirothedev added a commit to xirothedev/codex-lb that referenced this pull request May 10, 2026
…l fetch timeouts, image rewrite)

Upstream batch v1.16.0 highlights:
- proxy: expose drain status for graceful deploys (Soju06#564)
- proxy: narrow budget-safe gate to primary usage (Soju06#561)
- proxy: handle model fetch timeouts (Soju06#541)
- proxy: inline-rewrite input_image file references (Soju06#571)
- proxy: retry transient stream timeouts (Soju06#551)
- proxy: use DEFAULT_HOME_DIR for oversized response.create dumps (Soju06#575)
- accounts: split compact quota row + recover quota status (Soju06#562, Soju06#559)
- db: size background pool for burst traffic (Soju06#563)
- upstream: drop top_p for gpt-5 (Soju06#538)

Conflict resolved:
- app/modules/proxy/service.py: kept fork's multi-line import + FailurePhase
  (used by deterministic failover at line 8294).

Merge cleanup:
- app/main.py: removed pre-existing duplicate InFlightMiddleware class +
  duplicate _is_benign_metrics_bind_failure helper that shadowed canonical
  versions and broke new upstream test (test_in_flight_middleware_does_not_count_drain_status).

Verification:
- pytest tests/unit: 1459 passed, 17 failed (all pre-existing on backup
  branch backup/pre-upstream-merge-20260511 — date-bound API key tests,
  test_db_migrate.py SQLite env, settings duplicate validators).
- openspec validate --specs: 19/19 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[follow-up] Narrow _state_above_budget_threshold primary-only gating

2 participants