Skip to content

review fixes: integration.yml gates + bare-except + env-var + drift-tolerant skip#166

Merged
saurabhjain1592 merged 1 commit into
mainfrom
feature/qf-10-python-review-fixes
Apr 28, 2026
Merged

review fixes: integration.yml gates + bare-except + env-var + drift-tolerant skip#166
saurabhjain1592 merged 1 commit into
mainfrom
feature/qf-10-python-review-fixes

Conversation

@saurabhjain1592
Copy link
Copy Markdown
Member

Summary

Deep code review (cross-SDK audit prompted by user's question "what about Python and Go?") caught several issues in the Python SDK that the original QF-10 pass excluded because Python was tagged "already ported, don't touch." Same audit lens that produced TS PRs #196/#197/#199 + Java PR #147 + the just-merged TS+Java+Go review-fixes PRs.

P0 — same shape as TS+Java+Go logs ordering

  1. integration.yml logs-after-teardown ordering. `Stop community stack` (`if: always`) declared BEFORE `Show docker logs on failure` (`if: failure`). On a failed run, `compose down --volumes --remove-orphans` destroyed the containers; subsequent `compose logs` returned empty. Fixed: capture logs first, persist to disk, upload as artifact, then teardown.

P1 — workflow + correctness

  1. integration.yml no concurrency group. Back-to-back pushes spawn parallel docker-compose stacks fighting over ports 8080/8081. Fixed: added concurrency: integration-${{ github.ref }} with cancel-in-progress: true.

  2. integration.yml cron Monday → Tuesday 06:00 UTC. Aligns with TS + Java + Go integration crons; failures land in EU/IN working hours.

  3. integration.yml no actions/upload-artifact for logs. Even with the ordering fixed, when the runner is gone the logs are gone. Fixed: persist docker compose logs --tail=500 to disk before teardown, upload as artifact on failure.

  4. integration.yml docker compose up -d --wait --wait-timeout 120 for native dependency-aware health gating. Kept the existing curl /health loop as belt-and-suspenders.

  5. integration.yml no pip cache. All three setup-python@v5 steps re-installed deps from cold every run. ci.yml already set cache: 'pip'; integration.yml didn't. Fixed: matched ci.yml's cache config across all three jobs.

  6. examples/openai_integration.py two bare except Exception: blocks. Both printed (expected if no OpenAI key) for every error class, masking SDK regressions, schema drift, and governance failures. Fixed: narrowed to openai.OpenAIError / PolicyViolationError. Anything else now bubbles up.

  7. examples/wcp_retry_idempotency.py env-var name disagreement. Used AXONFLOW_BASE_URL; the rest of the SDK and every other example use AXONFLOW_AGENT_URL. The script worked only because its own fallback default kicked in. Fixed: renamed to AXONFLOW_AGENT_URL.

  8. tests/test_integration.py test_generate_plan over-broad skip markers. Used string-matching on (\"LLM\", \"provider\", \"Planning Engine\", \"not available\", \"not initialized\") to decide whether to skip — silently absorbed any error containing those words, including unrelated SDK regressions. Fixed: gated on explicit AXONFLOW_HAS_PLANNING=1 env var instead of pattern-matching error text.

What I checked and did NOT flag

  • Decommissioned URLs: Python is genuinely clean. No staging-eu, no fictional hostnames, no v1.17.0 module-path traps. Sandbox helper correctly defaults to localhost:8080.
  • pyproject.toml BLE001 in examples/**: flagged P2 by the review agent — would catch future bare excepts in examples. Deferred for a separate small PR; doesn't affect current correctness.
  • AXONFLOW_TRY=1 mode test coverage: flagged P1 by review agent — needs a new test file, not a one-line fix. Tracking separately.

Test plan

  • Contract tests: 21/21 pass (`pytest tests/test_contract.py`)
  • All touched Python files compile cleanly (`python -m py_compile`)
  • CI green on PR
  • CI: integration green on push to main (basic + gateway examples + new ordering + new artifact upload)

CHANGELOG

User-facing entries under `[Unreleased] / Fixed` for openai_integration narrowing, env-var rename, and test_integration drift-tolerant skip.

…ift-tolerant skip

Deep code review (cross-SDK audit prompted by user's question "what
about Python and Go?") caught several issues in the Python SDK that
the original QF-10 pass excluded because Python was tagged "already
ported, don't touch."

Workflow correctness (.github/workflows/integration.yml):

- Logs-after-teardown ordering bug. `Stop community stack`
  (if: always) declared BEFORE `Show docker logs on failure`
  (if: failure), so on a failed run compose down destroyed the
  containers and subsequent compose logs returned empty. Same P0
  shape we caught in TS+Java+Go. Reordered: capture logs first,
  persist to disk, upload as artifact, then teardown.
- Added concurrency.group so back-to-back pushes don't spawn
  parallel docker-compose stacks.
- Cron Mon → Tue 06:00 UTC (matches TS+Java+Go).
- `docker compose up -d --wait --wait-timeout 120` for native
  dependency-aware health gating.
- Added actions/upload-artifact for docker-compose-logs.txt.
- Added pip cache to all three setup-python steps (was inconsistent
  with ci.yml).

Examples (real correctness bug):

- examples/openai_integration.py — replaced two bare `except Exception:`
  blocks with narrow handlers (`openai.OpenAIError` / `PolicyViolationError`).
  The broad catches printed `(expected if no OpenAI key)` for every
  error class, masking SDK regressions, schema drift, and governance
  failures. Now adds proper imports for `openai` and
  `axonflow.exceptions.PolicyViolationError`.
- examples/wcp_retry_idempotency.py — env-var name `AXONFLOW_BASE_URL`
  → `AXONFLOW_AGENT_URL`. The rest of the SDK and all other examples
  use AGENT_URL; BASE_URL only worked because the script had its own
  fallback default.

Tests (drift-tolerant skip):

- tests/test_integration.py test_generate_plan no longer string-matches
  error text to decide whether to skip. The previous broad markers
  ("LLM", "provider", "Planning Engine", "not available", "not initialized")
  silently absorbed any error containing those words — including
  unrelated SDK regressions. Now gates on AXONFLOW_HAS_PLANNING=1
  env var; runs the assertion fully when set, skips with an explanatory
  message otherwise.

Verified: contract tests pass (21/21); openai_integration.py and
wcp_retry_idempotency.py compile clean; test_integration.py compiles clean.

CHANGELOG: 3 new Fixed entries under [Unreleased].
@saurabhjain1592 saurabhjain1592 merged commit d16f4a2 into main Apr 28, 2026
13 checks passed
@saurabhjain1592 saurabhjain1592 deleted the feature/qf-10-python-review-fixes branch April 28, 2026 17:12
saurabhjain1592 added a commit that referenced this pull request Apr 28, 2026
User feedback: "all means all, don't defer." Closing the loop on the
4 items deferred-with-reason from #166.

P1 #9 — AXONFLOW_TRY=1 zero test coverage:
  Added tests/test_try_mode.py covering:
   - try-mode resolves to https://try.getaxonflow.com
   - try-mode overrides explicit endpoint
   - try-mode overrides AXONFLOW_AGENT_URL env var
   - try-mode requires client_id (TypeError with clear message)
   - try-mode requires client_id even with explicit endpoint
   - try-mode off → uses explicit endpoint
   - try-mode unset → falls back to AXONFLOW_AGENT_URL
   - AXONFLOW_TRY=other-value → treated as off (only "1" enables)
  8 tests, all pass.

P2 #11 — BLE001 (blind-except) ignored in examples/**:
  Removed BLE001 from the examples/**/*.py per-file-ignores list in
  pyproject.toml. Without the ignore, ruff would have caught the
  bare-except in openai_integration.py at PR time instead of needing
  a manual deep review. Verified ruff check examples/ passes after the
  narrower exception handlers from #166 land.

P2 #13 — integration.yml only runs Python 3.11:
  contract-tests job now matrixed across 3.10/3.11/3.12, matching
  ci.yml. Catches version-specific drift at PR time. The other two
  jobs (demo-scripts, integration) stay single-version since they
  exercise the live stack and the multiplier would 3x runtime.

P2 #12 — conftest.py only sets DO_NOT_TRACK:
  Defensive HTTP-egress block added to the autouse _disable_telemetry
  fixture. Patches httpx.get / httpx.post with a RuntimeError that
  tells the future test author to use httpx.MockTransport or a
  recorded fixture. Closes the leak path where a future test
  legitimately deletes DO_NOT_TRACK to exercise the telemetry path
  and accidentally fires real pings at the prod checkpoint.

Verified: full suite still passes (948 tests, 29 skipped) with the
HTTP block in place; ruff clean on examples/.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant