review fixes: integration.yml gates + bare-except + env-var + drift-tolerant skip#166
Merged
Merged
Conversation
…ift-tolerant skip
Deep code review (cross-SDK audit prompted by user's question "what
about Python and Go?") caught several issues in the Python SDK that
the original QF-10 pass excluded because Python was tagged "already
ported, don't touch."
Workflow correctness (.github/workflows/integration.yml):
- Logs-after-teardown ordering bug. `Stop community stack`
(if: always) declared BEFORE `Show docker logs on failure`
(if: failure), so on a failed run compose down destroyed the
containers and subsequent compose logs returned empty. Same P0
shape we caught in TS+Java+Go. Reordered: capture logs first,
persist to disk, upload as artifact, then teardown.
- Added concurrency.group so back-to-back pushes don't spawn
parallel docker-compose stacks.
- Cron Mon → Tue 06:00 UTC (matches TS+Java+Go).
- `docker compose up -d --wait --wait-timeout 120` for native
dependency-aware health gating.
- Added actions/upload-artifact for docker-compose-logs.txt.
- Added pip cache to all three setup-python steps (was inconsistent
with ci.yml).
Examples (real correctness bug):
- examples/openai_integration.py — replaced two bare `except Exception:`
blocks with narrow handlers (`openai.OpenAIError` / `PolicyViolationError`).
The broad catches printed `(expected if no OpenAI key)` for every
error class, masking SDK regressions, schema drift, and governance
failures. Now adds proper imports for `openai` and
`axonflow.exceptions.PolicyViolationError`.
- examples/wcp_retry_idempotency.py — env-var name `AXONFLOW_BASE_URL`
→ `AXONFLOW_AGENT_URL`. The rest of the SDK and all other examples
use AGENT_URL; BASE_URL only worked because the script had its own
fallback default.
Tests (drift-tolerant skip):
- tests/test_integration.py test_generate_plan no longer string-matches
error text to decide whether to skip. The previous broad markers
("LLM", "provider", "Planning Engine", "not available", "not initialized")
silently absorbed any error containing those words — including
unrelated SDK regressions. Now gates on AXONFLOW_HAS_PLANNING=1
env var; runs the assertion fully when set, skips with an explanatory
message otherwise.
Verified: contract tests pass (21/21); openai_integration.py and
wcp_retry_idempotency.py compile clean; test_integration.py compiles clean.
CHANGELOG: 3 new Fixed entries under [Unreleased].
5 tasks
saurabhjain1592
added a commit
that referenced
this pull request
Apr 28, 2026
User feedback: "all means all, don't defer." Closing the loop on the 4 items deferred-with-reason from #166. P1 #9 — AXONFLOW_TRY=1 zero test coverage: Added tests/test_try_mode.py covering: - try-mode resolves to https://try.getaxonflow.com - try-mode overrides explicit endpoint - try-mode overrides AXONFLOW_AGENT_URL env var - try-mode requires client_id (TypeError with clear message) - try-mode requires client_id even with explicit endpoint - try-mode off → uses explicit endpoint - try-mode unset → falls back to AXONFLOW_AGENT_URL - AXONFLOW_TRY=other-value → treated as off (only "1" enables) 8 tests, all pass. P2 #11 — BLE001 (blind-except) ignored in examples/**: Removed BLE001 from the examples/**/*.py per-file-ignores list in pyproject.toml. Without the ignore, ruff would have caught the bare-except in openai_integration.py at PR time instead of needing a manual deep review. Verified ruff check examples/ passes after the narrower exception handlers from #166 land. P2 #13 — integration.yml only runs Python 3.11: contract-tests job now matrixed across 3.10/3.11/3.12, matching ci.yml. Catches version-specific drift at PR time. The other two jobs (demo-scripts, integration) stay single-version since they exercise the live stack and the multiplier would 3x runtime. P2 #12 — conftest.py only sets DO_NOT_TRACK: Defensive HTTP-egress block added to the autouse _disable_telemetry fixture. Patches httpx.get / httpx.post with a RuntimeError that tells the future test author to use httpx.MockTransport or a recorded fixture. Closes the leak path where a future test legitimately deletes DO_NOT_TRACK to exercise the telemetry path and accidentally fires real pings at the prod checkpoint. Verified: full suite still passes (948 tests, 29 skipped) with the HTTP block in place; ruff clean on examples/.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Deep code review (cross-SDK audit prompted by user's question "what about Python and Go?") caught several issues in the Python SDK that the original QF-10 pass excluded because Python was tagged "already ported, don't touch." Same audit lens that produced TS PRs #196/#197/#199 + Java PR #147 + the just-merged TS+Java+Go review-fixes PRs.
P0 — same shape as TS+Java+Go logs ordering
integration.ymllogs-after-teardown ordering. `Stop community stack` (`if: always`) declared BEFORE `Show docker logs on failure` (`if: failure`). On a failed run, `compose down --volumes --remove-orphans` destroyed the containers; subsequent `compose logs` returned empty. Fixed: capture logs first, persist to disk, upload as artifact, then teardown.P1 — workflow + correctness
integration.ymlno concurrency group. Back-to-back pushes spawn parallel docker-compose stacks fighting over ports 8080/8081. Fixed: addedconcurrency: integration-${{ github.ref }}withcancel-in-progress: true.integration.ymlcron Monday → Tuesday 06:00 UTC. Aligns with TS + Java + Go integration crons; failures land in EU/IN working hours.integration.ymlnoactions/upload-artifactfor logs. Even with the ordering fixed, when the runner is gone the logs are gone. Fixed: persistdocker compose logs --tail=500to disk before teardown, upload as artifact on failure.integration.ymldocker compose up -d --wait --wait-timeout 120for native dependency-aware health gating. Kept the existing curl /health loop as belt-and-suspenders.integration.ymlno pip cache. All threesetup-python@v5steps re-installed deps from cold every run.ci.ymlalready setcache: 'pip'; integration.yml didn't. Fixed: matched ci.yml's cache config across all three jobs.examples/openai_integration.pytwo bareexcept Exception:blocks. Both printed(expected if no OpenAI key)for every error class, masking SDK regressions, schema drift, and governance failures. Fixed: narrowed toopenai.OpenAIError/PolicyViolationError. Anything else now bubbles up.examples/wcp_retry_idempotency.pyenv-var name disagreement. UsedAXONFLOW_BASE_URL; the rest of the SDK and every other example useAXONFLOW_AGENT_URL. The script worked only because its own fallback default kicked in. Fixed: renamed toAXONFLOW_AGENT_URL.tests/test_integration.pytest_generate_planover-broad skip markers. Used string-matching on(\"LLM\", \"provider\", \"Planning Engine\", \"not available\", \"not initialized\")to decide whether to skip — silently absorbed any error containing those words, including unrelated SDK regressions. Fixed: gated on explicitAXONFLOW_HAS_PLANNING=1env var instead of pattern-matching error text.What I checked and did NOT flag
staging-eu, no fictional hostnames, no v1.17.0 module-path traps. Sandbox helper correctly defaults tolocalhost:8080.pyproject.tomlBLE001inexamples/**: flagged P2 by the review agent — would catch future bare excepts in examples. Deferred for a separate small PR; doesn't affect current correctness.Test plan
CHANGELOG
User-facing entries under `[Unreleased] / Fixed` for openai_integration narrowing, env-var rename, and test_integration drift-tolerant skip.