feat(Segment membership inspection PoC): Daily ClickHouse-backed per-env segment counts#7464
feat(Segment membership inspection PoC): Daily ClickHouse-backed per-env segment counts#7464khvn26 wants to merge 33 commits into
Conversation
There was a problem hiding this comment.
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Tip: disable this comment in your organization's Code Review settings.
|
The latest updates on your projects. Learn more about Vercel for GitHub. 3 Skipped Deployments
|
Docker builds report
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #7464 +/- ##
==========================================
+ Coverage 98.39% 98.49% +0.09%
==========================================
Files 1400 1415 +15
Lines 52892 53736 +844
==========================================
+ Hits 52044 52926 +882
+ Misses 848 810 -38 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
This comment was marked as low quality.
This comment was marked as low quality.
Visual Regression19 screenshots compared. See report for details. |
c6e464b to
bff85b4
Compare
This reverts commit 9d31cb3.
Mappers: drop private-helper tests, replace with parametrised cases exercising `map_identity_document_to_snowflake_row` directly; trust TypedDict-required fields rather than caring for absent ones. Migration: assert the full DDL fed into `sess.sql(...)` and `spec` the Snowpark mock against the real Session class. beep boop
Drops snowpark + the original Snowflake-pinned `flagsmith-sql-flag-engine` from the dependency tree; pins `flagsmith-sql-flag-engine = 0.1.0a2` (includes the new `ClickHouseDialect` from the engine-side stacked PR); adds `clickhouse-connect`. Highlights: - `is_clickhouse_configured()` short-circuits when `CLICKHOUSE_HOST` is unset. Other `CLICKHOUSE_*` settings have sensible defaults (port 8443, user `default`, secure HTTPS). - `open_clickhouse_client(log_comment=...)` is the CH session-level analogue of Snowflake's `query_tag`. The comment lands in `system.query_log` for per-(org, project) spend rollups. - Backfill: `ReplacingMergeTree(inserted_at)` table; daily INSERTs dedupe at merge time (most-recent `inserted_at` wins). Drops the per-environment DELETE the Snowpark path used. - Refresh: `FROM IDENTITIES FINAL i` forces dedup at read time so counts always reflect the latest backfill snapshot, regardless of where the merge cycle has gotten to. - The PoC's DDL is opinionated about ReplacingMergeTree + inserted_at version column; the engine's published `schema_ddl` stays `MergeTree` as the simplest correct shape, and the PoC overrides for its mutation-heavy workload. Translator output is engine- agnostic, so the override is invisible to the SQL flag engine. uv dep wiring: `[[tool.uv.index]]` adds the staging CodeArtifact PyPI; `[tool.uv.sources]` pins `flagsmith-sql-flag-engine` to that index. beep boop
…e parse E2E smoke caught two bugs in the ClickHouse migration: - `_identity_id` was using `int.from_bytes(..., signed=True)`, which produces negative ints for half the UUID space. The CH schema has `id UInt64`, so negative IDs failed the bulk INSERT with `Unable to create Python array. ... trying to insert None values into a ClickHouse column that is not Nullable`. Switch to `signed=False`. - `FROM IDENTITIES FINAL i` is invalid CH SQL — the alias must come via `AS` when `FINAL` is present. Use `FROM IDENTITIES AS i FINAL`. beep boop
…ip-counts # Conflicts: # .github/actions/codeartifact-login/action.yml # .github/workflows/.reusable-docker-build.yml # .github/workflows/api-deploy-production-ecs.yml # .github/workflows/api-tests-with-private-packages.yml # api/pyproject.toml # api/uv.lock
Picks up the test-tools plugin fix for `Counter.clear()` on parameterless metrics, which was breaking the segment-membership backfill tests. beep boop
…ifact flagsmith-sql-flag-engine 0.1.0a2 is published to the prod CodeArtifact PyPI index. Add codeartifact-login to the workflows that install api deps without it (api-pull-request, api-run-makefile-target, update-flagsmith-environment) and to the OSS Docker build paths, since they now resolve a private dep through CodeArtifact. The Dockerfile build-python and api-test stages now consume the prod codeartifact secret to fetch the same package. beep boop
Mark every spot where workflow/Dockerfile/pyproject changes exist solely because flagsmith-sql-flag-engine is on the prod CodeArtifact PyPI, so they can be unwound once the package is open-sourced. beep boop
Add a `CLICKHOUSE_URL` setting that mirrors the `DATABASE_URL` pattern — one DSN env var instead of six discrete `CLICKHOUSE_*` fields. The existing per-field settings remain as the fallback path so deployments that prefer overrides keep working. `clickhouse_connect.get_client` parses the DSN itself, but its per-field kwargs take precedence over parsed values (`port = port or parsed.port`), so when the URL is set the discrete CLICKHOUSE_* settings are not passed through — the DSN drives every field. beep boop
…etting Replace the `is_clickhouse_configured()` service helper with a declarative `settings.CLICKHOUSE_ENABLED`, computed once at settings load from `CLICKHOUSE_URL or CLICKHOUSE_HOST`. Tasks and the migration now short-circuit on that flag. Drop the if/else around `clickhouse_connect.get_client` and pass both `dsn=` and the discrete `CLICKHOUSE_*` kwargs unconditionally, letting the library merge them via its `arg or parsed.<field>` resolution. The discrete defaults move to None so an unset override doesn't accidentally clobber the value parsed from the DSN. beep boop
Collapse multi-paragraph docstrings down to the gist, drop mechanism re-explanations that already live in one place, and shrink the CLICKHOUSE settings block to match neighbouring style. beep boop
for more information, see https://pre-commit.ci
Line references drifted after the docstring/comment trims. Regenerated via `make generate-docs`. beep boop
emyller
left a comment
There was a problem hiding this comment.
Looks thorough, thanks for making reviewing comfortable. I have a few performance-related questions, and some design concerns to cover.
Point every CodeArtifact-related TODO at Flagsmith/flagsmith-sql-flag-engine#6 so the cleanup trigger is discoverable from the call site. beep boop
The model caches per-(segment, environment) identity-match counts, but the `SegmentMembership` name suggested a wrapper keyed by identity. Rename so the entity is self-describing, and rename the `memberships` reverse relation + API field to `membership_counts` to match. beep boop
Swap clickhouse-connect for django-clickhouse-backend so the IDENTITIES schema can be bootstrapped through manage.py migrate --database=clickhouse rather than a RunPython migration gated by CLICKHOUSE_ENABLED. The original sequencing trap (install without CH then enabling CH later left IDENTITIES uncreated) goes away. Connection settings now live in DATABASES["clickhouse"]. Backfill and refresh use connections["clickhouse"].cursor() directly; bulk inserts go through cursor.executemany, which clickhouse-driver collapses into a single native Block. A new top-level clickhouse app holds the schema migration; ClickHouseRouter fences it off the default Postgres alias whether or not CH is configured. beep boop
…tifier) (environment_id, identifier) is the natural unique key in Flagsmith's identity model, so ReplacingMergeTree can dedupe on it directly without a synthetic id column. Drops the truncated-MD5 hash and the CodeQL finding that went with it, eliminates collision risk at SaaS scale, and removes the column entirely from the schema, mapper, and INSERT. beep boop
… iteration Project.id IN (subquery) doesn't care about duplicates in the IN list, so the inner .distinct() on the project_ids subquery is wasted work. Drop it and stream the projects via .iterator() so we don't materialise every FoF-enabled SaaS project at once. beep boop
…l completes The previous shape collected every project's id during the backfill loop and fanned out refresh tasks in one burst at the end of the daily run — a queue-depth spike at SaaS scale. Move the dispatch inside the project loop so refresh enqueues track the backfill cadence (one per project as its cursor closes) and naturally spread across the backfill window. beep boop
…kstop "Runaway queries" overstated the concern — we don't have a known class of queries to guard against. The actual reason for the 30m bound is to free the task-processor slot when the CH client or dispatcher hangs. beep boop
Standardise toward the convention used elsewhere in tests/integration: inline f-string URL paths over Django's reverse(). Easier to scan when reviewing a test and one less import. beep boop
The Given/When/Then prose was restating what each line already does. Drop it back to terse markers (combining # Given / When or # When / Then on shorter tests), keeping only the comments that explain *why* — e.g. the FINAL clause forcing ReplacingMergeTree dedup, log_comment landing in query_log. beep boop
A literal ISO timestamp on both sides of the assertion reads "dumber" than a datetime computed then reformatted, which is the preferred shape in tests. beep boop
30m was a comfortable buffer, not a measured one. A single project refresh is one UNION ALL aggregation; 10m is ~2x the legitimate ceiling and lets us reclaim zombies inside one backfill cadence instead of letting them linger. Widen on real data if this starts false-firing. beep boop
emyller
left a comment
There was a problem hiding this comment.
Thanks for addressing all comments. This looks good to ship!
Bumps flagsmith-sql-flag-engine to the public 0.1.0 release and removes the CodeArtifact source override plus every CI/Dockerfile workaround that existed only because the package was private. CodeArtifact wiring remains in place where it serves flagsmith-private. beep boop
Thanks for submitting a PR! Please check the boxes below:
docs/if required so people know about the feature.Changes
Contributes to #5663.
Adds a daily pipeline that backfills Dynamo identities into ClickHouse, materialises per-(segment, environment) match counts via
flagsmith-sql-flag-engine(ClickHouseDialect), and exposes them on the segment endpoint asmemberships: [{environment, count, last_synced_at}]for env-dropdown badges. Gated behind the org-scopedsegment_membership_inspectionFoF flag; no-ops whenCLICKHOUSE_HOSTis unset.History note: the PoC originally landed against Snowflake. After the 12/05/2026 RFC review, the cost analysis (and a stacked engine PR adding the ClickHouse dialect) tipped us toward ClickHouse for V1.
Storage decisions:
ReplacingMergeTree(inserted_at) ORDER BY (environment_id, id). Daily INSERTs dedupe at merge time (most-recentinserted_atwins); refresh queries useFROM IDENTITIES AS i FINALto force read-time dedup so counts always reflect the latest backfill.JSONcolumn (CH 24+; GA in 25.x). Each key becomes a typed columnar subcolumn — same cost shape as Snowflake's VARIANT.schema_ddlstaysMergeTreeas the simplest correct shape; this PoC overrides for its mutation-heavy workload. Translator output is engine-agnostic so the override is invisible to the SQL flag engine.Spend attribution:
clickhouse-connectopens a session with alog_commentsetting per task (flagsmith:segment_membership:{backfill,refresh}:org_X:project_Y), landing insystem.query_logfor per-(org, project) rollups — direct analogue of Snowflake'sQUERY_TAG.Review complexity: 4/5
Review order recommendation:
models.py(cache table)services.py(compile + count, parameterised SQL withFINAL)tasks.py(daily recurring backfill fans out per-project refresh)mappers.py(Dynamo doc → IDENTITIES row)migrations/0002_*(ClickHouseReplacingMergeTreeDDL viaRunPython, no-op when unconfigured)segments/serializers.py+views.py(read-sidemembershipsfield, prefetched).How did you test this code?
Beyond the existing unit + integration tests:
Ran an end-to-end smoke test against
moto-mocked Dynamo + a real ClickHouse Cloud trial (eu-west-2) configured via env vars. The flow:plan IN (growth, enterprise)rule).make docker-up django-migrate— migrations created theSegmentMembershiptable in core Postgres and theIDENTITIESschema in ClickHouse Cloud.moto.mock_dynamodband seeded 50 synthetic identities — 25 withtraits.plan = "growth", 25 withtraits.plan = "free".backfill_identities_to_clickhouse()directly — confirmed all 50 rows landed in ClickHouse'sIDENTITIEStable with the per-(org, project)log_commentattribution attached.refresh_project_segment_counts(project_id)(inline-dispatched from the backfill's fan-out) — confirmed exactly oneSegmentMembershiprow materialised withcount = 25and a freshlast_synced_at.Extensive testing will be done on staging.