Skip to content

feat(Segment membership inspection PoC): Daily ClickHouse-backed per-env segment counts#7464

Open
khvn26 wants to merge 33 commits into
mainfrom
feat/segment-membership-counts
Open

feat(Segment membership inspection PoC): Daily ClickHouse-backed per-env segment counts#7464
khvn26 wants to merge 33 commits into
mainfrom
feat/segment-membership-counts

Conversation

@khvn26
Copy link
Copy Markdown
Member

@khvn26 khvn26 commented May 8, 2026

Thanks for submitting a PR! Please check the boxes below:

  • I have read the Contributing Guide.
  • I have added information to docs/ if required so people know about the feature.
  • I have filled in the "Changes" section below.
  • I have filled in the "How did you test this code" section below.

Changes

Contributes to #5663.

Adds a daily pipeline that backfills Dynamo identities into ClickHouse, materialises per-(segment, environment) match counts via flagsmith-sql-flag-engine (ClickHouseDialect), and exposes them on the segment endpoint as memberships: [{environment, count, last_synced_at}] for env-dropdown badges. Gated behind the org-scoped segment_membership_inspection FoF flag; no-ops when CLICKHOUSE_HOST is unset.

History note: the PoC originally landed against Snowflake. After the 12/05/2026 RFC review, the cost analysis (and a stacked engine PR adding the ClickHouse dialect) tipped us toward ClickHouse for V1.

Storage decisions:

  • The IDENTITIES table is ReplacingMergeTree(inserted_at) ORDER BY (environment_id, id). Daily INSERTs dedupe at merge time (most-recent inserted_at wins); refresh queries use FROM IDENTITIES AS i FINAL to force read-time dedup so counts always reflect the latest backfill.
  • Traits live in a JSON column (CH 24+; GA in 25.x). Each key becomes a typed columnar subcolumn — same cost shape as Snowflake's VARIANT.
  • The engine's published schema_ddl stays MergeTree as the simplest correct shape; this PoC overrides for its mutation-heavy workload. Translator output is engine-agnostic so the override is invisible to the SQL flag engine.

Spend attribution: clickhouse-connect opens a session with a log_comment setting per task (flagsmith:segment_membership:{backfill,refresh}:org_X:project_Y), landing in system.query_log for per-(org, project) rollups — direct analogue of Snowflake's QUERY_TAG.

Review complexity: 4/5

Review order recommendation:

  1. models.py (cache table)
  2. services.py (compile + count, parameterised SQL with FINAL)
  3. tasks.py (daily recurring backfill fans out per-project refresh)
  4. mappers.py (Dynamo doc → IDENTITIES row)
  5. migrations/0002_* (ClickHouse ReplacingMergeTree DDL via RunPython, no-op when unconfigured)
  6. segments/serializers.py + views.py (read-side memberships field, prefetched).

How did you test this code?

Beyond the existing unit + integration tests:

Ran an end-to-end smoke test against moto-mocked Dynamo + a real ClickHouse Cloud trial (eu-west-2) configured via env vars. The flow:

  1. Bootstrapped Django fixtures (Org, Project, Env, Segment with a plan IN (growth, enterprise) rule).
  2. Applied make docker-up django-migrate — migrations created the SegmentMembership table in core Postgres and the IDENTITIES schema in ClickHouse Cloud.
  3. Stood up an in-process Dynamo identities table via moto.mock_dynamodb and seeded 50 synthetic identities — 25 with traits.plan = "growth", 25 with traits.plan = "free".
  4. Invoked backfill_identities_to_clickhouse() directly — confirmed all 50 rows landed in ClickHouse's IDENTITIES table with the per-(org, project) log_comment attribution attached.
  5. Invoked refresh_project_segment_counts(project_id) (inline-dispatched from the backfill's fan-out) — confirmed exactly one SegmentMembership row materialised with count = 25 and a fresh last_synced_at.

Extensive testing will be done on staging.

@khvn26 khvn26 requested review from a team as code owners May 8, 2026 23:05
@khvn26 khvn26 requested review from gagantrivedi and removed request for a team May 8, 2026 23:05
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.

@vercel
Copy link
Copy Markdown

vercel Bot commented May 8, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

3 Skipped Deployments
Project Deployment Actions Updated (UTC)
docs Ignored Ignored Preview May 20, 2026 3:05pm
flagsmith-frontend-preview Ignored Ignored Preview May 20, 2026 3:05pm
flagsmith-frontend-staging Ignored Ignored Preview May 20, 2026 3:05pm

Request Review

@github-actions github-actions Bot added api Issue related to the REST API docs Documentation updates feature New feature or request and removed docs Documentation updates labels May 8, 2026
Comment thread api/segment_membership/mappers.py Fixed
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

Docker builds report

Image Build Status Security report
ghcr.io/flagsmith/flagsmith-e2e:pr-7464 Finished ✅ Skipped
ghcr.io/flagsmith/flagsmith-e2e:pr-7464 Finished ✅ Skipped
ghcr.io/flagsmith/flagsmith-api-test:pr-7464 Finished ✅ Skipped
ghcr.io/flagsmith/flagsmith-api-test:pr-7464 Finished ✅ Skipped
ghcr.io/flagsmith/flagsmith-frontend:pr-7464 Finished ✅ Results
ghcr.io/flagsmith/flagsmith-frontend:pr-7464 Finished ✅ Results
ghcr.io/flagsmith/flagsmith-api:pr-7464 Finished ✅ Results
ghcr.io/flagsmith/flagsmith-api:pr-7464 Finished ✅ Results
ghcr.io/flagsmith/flagsmith:pr-7464 Finished ✅ Results
ghcr.io/flagsmith/flagsmith:pr-7464 Finished ✅ Results
ghcr.io/flagsmith/flagsmith-private-cloud:pr-7464 Finished ✅ Results
ghcr.io/flagsmith/flagsmith-private-cloud:pr-7464 Finished ✅ Results

@github-actions github-actions Bot added docs Documentation updates feature New feature or request and removed feature New feature or request docs Documentation updates labels May 8, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented May 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.49%. Comparing base (d97ef5d) to head (1a48ea9).
⚠️ Report is 9 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7464      +/-   ##
==========================================
+ Coverage   98.39%   98.49%   +0.09%     
==========================================
  Files        1400     1415      +15     
  Lines       52892    53736     +844     
==========================================
+ Hits        52044    52926     +882     
+ Misses        848      810      -38     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@khvn26 khvn26 requested a review from a team as a code owner May 9, 2026 00:05
@github-actions github-actions Bot added docs Documentation updates feature New feature or request and removed feature New feature or request docs Documentation updates labels May 9, 2026
@github-actions

This comment was marked as low quality.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 9, 2026

Visual Regression

19 screenshots compared. See report for details.
View full report

@github-actions github-actions Bot added docs Documentation updates feature New feature or request and removed feature New feature or request docs Documentation updates labels May 9, 2026
@khvn26 khvn26 force-pushed the feat/segment-membership-counts branch from c6e464b to bff85b4 Compare May 9, 2026 00:18
@github-actions github-actions Bot added the docs Documentation updates label May 9, 2026
khvn26 added 4 commits May 13, 2026 22:26
Mappers: drop private-helper tests, replace with parametrised cases
exercising `map_identity_document_to_snowflake_row` directly; trust
TypedDict-required fields rather than caring for absent ones.

Migration: assert the full DDL fed into `sess.sql(...)` and `spec`
the Snowpark mock against the real Session class.

beep boop
Drops snowpark + the original Snowflake-pinned `flagsmith-sql-flag-engine`
from the dependency tree; pins `flagsmith-sql-flag-engine = 0.1.0a2`
(includes the new `ClickHouseDialect` from the engine-side stacked PR);
adds `clickhouse-connect`.

Highlights:

- `is_clickhouse_configured()` short-circuits when `CLICKHOUSE_HOST`
  is unset. Other `CLICKHOUSE_*` settings have sensible defaults
  (port 8443, user `default`, secure HTTPS).
- `open_clickhouse_client(log_comment=...)` is the CH session-level
  analogue of Snowflake's `query_tag`. The comment lands in
  `system.query_log` for per-(org, project) spend rollups.
- Backfill: `ReplacingMergeTree(inserted_at)` table; daily INSERTs
  dedupe at merge time (most-recent `inserted_at` wins). Drops the
  per-environment DELETE the Snowpark path used.
- Refresh: `FROM IDENTITIES FINAL i` forces dedup at read time so
  counts always reflect the latest backfill snapshot, regardless of
  where the merge cycle has gotten to.
- The PoC's DDL is opinionated about ReplacingMergeTree + inserted_at
  version column; the engine's published `schema_ddl` stays
  `MergeTree` as the simplest correct shape, and the PoC overrides
  for its mutation-heavy workload. Translator output is engine-
  agnostic, so the override is invisible to the SQL flag engine.

uv dep wiring: `[[tool.uv.index]]` adds the staging CodeArtifact PyPI;
`[tool.uv.sources]` pins `flagsmith-sql-flag-engine` to that index.

beep boop
…e parse

E2E smoke caught two bugs in the ClickHouse migration:

- `_identity_id` was using `int.from_bytes(..., signed=True)`, which
  produces negative ints for half the UUID space. The CH schema has
  `id UInt64`, so negative IDs failed the bulk INSERT with `Unable to
  create Python array. ... trying to insert None values into a
  ClickHouse column that is not Nullable`. Switch to `signed=False`.
- `FROM IDENTITIES FINAL i` is invalid CH SQL — the alias must come
  via `AS` when `FINAL` is present. Use `FROM IDENTITIES AS i FINAL`.

beep boop
khvn26 and others added 9 commits May 15, 2026 20:08
…ip-counts

# Conflicts:
#	.github/actions/codeartifact-login/action.yml
#	.github/workflows/.reusable-docker-build.yml
#	.github/workflows/api-deploy-production-ecs.yml
#	.github/workflows/api-tests-with-private-packages.yml
#	api/pyproject.toml
#	api/uv.lock
Picks up the test-tools plugin fix for `Counter.clear()` on parameterless
metrics, which was breaking the segment-membership backfill tests.

beep boop
…ifact

flagsmith-sql-flag-engine 0.1.0a2 is published to the prod CodeArtifact
PyPI index. Add codeartifact-login to the workflows that install api
deps without it (api-pull-request, api-run-makefile-target,
update-flagsmith-environment) and to the OSS Docker build paths, since
they now resolve a private dep through CodeArtifact. The Dockerfile
build-python and api-test stages now consume the prod codeartifact
secret to fetch the same package.

beep boop
Mark every spot where workflow/Dockerfile/pyproject changes exist solely
because flagsmith-sql-flag-engine is on the prod CodeArtifact PyPI, so
they can be unwound once the package is open-sourced.

beep boop
Add a `CLICKHOUSE_URL` setting that mirrors the `DATABASE_URL` pattern —
one DSN env var instead of six discrete `CLICKHOUSE_*` fields. The
existing per-field settings remain as the fallback path so deployments
that prefer overrides keep working.

`clickhouse_connect.get_client` parses the DSN itself, but its per-field
kwargs take precedence over parsed values (`port = port or parsed.port`),
so when the URL is set the discrete CLICKHOUSE_* settings are not passed
through — the DSN drives every field.

beep boop
…etting

Replace the `is_clickhouse_configured()` service helper with a
declarative `settings.CLICKHOUSE_ENABLED`, computed once at settings load
from `CLICKHOUSE_URL or CLICKHOUSE_HOST`. Tasks and the migration now
short-circuit on that flag.

Drop the if/else around `clickhouse_connect.get_client` and pass both
`dsn=` and the discrete `CLICKHOUSE_*` kwargs unconditionally, letting
the library merge them via its `arg or parsed.<field>` resolution. The
discrete defaults move to None so an unset override doesn't accidentally
clobber the value parsed from the DSN.

beep boop
Collapse multi-paragraph docstrings down to the gist, drop mechanism
re-explanations that already live in one place, and shrink the
CLICKHOUSE settings block to match neighbouring style.

beep boop
Line references drifted after the docstring/comment trims. Regenerated
via `make generate-docs`.

beep boop
Copy link
Copy Markdown
Contributor

@emyller emyller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks thorough, thanks for making reviewing comfortable. I have a few performance-related questions, and some design concerns to cover.

Comment thread .github/workflows/api-pull-request.yml Outdated
Comment thread api/segment_membership/models.py Outdated
Comment thread api/segment_membership/migrations/0002_setup_clickhouse_identities_schema.py Outdated
Comment thread api/segment_membership/migrations/0002_setup_clickhouse_identities_schema.py Outdated
Comment thread api/segment_membership/mappers.py Outdated
Comment thread api/tests/integration/segments/test_segment_membership_field.py Outdated
Comment thread api/tests/integration/segments/test_segment_membership_field.py Outdated
Comment thread api/tests/integration/segments/test_segment_membership_field.py Outdated
Comment thread api/tests/unit/segment_membership/test_unit_segment_membership_mappers.py Outdated
Comment thread api/tests/unit/segment_membership/test_unit_segment_membership_services.py Outdated
khvn26 added 12 commits May 19, 2026 11:56
Point every CodeArtifact-related TODO at Flagsmith/flagsmith-sql-flag-engine#6 so the cleanup trigger is discoverable from the call site.

beep boop
The model caches per-(segment, environment) identity-match counts, but the
`SegmentMembership` name suggested a wrapper keyed by identity. Rename so
the entity is self-describing, and rename the `memberships` reverse
relation + API field to `membership_counts` to match.

beep boop
Swap clickhouse-connect for django-clickhouse-backend so the IDENTITIES
schema can be bootstrapped through manage.py migrate --database=clickhouse
rather than a RunPython migration gated by CLICKHOUSE_ENABLED. The
original sequencing trap (install without CH then enabling CH later
left IDENTITIES uncreated) goes away.

Connection settings now live in DATABASES["clickhouse"]. Backfill and
refresh use connections["clickhouse"].cursor() directly; bulk inserts
go through cursor.executemany, which clickhouse-driver collapses into a
single native Block. A new top-level clickhouse app holds the schema
migration; ClickHouseRouter fences it off the default Postgres alias
whether or not CH is configured.

beep boop
…tifier)

(environment_id, identifier) is the natural unique key in Flagsmith's
identity model, so ReplacingMergeTree can dedupe on it directly without a
synthetic id column. Drops the truncated-MD5 hash and the CodeQL finding
that went with it, eliminates collision risk at SaaS scale, and removes
the column entirely from the schema, mapper, and INSERT.

beep boop
… iteration

Project.id IN (subquery) doesn't care about duplicates in the IN list, so
the inner .distinct() on the project_ids subquery is wasted work. Drop it
and stream the projects via .iterator() so we don't materialise every
FoF-enabled SaaS project at once.

beep boop
…l completes

The previous shape collected every project's id during the backfill loop
and fanned out refresh tasks in one burst at the end of the daily run —
a queue-depth spike at SaaS scale. Move the dispatch inside the project
loop so refresh enqueues track the backfill cadence (one per project as
its cursor closes) and naturally spread across the backfill window.

beep boop
…kstop

"Runaway queries" overstated the concern — we don't have a known class of
queries to guard against. The actual reason for the 30m bound is to free
the task-processor slot when the CH client or dispatcher hangs.

beep boop
Standardise toward the convention used elsewhere in tests/integration:
inline f-string URL paths over Django's reverse(). Easier to scan when
reviewing a test and one less import.

beep boop
The Given/When/Then prose was restating what each line already does. Drop
it back to terse markers (combining # Given / When or # When / Then on
shorter tests), keeping only the comments that explain *why* — e.g. the
FINAL clause forcing ReplacingMergeTree dedup, log_comment landing in
query_log.

beep boop
A literal ISO timestamp on both sides of the assertion reads "dumber" than
a datetime computed then reformatted, which is the preferred shape in
tests.

beep boop
30m was a comfortable buffer, not a measured one. A single project refresh
is one UNION ALL aggregation; 10m is ~2x the legitimate ceiling and lets
us reclaim zombies inside one backfill cadence instead of letting them
linger. Widen on real data if this starts false-firing.

beep boop
emyller
emyller previously approved these changes May 19, 2026
Copy link
Copy Markdown
Contributor

@emyller emyller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing all comments. This looks good to ship!

Bumps flagsmith-sql-flag-engine to the public 0.1.0 release and removes
the CodeArtifact source override plus every CI/Dockerfile workaround
that existed only because the package was private. CodeArtifact wiring
remains in place where it serves flagsmith-private.

beep boop
Copy link
Copy Markdown
Contributor

@emyller emyller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice and public!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api Issue related to the REST API feature New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants