Closed
Commits
52 commits
45a7c47
Initial package scaffold
khvn26 May 7, 2026
476e589
Move ruff and mypy to pre-commit, add Makefile
khvn26 May 7, 2026
f2becc5
Use prek instead of pre-commit
khvn26 May 7, 2026
405b911
Drop sibling-engines line from package description
khvn26 May 7, 2026
df58647
Rename tests/test_engine_parity.py to tests/test_engine.py
khvn26 May 7, 2026
4642532
Bump actions/checkout to v5
khvn26 May 7, 2026
edd18a4
Switch Snowflake schema to single-table with traits VARIANT
khvn26 May 7, 2026
7758f07
README: add measured VARIANT vs typed perf at 870M
khvn26 May 7, 2026
66a9698
CI: collapse parity + unit tests into a single make test run
khvn26 May 7, 2026
154de45
Batch parity fixtures: 1 INSERT + 1 SELECT for the whole 510-pair suite
khvn26 May 7, 2026
79951a6
Validate numeric segment values + add flagsmith-lint-tests hook
khvn26 May 7, 2026
356b65b
Extract Sanitiser: single seam for value-derived strings → SQL
khvn26 May 7, 2026
4c292c7
Refactor: utils module + dialect-owned trait path syntax
khvn26 May 7, 2026
8538fcb
Decouple TranslateContext from per-column kwargs and EnvironmentContext
khvn26 May 7, 2026
0ddc544
Declare jsonpath-rfc9535 as a direct dependency
khvn26 May 7, 2026
1664ae1
Type engine-test-data cases as EngineTestCase TypedDict
khvn26 May 7, 2026
4e6a9f1
Parity for the .jsonc test cases (96 new cases)
khvn26 May 7, 2026
ee468b4
Compile every engine-test-data case to SQL — no skips
khvn26 May 7, 2026
2cedc52
Trait EQUAL/IN: fast string compare + typed fallback (~6.3s → ~3.65s)
khvn26 May 7, 2026
4d46259
Enforce 100% line + branch coverage in CI
khvn26 May 7, 2026
49c62b3
Ignore coverage artefacts (.coverage, coverage.xml, htmlcov)
khvn26 May 7, 2026
15912d2
Tighten the None contract: emit FALSE where engine deterministically …
khvn26 May 8, 2026
dd02fdc
Route $.identity.traits.<X> to row trait path, not eval-context stati…
khvn26 May 8, 2026
cfedda4
Add Given/When/Then comments to identity-jsonpath unit tests
khvn26 May 8, 2026
1cf1be7
Move type-aware trait predicates to the dialect
khvn26 May 8, 2026
87bddbe
Tidy _classify_jsonpath: NamedTuple result, no None return
khvn26 May 8, 2026
a610d58
Skip _classify_jsonpath for bare trait keys
khvn26 May 8, 2026
3d5f523
$.identity (whole object): encode row-truth, don't trust eval ctx
khvn26 May 8, 2026
e4c4950
Trait-first dispatch in SQL: per-row trait shadowing wins over JSONPath
khvn26 May 8, 2026
c4666ba
Drop trait-first dispatch wrapper; xfail trait-shadow case again
khvn26 May 8, 2026
5224930
Strip dead branches from test code
khvn26 May 8, 2026
168cf16
Use a TEMPORARY table for the parity scratch identities
khvn26 May 8, 2026
36fa5a8
Use escape_string for the SQL-standard half of the test data quoter
khvn26 May 8, 2026
19728ba
parity: pack pair_id with the unit separator instead of a colon
khvn26 May 8, 2026
97e767f
docs cleanup for tests
khvn26 May 8, 2026
3ae079e
engine-parity: Extract DialectTestHarness for multi-dialect support
khvn26 May 8, 2026
dac3d0c
tests: Type-check tests directory under mypy strict
khvn26 May 8, 2026
9d1b192
docs: Groom source docstrings, tighten jsonpath_expr typing
khvn26 May 8, 2026
d570f26
dialect: Push regex-flavour gating into the dialect
khvn26 May 8, 2026
56835bb
docs: Groom README
khvn26 May 8, 2026
2813165
groom README
khvn26 May 8, 2026
8a8ac54
chore: Drop redundant future-annotations imports
khvn26 May 8, 2026
9296d9b
ci: Suppress phantom partial branches on exhaustive `match`
khvn26 May 8, 2026
398aced
ci: Add Renovate config; pin engine-test-data to v3.7.0
khvn26 May 8, 2026
b4065c2
ci: Add CODEOWNERS
khvn26 May 8, 2026
6adc186
docs: Move Performance section from README to PR description
khvn26 May 8, 2026
ba16490
feat(ClickHouse): Add ClickHouse dialect
khvn26 May 12, 2026
fa31acb
ci(ClickHouse): Run engine-parity harness against a docker-compose Cl…
khvn26 May 12, 2026
08f1d23
refactor(ClickHouse): Pivot dialect to native JSON type
khvn26 May 13, 2026
f6a5cca
perf(ClickHouse): Lead trait_eq / trait_in with toString fast path
khvn26 May 12, 2026
66c8b08
docs(ClickHouse): Polish README for the JSON-native dialect
khvn26 May 13, 2026
f89c656
chore(release): Bump version to 0.1.0a2 for CodeArtifact publish
khvn26 May 13, 2026
43 changes: 43 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,43 @@
name: CI

on:
  pull_request:
  push:
    branches: [ main ]

jobs:
  ci:
    name: CI
    runs-on: ubuntu-latest
    env:
      SNOWFLAKE_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }}
      SNOWFLAKE_USER: ${{ secrets.SNOWFLAKE_USER }}
      SNOWFLAKE_ROLE: ${{ secrets.SNOWFLAKE_ROLE }}
      SNOWFLAKE_WAREHOUSE: ${{ secrets.SNOWFLAKE_WAREHOUSE }}
      SNOWFLAKE_DATABASE: ${{ secrets.SNOWFLAKE_DATABASE }}
      SNOWFLAKE_SCHEMA: ${{ secrets.SNOWFLAKE_SCHEMA }}
      SNOWFLAKE_PRIVATE_KEY_PATH: /tmp/snowflake_pk.p8
      CLICKHOUSE_HOST: localhost
      CLICKHOUSE_PORT: "8123"
    steps:
      - uses: actions/checkout@v5
        with:
          submodules: recursive
      - uses: astral-sh/setup-uv@v7
        with:
          enable-cache: true
      - run: make lint
      - run: make typecheck
      - name: Write Snowflake key file
        run: |
          umask 077
          printf '%s' "${{ secrets.SNOWFLAKE_PRIVATE_KEY }}" > /tmp/snowflake_pk.p8
      - name: Start ClickHouse
        run: docker compose up --detach --wait clickhouse
      - run: make test
      - name: Check Coverage
        uses: 5monkeys/cobertura-action@v14
        with:
          minimum_coverage: 100
          fail_below_threshold: true
          show_missing: true
8 changes: 8 additions & 0 deletions .gitignore
@@ -13,3 +13,11 @@ wheels/
.pytest_cache/
.mypy_cache/
.ruff_cache/

# Coverage
.coverage
coverage.xml
htmlcov/

# Local secrets
.env
4 changes: 4 additions & 0 deletions .gitmodules
@@ -0,0 +1,4 @@
[submodule "engine-test-data"]
	path = engine-test-data
	url = https://github.com/Flagsmith/engine-test-data.git
	branch = v3.7.0
30 changes: 30 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,30 @@
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.15.6
    hooks:
      - id: ruff-check
        args: [--fix]
      - id: ruff-format
  - repo: https://github.com/astral-sh/uv-pre-commit
    rev: 0.10.10
    hooks:
      - id: uv-lock
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v6.0.0
    hooks:
      - id: check-yaml
      - id: check-json
      - id: check-toml
  - repo: https://github.com/Flagsmith/flagsmith-common
    rev: v3.8.2
    hooks:
      - id: flagsmith-lint-tests
  - repo: local
    hooks:
      - id: python-typecheck
        name: python-typecheck
        language: system
        entry: make typecheck
        require_serial: true
        pass_filenames: false
        types: [python]
1 change: 1 addition & 0 deletions .python-version
@@ -0,0 +1 @@
3.10
1 change: 1 addition & 0 deletions CODEOWNERS
@@ -0,0 +1 @@
* @flagsmith/flagsmith-back-end
29 changes: 29 additions & 0 deletions Makefile
@@ -0,0 +1,29 @@
.PHONY: install-packages
install-packages: ## Install all required packages
	uv sync

.PHONY: install-pre-commit
install-pre-commit: ## Install pre-commit hooks
	uv run prek install

.PHONY: install
install: install-packages install-pre-commit ## Ensure the environment is set up

.PHONY: lint
lint: ## Run linters (pre-commit hooks across the tree)
	uv run prek run --all-files

.PHONY: test
test: ## Run unit tests. Override scope with opts, e.g. `make test opts='-m engine_parity'`
	uv run pytest $(opts)

.PHONY: typecheck
typecheck: ## Run mypy
	uv run mypy

.PHONY: help
help:
	@echo "Usage: make [target]"
	@echo ""
	@echo "Available targets:"
	@awk 'BEGIN {FS = ":.*?## "} /^[a-zA-Z_-]+:.*?## / {printf " \033[36m%-30s\033[0m %s\n", $$1, $$2}' $(MAKEFILE_LIST)
196 changes: 195 additions & 1 deletion README.md
@@ -1,3 +1,197 @@
# flagsmith-sql-flag-engine

Placeholder. The initial package scaffold lands via the first pull request.
SQL translator for Flagsmith segment predicates.

Where the Python and Rust `flag_engine` implementations evaluate
`is_context_in_segment` against an in-memory `EvaluationContext`, this
package takes a `SegmentContext` and emits a SQL `WHERE` expression that
evaluates the segment against an entire `IDENTITIES` table — one row per
identity, with the identity's full trait map held in a single column
the translator path-extracts at query time. `PERCENTAGE_SPLIT` and
`:semver`-marked comparators compile to inline pure SQL.

## Quickstart

```python
from flag_engine.context.types import EvaluationContext, SegmentContext

from flagsmith_sql_flag_engine import TranslateContext, translate_segment
from flagsmith_sql_flag_engine.dialects import SnowflakeDialect

eval_context: EvaluationContext = {
    "environment": {"key": "n9fbf9...3ngWhb", "name": "Production"},
}
ctx = TranslateContext(evaluation_context=eval_context, dialect=SnowflakeDialect())

segment: SegmentContext = {
    "key": "growth-cohort",
    "name": "Growth cohort",
    "rules": [
        {
            "type": "ALL",
            "conditions": [
                {"operator": "EQUAL", "property": "plan", "value": "growth"},
            ],
        },
    ],
}
where_expr = translate_segment(segment, ctx)
# where_expr is a SQL string (or None if the segment is untranslatable). Drop it into:
# SELECT COUNT(*) FROM IDENTITIES i
# WHERE i.environment_id = 'n9fbf9...3ngWhb' AND ({where_expr})
```

`environment_id` in the `IDENTITIES` table is a string column holding
`EnvironmentContext.key` directly — the same identifier the engine uses,
no separate integer PK.

`translate_segment` returns `None` if the segment uses an operator the
translator can't handle — typically a REGEX pattern the active dialect's
regex flavour can't compile. Callers should fall back to
`flag_engine.is_context_in_segment` for those segments.
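
A minimal sketch of that fallback, written against injected callables so it stays independent of any warehouse client. Every name here is hypothetical and not part of this package: `translate` stands in for `translate_segment` bound to a `TranslateContext`, `count_in_warehouse` runs a `COUNT(*)` with the given `WHERE` expression, and `evaluate_in_engine` wraps `flag_engine.is_context_in_segment` for one identity.

```python
from collections.abc import Callable, Iterable
from typing import Any, Optional


def count_matching_identities(
    segment: dict[str, Any],
    translate: Callable[[dict[str, Any]], Optional[str]],
    count_in_warehouse: Callable[[str], int],
    identities: Iterable[dict[str, Any]],
    evaluate_in_engine: Callable[[dict[str, Any], dict[str, Any]], bool],
) -> int:
    """Count identities in a segment, preferring SQL and falling back to the engine."""
    where_expr = translate(segment)
    if where_expr is not None:
        # Translatable: let the warehouse evaluate the whole table at once.
        return count_in_warehouse(where_expr)
    # Untranslatable (e.g. an unsupported REGEX): evaluate identity by identity.
    return sum(1 for identity in identities if evaluate_in_engine(identity, segment))
```

The fallback path is linear in the number of identities held in memory, so it only suits segments where that set is small enough to load.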

## Schema

Each dialect publishes the table layout it expects via a `schema_ddl`
constant. For Snowflake:

```sql
CREATE TABLE IF NOT EXISTS IDENTITIES (
    environment_id STRING NOT NULL,
    id NUMBER NOT NULL,
    identifier STRING NOT NULL,
    identity_key STRING NOT NULL,
    traits VARIANT,
    PRIMARY KEY (environment_id, id)
)
CLUSTER BY (environment_id, id);
```

For ClickHouse:

```sql
CREATE TABLE IF NOT EXISTS IDENTITIES (
    environment_id String,
    id UInt64,
    identifier String,
    identity_key String,
    traits JSON
)
ENGINE = MergeTree()
ORDER BY (environment_id, id);
```

Both engines store traits in a single columnar-JSON column —
Snowflake's `VARIANT` and ClickHouse's `JSON` (24+, GA in 25.x). Each
key is stored as a typed subcolumn, so trait reads are direct columnar
scans rather than per-row JSON parses. Trait keys are *data* — new
keys appear without schema changes — and the translator only sees the
abstract path extraction.

ClickHouse Cloud requires `SET allow_experimental_json_type = 1` when
creating a `JSON`-column table (the type is GA on OSS 25.x); the
test harness applies this setting automatically.

Programmatic access:

```python
from flagsmith_sql_flag_engine.dialects.snowflake import SCHEMA_DDL as SNOWFLAKE_DDL
from flagsmith_sql_flag_engine.dialects.clickhouse import SCHEMA_DDL as CLICKHOUSE_DDL
```

## Engine parity

Validated against [Flagsmith/engine-test-data](https://github.com/Flagsmith/engine-test-data),
the test suite all engine implementations are tested against. The
engine-parity suite loads each test case's identity into a per-dialect
scratch table, translates the case's segments, runs the generated SQL,
and compares to `flag_engine.is_context_in_segment`.
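
The comparison step can be sketched as below. This is an illustration of the idea only, not the harness in `tests/`: `find_parity_mismatches`, the `"key"` fields, and the shape of `sql_matches` are assumptions made for the example.

```python
from collections.abc import Callable, Iterable
from typing import Any

Pair = tuple[str, str]  # (identity key, segment key); hypothetical shape


def find_parity_mismatches(
    pairs: Iterable[tuple[dict[str, Any], dict[str, Any]]],
    sql_matches: set[Pair],
    engine_matches: Callable[[dict[str, Any], dict[str, Any]], bool],
) -> list[Pair]:
    """Return every (identity, segment) pair where SQL and the engine disagree."""
    mismatches: list[Pair] = []
    for identity, segment in pairs:
        pair_id = (identity["key"], segment["key"])
        # The generated SQL "matched" a pair if its row came back from the query.
        if engine_matches(identity, segment) != (pair_id in sql_matches):
            mismatches.append(pair_id)
    return mismatches
```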

To run the engine-parity suite locally:

```bash
git submodule update --init # pull engine-test-data

# Snowflake
export SNOWFLAKE_ACCOUNT=...
export SNOWFLAKE_USER=...
export SNOWFLAKE_PRIVATE_KEY_PATH=...

# ClickHouse — bring up the local container the CI workflow also uses
docker compose up --detach --wait clickhouse

uv run pytest tests/test_engine.py
```

Each harness's environment variables are only read at session-create
time; to run a single dialect's parity, pass e.g. `-k snowflake` or
`-k clickhouse` and only export that dialect's credentials.

Adding a new dialect's parity coverage is one harness module — see
`tests/harnesses/` for the shape.

## Dialects

The translator is dialect-aware: a `Dialect` protocol abstracts the
SQL fragments that differ across SQL engines — MD5 hex, hex-to-int
parsing, prefix-anchored regex, padded-version comparison, type-aware
trait predicates, regex flavour. Today `SnowflakeDialect` and
`ClickHouseDialect` are implemented; adding another engine such as
DuckDB or Postgres means writing one class.
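
To illustrate what "padded-version comparison" buys, here is a small Python sketch of the idea: left-pad each numeric component so that plain string ordering agrees with numeric ordering. The function name and the width of 10 are assumptions for the example; the SQL each dialect actually emits is not shown in this README.

```python
def padded_version_key(version: str, width: int = 10) -> str:
    """Build a string key whose lexicographic order follows major.minor.patch."""
    # Drop prerelease and build metadata: only major.minor.patch is compared.
    core = version.split("-", 1)[0].split("+", 1)[0]
    major, minor, patch = core.split(".")
    return ".".join(part.zfill(width) for part in (major, minor, patch))


# Raw strings sort "1.10.0" before "1.9.3"; the padded keys do not.
assert "1.10.0" < "1.9.3"
assert padded_version_key("1.10.0") > padded_version_key("1.9.3")

# Prerelease suffixes are ignored, so these compare as equal.
assert padded_version_key("2.0.0-rc1") == padded_version_key("2.0.0")
```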

### Snowflake vs ClickHouse

Both dialects pass the engine-parity suite with the same two xfails
(prerelease semver sort and `$.`-prefixed trait names — translator-level
divergences shared by every dialect). Operator coverage is identical.

The two implementations are shaped differently because the underlying engines differ:

| Concern | Snowflake | ClickHouse |
| -------------------- | ---------------------------------------- | ----------------------------------------------------------- |
| Trait storage | `VARIANT` (columnar JSON) | `JSON` (CH 24+, columnar JSON with typed subcolumns) |
| Trait path | `i.traits:"key"` returns VARIANT | ``i.traits.`key` `` returns Dynamic |
| Type discrimination | `TYPEOF`, `IS_BOOLEAN`, `IS_DECIMAL` | `toString(<sub>)` canonical-form fast path + typed-variant subcolumns (``.:String``, ``.:Bool``, ``.:Float64``) for type-strict dispatch |
| Hex chunk parse | `TO_NUMBER(SUBSTR(hex, n, 8), 'XXXXXXXX')` | `reinterpretAsUInt32(reverse(substring(MD5(...), n, 4)))` over raw bytes; no hex round-trip |
| Anchored regex | `REGEXP_INSTR(value, pat) = 1` | `match(value, '^(pat)')` — `match` is unanchored |
| Nullable propagation | `(VARIANT NULL)::STRING → NULL` | Subcolumn returns Dynamic NULL; `IS NULL` short-circuits |
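
The "Hex chunk parse" row exists because a 128-bit MD5 value does not fit a native SQL integer, so the digest has to be consumed in 32-bit pieces. The Python sketch below shows that both chunking styles yield the same integers, and one way such chunks can be folded into a modulus. The folding step is an assumption for illustration; the README does not show how the dialects combine the chunks.

```python
import hashlib

MODULUS = 9999  # taken from the "MD5-mod-9999" note in the operator table


def chunks_from_hex(hex_digest: str) -> list[int]:
    """Snowflake-style: parse each 8-hex-digit slice as an unsigned 32-bit integer."""
    return [int(hex_digest[n : n + 8], 16) for n in range(0, 32, 8)]


def chunks_from_bytes(raw_digest: bytes) -> list[int]:
    """ClickHouse-style: read each 4-byte slice of the raw digest as big-endian."""
    return [int.from_bytes(raw_digest[n : n + 4], "big") for n in range(0, 16, 4)]


def fold_modulo(chunks: list[int], modulus: int = MODULUS) -> int:
    """Horner-style fold, keeping the accumulator small at every step."""
    acc = 0
    for chunk in chunks:
        acc = (acc * 2**32 + chunk) % modulus
    return acc


md5 = hashlib.md5(b"segment-key,identity-key")
assert chunks_from_hex(md5.hexdigest()) == chunks_from_bytes(md5.digest())
assert fold_modulo(chunks_from_hex(md5.hexdigest())) == int(md5.hexdigest(), 16) % MODULUS
```

The big-endian read mirrors the `reverse(...)` in the ClickHouse column: `reinterpretAsUInt32` interprets bytes in little-endian order, so the slice is reversed first.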

Practical implications for callers:

- ClickHouse's `match` and `extractAll` reject `Nullable(String)` input
because the implied result types are unrepresentable. The dialect
wraps regex value expressions in `ifNull(..., '')`; trait paths are
always guarded by `IS NOT NULL` upstream, so the default is
unreachable at runtime.
- Snowflake's VARIANT path collapses both missing keys and JSON null to
SQL NULL "by accident" (cast propagation). The ClickHouse dialect's
subcolumn access does the same explicitly via the leading
`IS NULL` guard in `trait_path`. Caller-visible behaviour is the same.
- ClickHouse's batched `EXISTS`-equivalent uses windowed `count() > 0`
inside a `UNION ALL` — `EXISTS (SELECT 1 FROM ...)` isn't a top-level
expression in ClickHouse the way it is in Snowflake.
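
On the "Anchored regex" row of the table above: the parentheses in `'^(pat)'` matter whenever the pattern contains a top-level alternation. The sketch below uses Python's `re` module purely to illustrate the precedence rule; ClickHouse uses a different regex engine, and `prefix_anchored` is a hypothetical helper, not a function from this package.

```python
import re


def prefix_anchored(pattern: str) -> str:
    """Wrap a pattern the way the ClickHouse column of the table does: ^(pat)."""
    return f"^({pattern})"


pattern = "a|b"

# Without the group, ^ binds only to the first alternative, so "b" still
# matches anywhere in the value.
assert re.search("^" + pattern, "xb") is not None

# With the group, both alternatives must start at position 0.
assert re.search(prefix_anchored(pattern), "xb") is None
assert re.search(prefix_anchored(pattern), "bx") is not None
```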

## Operator coverage

| Operator | Translatable | Notes |
| -------------------------------------------- | :----------: | -------------------------------------------------------------- |
| `EQUAL`, `NOT_EQUAL`, `IN` | yes | |
| `IS_SET`, `IS_NOT_SET` | yes | `traits:"<key>" IS NOT NULL` / `IS NULL` |
| `CONTAINS`, `NOT_CONTAINS` | yes | |
| `GREATER_THAN`, `LESS_THAN` plus `_INCLUSIVE`| yes | |
| `MODULO` | yes | |
| `PERCENTAGE_SPLIT` | yes | inlined MD5-mod-9999; ~0.005% diverge on hash==9998 |
| `REGEX` | partial | dialect-flavour gated; unsupported patterns → caller fallback |
| `:semver`-marked comparators | yes | major.minor.patch only; ignores prerelease |
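
For context on the `PERCENTAGE_SPLIT` note, here is a Python sketch of MD5-mod-9999 bucketing. It rests on an assumption about the reference engine that this README does not spell out: that the engine re-hashes when the value lands exactly on 100 (bucket 9998), which an inlined SQL expression would not do. Both function names are hypothetical.

```python
import hashlib


def bucket_for(object_ids: list[str], iterations: int = 1) -> int:
    """MD5 the comma-joined ids (repeated `iterations` times) and reduce mod 9999."""
    to_hash = ",".join(object_ids * iterations)
    return int(hashlib.md5(to_hash.encode("utf-8")).hexdigest(), 16) % 9999


def inline_percentage(object_ids: list[str]) -> float:
    """Single-pass variant, the shape an inlined SQL expression can express."""
    return bucket_for(object_ids) / 9998 * 100


def engine_percentage(object_ids: list[str], iterations: int = 1) -> float:
    """Assumed engine behaviour: retry with more iterations when the value is 100."""
    value = bucket_for(object_ids, iterations) / 9998 * 100
    if value == 100:
        return engine_percentage(object_ids, iterations + 1)
    return value
```

Under that assumption the two variants agree for every bucket except 9998, where the single-pass form yields 100 and the retrying form yields some value below 100.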

## Development

```bash
make install # uv sync + prek install (pre-commit hooks)
make lint # run pre-commit hooks across the tree
make typecheck # mypy
make test # unit tests
```

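Ruff (lint + format) runs as a pre-commit hook on every commit. Mypy
runs through a local hook that calls `make typecheck` whenever Python
files are staged; because the hook sets `pass_filenames: false`, it
checks the whole project rather than only the staged files.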
19 changes: 19 additions & 0 deletions docker-compose.yml
@@ -0,0 +1,19 @@
services:
  clickhouse:
    image: clickhouse/clickhouse-server:25.5.6
    environment:
      # Skip the random-password bootstrap. The container is only ever
      # reachable from the harness on the same compose network / host
      # loopback, so the default `default` user with no password is fine.
      CLICKHOUSE_SKIP_USER_SETUP: "1"
    ports:
      - "8123:8123"
    ulimits:
      nofile:
        soft: 262144
        hard: 262144
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:8123/ping"]
      interval: 2s
      timeout: 2s
      retries: 15
1 change: 1 addition & 0 deletions engine-test-data
Submodule engine-test-data added at 4b29dc