Skip to content

feat(evaluations): auto-pop rule CRUD + auto-run form (chunk B)#3301

Draft
snopoke wants to merge 1 commit into
sk/auto-pop-eval-datasets-schemafrom
sk/auto-pop-eval-datasets-views
Draft

feat(evaluations): auto-pop rule CRUD + auto-run form (chunk B)#3301
snopoke wants to merge 1 commit into
sk/auto-pop-eval-datasets-schemafrom
sk/auto-pop-eval-datasets-views

Conversation

@snopoke
Copy link
Copy Markdown
Contributor

@snopoke snopoke commented May 6, 2026

Stacked PR: depends on #3299. Review-base is `sk/auto-pop-eval-datasets-schema` so this diff shows only chunk B's changes.

Product Description

When the new `flag_auto_populate_eval_datasets` flag is enabled for a team, an evaluation dataset gains an Auto-Population Rules section letting users configure rules that periodically append matching sessions/messages from a chosen source chatbot to that dataset. The evaluation config form gains an opt-in auto-run on dataset append checkbox.

No runtime ingestion happens yet — the periodic task and the auto-trigger plumbing land in chunks C and D respectively.

Technical Description

This is Chunk B of openspec change auto-populate-eval-datasets. It adds:

  • `DatasetAutoPopulationRuleForm` (apps/evaluations/forms.py)
    • Validates: source experiment team match, FilterParams parses cleanly.
    • `evaluation_mode` is inherited from the parent dataset (not user-editable) so the spec's mode-match invariant cannot be violated.
  • `apps/evaluations/views/auto_population_views.py` — `Create / Edit / Delete` CBVs scoped to `(team, dataset_id)`. `FlagGatedMixin` raises 404 unless `flag_auto_populate_eval_datasets` is active. Permissions: standard `evaluations.add/change/delete_datasetautopopulationrule` (auto-generated by Django for the model added in chunk A).
  • Nested URL patterns under `dataset/int:dataset_id/auto_population_rule/` for new, edit, delete.
  • `templates/evaluations/auto_population_rule_form.html` — extends `generic/object_form.html`, contextualises the form with the parent dataset's name and mode.
  • `templates/evaluations/auto_population_rules_section.html` — dataset-edit partial that lists rules with source chatbot, enabled badge, last-run timestamp, status badge, contributed-message count (via `Count("ingestion_entries")` annotation), and edit/delete actions. Inline error banner on `error` rules.
  • `templates/evaluations/dataset_edit.html` — includes the partial when `auto_population_flag_active` is true.
  • `EditDataset.get_context_data` (apps/evaluations/views/dataset_views.py) — annotates rules with contributed count and exposes `auto_population_flag_active`.
  • `EvaluationConfigForm` (apps/evaluations/forms.py) — adds `auto_run_on_append` to fields with help text describing the cost implications.
  • `templates/evaluations/evaluation_config_form.html` — renders the new checkbox.

Out of scope (chunks C–E): the periodic ingestion task that actually exercises rules, the dataset-append → delta-run trigger, the run-history UI for delta runs, integration tests, docs.

Validation:

  • `uv run python manage.py check` — clean.
  • `uv run ruff check apps/evaluations apps/teams/flags.py` — clean.
  • `uv run ty check apps/evaluations` — clean.
  • `uv run pytest apps/evaluations/tests/` — 194 passed, no regressions.

Migrations

  • The migrations are backwards compatible

(No new migrations in this PR — schema landed in chunk A.)

Demo

n/a yet — flag is not enabled by default and the views render UI that does nothing at runtime until chunk C ships the ingestion task. A combined demo will accompany chunks C/D when the feature performs work.

Docs and Changelog

  • This PR requires docs/changelog update

User-facing docs land alongside the chunk that activates the feature (C/D).

🤖 Generated with Claude Code

Chunk B of `auto-populate-eval-datasets`. Adds the user-facing forms,
views, URLs, and templates for managing auto-population rules on a
dataset. Stacks on chunk A; gated behind `flag_auto_populate_eval_datasets`
so default-team behaviour is unchanged.

- DatasetAutoPopulationRuleForm: validates source experiment team match,
  parses filter_query via FilterParams, inherits evaluation_mode from the
  parent dataset (mode field is not user-editable).
- CreateAutoPopulationRule / EditAutoPopulationRule / DeleteAutoPopulationRule
  CBVs in a new auto_population_views module, gated by FlagGatedMixin.
- Nested URL patterns dataset/<id>/auto_population_rule/{new,<pk>,<pk>/delete}.
- Dataset edit page renders an auto-population rules section (partial)
  when the flag is active, listing source chatbot, enabled state, last
  run, status, contributed-message count (annotated from
  DatasetIngestionEntry), with edit/delete actions.
- EvaluationConfigForm: auto_run_on_append checkbox surfaced in the
  config form template with cost-implications help text.

The ingestion task that exercises these rules and the dataset-append →
delta-run hook land in chunks C and D respectively. No runtime ingestion
yet.

Validation: ruff, ty, Django check, and `pytest apps/evaluations/tests/`
all clean (194 passed, no regressions).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@codescene-delta-analysis codescene-delta-analysis Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gates Failed
Enforce critical code health rules (1 file with Low Cohesion)

Gates Passed
2 Quality Gates Passed

See analysis details in CodeScene

Reason for failure
Enforce critical code health rules Violations Code Health Impact
forms.py 1 critical rule 5.56 → 4.96 Suppress

Quality Gate Profile: Clean Code Collective
Install CodeScene MCP: safeguard and uplift AI-generated code. Catch issues early with our IDE extension and CLI tool.

Comment thread apps/evaluations/forms.py
from apps.evaluations.exceptions import HistoryParseException
from apps.evaluations.models import (
ConditionType,
DatasetAutoPopulationRule,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ New issue: Low Cohesion
This module has at least 4 different responsibilities amongst its 51 functions, threshold = 4

Suppress

<button class="btn btn-xs btn-ghost text-error"
hx-delete="{% url 'evaluations:auto_population_rule_delete' request.team.slug object.id rule.id %}"
hx-confirm='{% translate "Delete this rule? Already-ingested messages stay in the dataset." %}'
hx-target="#rule-row-{{ rule.id }}"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Deleting a rule with an error leaves the error message row orphaned in the UI because the HTMX target only removes the main rule row.
Severity: LOW

Suggested Fix

Wrap the rule's main row and its error row in a single container element, such as a <tbody>. Update the hx-target attribute on the delete button to target this new container, ensuring both rows are removed together upon deletion.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: templates/evaluations/auto_population_rules_section.html#L67

Potential issue: When an auto-population rule has an error, the UI renders two separate
table rows: one for the rule and one for the error message. The delete button's HTMX
configuration, `hx-target="#rule-row-{{ rule.id }}"`, only targets the main rule row.
Consequently, when a user deletes a rule that has an error, only the main row is removed
from the DOM. The error message row remains visible as an orphaned element in the table,
leading to a confusing user experience until the page is manually refreshed.

Also affects:

  • templates/evaluations/auto_population_rules_section.html:74~80

Did we get this right? 👍 / 👎 to inform future reviews.

Comment on lines +138 to +141
def delete(self, request, *args, **kwargs):
self.object = self.get_object()
self.object.delete()
return HttpResponse(status=200)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The DeleteAutoPopulationRule view is missing a success_url, which will cause an ImproperlyConfigured exception if the endpoint receives a POST request instead of the expected DELETE.
Severity: LOW

Suggested Fix

To make the view more robust, implement the get_success_url() method on the DeleteAutoPopulationRule view. This method should return the URL to the dataset edit page, ensuring the view can gracefully handle POST requests without crashing.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.

Location: apps/evaluations/views/auto_population_views.py#L138-L141

Potential issue: The `DeleteAutoPopulationRule` view inherits from Django's `DeleteView`
but does not define a `success_url` or override the `get_success_url()` method. While
the intended HTMX flow uses a `DELETE` request and returns an empty `HttpResponse`,
bypassing the need for a redirect, the view is vulnerable. If a standard `POST` request
is sent to this endpoint, Django's `DeleteView` logic will attempt to find a
`success_url` and fail, triggering an `ImproperlyConfigured` exception and causing a
server error.

Did we get this right? 👍 / 👎 to inform future reviews.

@codecov-commenter
Copy link
Copy Markdown

❌ 2 Tests Failed:

Tests completed Failed Passed Skipped
3014 2 3012 2
View the top 2 failed test(s) by shortest run time
apps/teams/tests/test_permissions.py::test_missing_content_types
Stack Traces | 0.002s run time
.../teams/tests/test_permissions.py:97: in test_missing_content_types
    assert not missing, f"Missing content types for {missing} in {app_label}"
E   AssertionError: Missing content types for {'datasetingestionentry', 'datasetautopopulationrule'} in evaluations
E   assert not {'datasetautopopulationrule', 'datasetingestionentry'}
apps/teams/tests/test_permission_references.py::test_permission_references
Stack Traces | 9.44s run time
.../teams/tests/test_permission_references.py:24: in test_permission_references
    assert model_name in CONTENT_TYPES[app_label], "Unknown model name in permission"
E   AssertionError: Unknown model name in permission
E   assert 'datasetautopopulationrule' in ['evaluationconfig', 'evaluationrun', 'evaluator', 'evaluationdataset', 'evaluationmessage', 'evaluationresult', ...]

To view more test analytics, go to the Test Analytics Dashboard
📋 Got 3 mins? Take this short survey to help us improve Test Analytics.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants