feat(evaluations): auto-pop rule CRUD + auto-run form (chunk B)#3301
feat(evaluations): auto-pop rule CRUD + auto-run form (chunk B)#3301snopoke wants to merge 1 commit into
Conversation
Chunk B of `auto-populate-eval-datasets`. Adds the user-facing forms,
views, URLs, and templates for managing auto-population rules on a
dataset. Stacks on chunk A; gated behind `flag_auto_populate_eval_datasets`
so default-team behaviour is unchanged.
- DatasetAutoPopulationRuleForm: validates source experiment team match,
parses filter_query via FilterParams, inherits evaluation_mode from the
parent dataset (mode field is not user-editable).
- CreateAutoPopulationRule / EditAutoPopulationRule / DeleteAutoPopulationRule
CBVs in a new auto_population_views module, gated by FlagGatedMixin.
- Nested URL patterns dataset/<id>/auto_population_rule/{new,<pk>,<pk>/delete}.
- Dataset edit page renders an auto-population rules section (partial)
when the flag is active, listing source chatbot, enabled state, last
run, status, contributed-message count (annotated from
DatasetIngestionEntry), with edit/delete actions.
- EvaluationConfigForm: auto_run_on_append checkbox surfaced in the
config form template with cost-implications help text.
The ingestion task that exercises these rules and the dataset-append →
delta-run hook land in chunks C and D respectively. No runtime ingestion
yet.
Validation: ruff, ty, Django check, and `pytest apps/evaluations/tests/`
all clean (194 passed, no regressions).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Gates Failed
Enforce critical code health rules
(1 file with Low Cohesion)
Gates Passed
2 Quality Gates Passed
See analysis details in CodeScene
Reason for failure
| Enforce critical code health rules | Violations | Code Health Impact | |
|---|---|---|---|
| forms.py | 1 critical rule | 5.56 → 4.96 | Suppress |
Quality Gate Profile: Clean Code Collective
Install CodeScene MCP: safeguard and uplift AI-generated code. Catch issues early with our IDE extension and CLI tool.
| from apps.evaluations.exceptions import HistoryParseException | ||
| from apps.evaluations.models import ( | ||
| ConditionType, | ||
| DatasetAutoPopulationRule, |
There was a problem hiding this comment.
❌ New issue: Low Cohesion
This module has at least 4 different responsibilities amongst its 51 functions, threshold = 4
| <button class="btn btn-xs btn-ghost text-error" | ||
| hx-delete="{% url 'evaluations:auto_population_rule_delete' request.team.slug object.id rule.id %}" | ||
| hx-confirm='{% translate "Delete this rule? Already-ingested messages stay in the dataset." %}' | ||
| hx-target="#rule-row-{{ rule.id }}" |
There was a problem hiding this comment.
Bug: Deleting a rule with an error leaves the error message row orphaned in the UI because the HTMX target only removes the main rule row.
Severity: LOW
Suggested Fix
Wrap the rule's main row and its error row in a single container element, such as a <tbody>. Update the hx-target attribute on the delete button to target this new container, ensuring both rows are removed together upon deletion.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location: templates/evaluations/auto_population_rules_section.html#L67
Potential issue: When an auto-population rule has an error, the UI renders two separate
table rows: one for the rule and one for the error message. The delete button's HTMX
configuration, `hx-target="#rule-row-{{ rule.id }}"`, only targets the main rule row.
Consequently, when a user deletes a rule that has an error, only the main row is removed
from the DOM. The error message row remains visible as an orphaned element in the table,
leading to a confusing user experience until the page is manually refreshed.
Also affects:
templates/evaluations/auto_population_rules_section.html:74~80
Did we get this right? 👍 / 👎 to inform future reviews.
| def delete(self, request, *args, **kwargs): | ||
| self.object = self.get_object() | ||
| self.object.delete() | ||
| return HttpResponse(status=200) |
There was a problem hiding this comment.
Bug: The DeleteAutoPopulationRule view is missing a success_url, which will cause an ImproperlyConfigured exception if the endpoint receives a POST request instead of the expected DELETE.
Severity: LOW
Suggested Fix
To make the view more robust, implement the get_success_url() method on the DeleteAutoPopulationRule view. This method should return the URL to the dataset edit page, ensuring the view can gracefully handle POST requests without crashing.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location: apps/evaluations/views/auto_population_views.py#L138-L141
Potential issue: The `DeleteAutoPopulationRule` view inherits from Django's `DeleteView`
but does not define a `success_url` or override the `get_success_url()` method. While
the intended HTMX flow uses a `DELETE` request and returns an empty `HttpResponse`,
bypassing the need for a redirect, the view is vulnerable. If a standard `POST` request
is sent to this endpoint, Django's `DeleteView` logic will attempt to find a
`success_url` and fail, triggering an `ImproperlyConfigured` exception and causing a
server error.
Did we get this right? 👍 / 👎 to inform future reviews.
❌ 2 Tests Failed:
View the top 2 failed test(s) by shortest run time
To view more test analytics, go to the Test Analytics Dashboard |
Product Description
When the new `flag_auto_populate_eval_datasets` flag is enabled for a team, an evaluation dataset gains an Auto-Population Rules section letting users configure rules that periodically append matching sessions/messages from a chosen source chatbot to that dataset. The evaluation config form gains an opt-in auto-run on dataset append checkbox.
No runtime ingestion happens yet — the periodic task and the auto-trigger plumbing land in chunks C and D respectively.
Technical Description
This is Chunk B of openspec change auto-populate-eval-datasets. It adds:
Out of scope (chunks C–E): the periodic ingestion task that actually exercises rules, the dataset-append → delta-run trigger, the run-history UI for delta runs, integration tests, docs.
Validation:
Migrations
(No new migrations in this PR — schema landed in chunk A.)
Demo
n/a yet — flag is not enabled by default and the views render UI that does nothing at runtime until chunk C ships the ingestion task. A combined demo will accompany chunks C/D when the feature performs work.
Docs and Changelog
User-facing docs land alongside the chunk that activates the feature (C/D).
🤖 Generated with Claude Code