31 changes: 24 additions & 7 deletions apps/worker/services/test_analytics/ta_process_flakes.py
@@ -42,17 +42,21 @@ def get_testruns(upload: ReportSession) -> QuerySet[Testrun]:
     ).order_by("timestamp")


-def handle_pass(curr_flakes: dict[bytes, Flake], test_id: bytes):
+def handle_pass(
+    curr_flakes: dict[bytes, Flake], test_id: bytes
+) -> Flake | None:
     # possible that we expire it and stop caring about it
     if test_id not in curr_flakes:
-        return
+        return None

     curr_flakes[test_id].recent_passes_count += 1
     curr_flakes[test_id].count += 1
     if curr_flakes[test_id].recent_passes_count == 30:
         curr_flakes[test_id].end_date = timezone.now()
-        curr_flakes[test_id].save()
-        del curr_flakes[test_id]
+        expired_flake = curr_flakes.pop(test_id)
+        return expired_flake
+
+    return None


 def handle_failure(
@@ -82,8 +86,9 @@ def handle_failure(
 @sentry_sdk.trace
 def process_single_upload(
     upload: ReportSession, curr_flakes: dict[bytes, Flake], repo_id: int
-):
+) -> list[Flake]:
     testruns = get_testruns(upload)
+    expired_flakes: list[Flake] = []

     for testrun in testruns:
         test_id = bytes(testrun.test_id)
@@ -92,13 +97,16 @@ def process_single_upload(
                 if test_id not in curr_flakes:
                     continue

-                handle_pass(curr_flakes, test_id)
+                expired_flake = handle_pass(curr_flakes, test_id)
+                if expired_flake is not None:
+                    expired_flakes.append(expired_flake)
             case "failure" | "flaky_fail" | "error":
                 handle_failure(curr_flakes, test_id, testrun, repo_id)
             case _:
                 continue

     Testrun.objects.bulk_update(testruns, ["outcome"])
+    return expired_flakes


 @sentry_sdk.trace
@@ -120,8 +128,11 @@ def process_flakes_for_commit(repo_id: int, commit_id: str):
         extra={"flakes": [flake.test_id.hex() for flake in curr_flakes.values()]},
     )

+    all_expired_flakes: list[Flake] = []
+
     for upload in uploads:
-        process_single_upload(upload, curr_flakes, repo_id)
+        expired_flakes = process_single_upload(upload, curr_flakes, repo_id)
+        all_expired_flakes.extend(expired_flakes)
         log.info(
             "process_flakes_for_commit: processed upload",
             extra={"upload": upload.id},
@@ -139,6 +150,12 @@ def process_flakes_for_commit(repo_id: int, commit_id: str):
         update_fields=["end_date", "count", "recent_passes_count", "fail_count"],
     )

+    if all_expired_flakes:
+        Flake.objects.bulk_update(
+            all_expired_flakes,
+            ["end_date", "count", "recent_passes_count"],
+        )
Expired flakes bulk_update missing fail_count field

High Severity

The bulk_update for expired flakes uses fields ["end_date", "count", "recent_passes_count"] but omits fail_count, which the bulk_create for non-expired flakes correctly includes. If handle_failure increments fail_count on a flake that later expires via handle_pass (e.g., across multiple uploads in the same commit), that fail_count change is silently lost. The original flake.save() persisted all fields.


Reviewed by Cursor Bugbot for commit 13cdd51.


Newly created flake expiring crashes bulk_update

Medium Severity

If a new Flake is created in handle_failure (pk=None) and later expires via handle_pass within the same commit processing, it ends up in all_expired_flakes without a primary key. Django's bulk_update raises a ValueError ("All bulk_update() objects must have a primary key set") for objects with pk=None, crashing the entire processing run. The original flake.save() handled this correctly by performing an INSERT.


Comment on lines +154 to +157
Contributor Author


Bug: The bulk_update for expired flakes omits the fail_count field, causing failure count increments to be lost if a flake expires in the same processing run.
Severity: HIGH

Suggested Fix

Add the fail_count field to the list of fields being updated in the Flake.objects.bulk_update call for expired flakes. The updated list should be ["end_date", "count", "recent_passes_count", "fail_count"].

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid.

Location: apps/worker/services/test_analytics/ta_process_flakes.py#L154-L157

Potential issue: When a test flake experiences a failure and then subsequently expires (due to 30 consecutive passes) within the same `process_flakes_for_commit` execution, the incremented `fail_count` is not persisted to the database. This is because the `bulk_update` operation for expired flakes does not include `fail_count` in its list of fields to update. This results in data loss, leading to inaccurate flake metrics for flakes that have both failures and then expire in the same processing window.




@sentry_sdk.trace
def process_flakes_for_repo(repo_id: int):