Skip to content

feat: add simple retries for status requests#65

Merged
pboros merged 1 commit intomainfrom
pboros/simple-retry-status
Jan 23, 2026
Merged

feat: add simple retries for status requests#65
pboros merged 1 commit intomainfrom
pboros/simple-retry-status

Conversation

@pboros
Copy link
Copy Markdown
Collaborator

@pboros pboros commented Jan 23, 2026

No description provided.

@pboros
Copy link
Copy Markdown
Collaborator Author

pboros commented Jan 23, 2026

@greptileai

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented Jan 23, 2026

Greptile Summary

Added simple retry logic to status requests with a 1-second delay between attempts. The GetStatus() function was refactored to extract the core logic into getStatusOnce(), which is retried once on failure. The incomplete status check was also extracted into IsStatusIncomplete() for reusability.

  • Reduced metrics collection interval from 10s to 4s
  • Added retry logic with 1-second delay in GetStatus()
  • Changed behavior: incomplete status now returns error (when debug flag is set), triggering retry
  • Extracted IsStatusIncomplete() helper function for checking status validity

Confidence Score: 3/5

  • This PR is generally safe but has potential timing concerns that should be verified in production
  • The retry logic is straightforward and improves resilience, but the collection interval reduction from 10s to 4s combined with the 1s retry delay could cause overlapping collections if status requests are slow. The behavior change where incomplete status now returns an error (triggering retry) when the debug flag is set should be verified as intentional.
  • Pay attention to metrics/metric_provider.go for the collection interval change and how it interacts with the retry timing

Important Files Changed

Filename Overview
db/conn.go refactored GetStatus() to add simple retry logic with 1-second delay; extracted incomplete status check; made incomplete status return error when debug flag is set
metrics/metric_provider.go reduced collection interval from 10s to 4s; removed TODO comment

Copy link
Copy Markdown

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

func (mp *MetricProvider) Collect() {
// TODO make this configurable
interval := 10 * time.Second
interval := 4 * time.Second
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: collection interval reduced from 10s to 4s - combined with the 1s retry delay in GetStatus(), overlapping collection attempts could occur if status requests consistently take >4s

Suggested change
interval := 4 * time.Second
interval := 6 * time.Second
Prompt To Fix With AI
This is a comment left during a code review.
Path: metrics/metric_provider.go
Line: 58:58

Comment:
**style:** collection interval reduced from 10s to 4s - combined with the 1s retry delay in `GetStatus()`, overlapping collection attempts could occur if status requests consistently take >4s

```suggestion
	interval := 6 * time.Second
```

How can I resolve this? If you propose a fix, please make it concise.

Comment thread db/conn.go
Str("status_json", statusString).
Msg("status json is missing cluster fields")

return nil, fmt.Errorf("incomplete status")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: incomplete status now returns error when DEBUG_LOG_INCOMPLETE_STATUS is set, which triggers retry - verify this behavior change is intentional since previously it would return the incomplete status without error. was the previous behavior (returning incomplete status without error) a bug, or is this new retry behavior a breaking change?

Prompt To Fix With AI
This is a comment left during a code review.
Path: db/conn.go
Line: 146:146

Comment:
**style:** incomplete status now returns error when `DEBUG_LOG_INCOMPLETE_STATUS` is set, which triggers retry - verify this behavior change is intentional since previously it would return the incomplete status without error. was the previous behavior (returning incomplete status without error) a bug, or is this new retry behavior a breaking change?

How can I resolve this? If you propose a fix, please make it concise.

@rbarabas rbarabas requested a review from Copilot January 23, 2026 14:55
@pboros pboros merged commit 79b7540 into main Jan 23, 2026
2 checks passed
@pboros pboros deleted the pboros/simple-retry-status branch January 23, 2026 14:55
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds simple retry logic for status requests to handle transient failures when fetching database status information. The changes extract the status fetching logic into a separate function and implement a single retry attempt with a 1-second delay between attempts.

Changes:

  • Reduced metric collection interval from 10 seconds to 4 seconds
  • Added retry logic for status requests with one retry attempt after a 1-second delay
  • Extracted status incompleteness check into a reusable helper function

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
metrics/metric_provider.go Reduced collection interval and removed TODO comments
db/conn.go Added retry logic for status requests and extracted incompleteness check into a helper function

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread db/conn.go
Str("status_json", statusString).
Msg("status json is missing cluster fields")

return nil, fmt.Errorf("incomplete status")
Copy link

Copilot AI Jan 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This early return inside the DEBUG_LOG_INCOMPLETE_STATUS check means incomplete status is only treated as an error when debug logging is enabled. When the environment variable is not set to 'true', the function proceeds to return the incomplete status without error, which is inconsistent behavior. The incomplete status check and error return should be outside the debug logging conditional block.

Copilot uses AI. Check for mistakes.
Comment thread db/conn.go
status, err := getStatusOnce()
if err != nil {
log.Error().Err(err).Msg("failed to get status, retrying")
time.Sleep(1 * time.Second)
Copy link

Copilot AI Jan 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The retry delay of 1 second is a magic number hardcoded in the implementation. Consider extracting this as a named constant or configurable parameter to improve maintainability and make it easier to adjust the retry behavior.

Copilot uses AI. Check for mistakes.
Comment thread db/conn.go
Comment on lines +152 to +163
func GetStatus() (*models.FullStatus, error) {
status, err := getStatusOnce()
if err != nil {
log.Error().Err(err).Msg("failed to get status, retrying")
time.Sleep(1 * time.Second)
status, err = getStatusOnce()
if err != nil {
log.Error().Err(err).Msg("failed to get status, retry failed")
}
}
return status, err
}
Copy link

Copilot AI Jan 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new retry logic in GetStatus lacks test coverage. Tests should verify the retry behavior when getStatusOnce fails on the first attempt but succeeds on retry, and when both attempts fail.

Copilot uses AI. Check for mistakes.
func (mp *MetricProvider) Collect() {
// TODO make this configurable
interval := 10 * time.Second
interval := 4 * time.Second
Copy link

Copilot AI Jan 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The collection interval of 4 seconds is a magic number. While this replaces a previous TODO about making it configurable, consider implementing that configurability or at least extracting this as a named constant to document why this specific interval was chosen.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants