feat: add simple retries for status requests#65
Conversation
Greptile SummaryAdded simple retry logic to status requests with a 1-second delay between attempts. The
Confidence Score: 3/5
Important Files Changed
|
| func (mp *MetricProvider) Collect() { | ||
| // TODO make this configurable | ||
| interval := 10 * time.Second | ||
| interval := 4 * time.Second |
There was a problem hiding this comment.
style: collection interval reduced from 10s to 4s - combined with the 1s retry delay in GetStatus(), overlapping collection attempts could occur if status requests consistently take >4s
| interval := 4 * time.Second | |
| interval := 6 * time.Second |
Prompt To Fix With AI
This is a comment left during a code review.
Path: metrics/metric_provider.go
Line: 58:58
Comment:
**style:** collection interval reduced from 10s to 4s - combined with the 1s retry delay in `GetStatus()`, overlapping collection attempts could occur if status requests consistently take >4s
```suggestion
interval := 6 * time.Second
```
How can I resolve this? If you propose a fix, please make it concise.| Str("status_json", statusString). | ||
| Msg("status json is missing cluster fields") | ||
|
|
||
| return nil, fmt.Errorf("incomplete status") |
There was a problem hiding this comment.
style: incomplete status now returns error when DEBUG_LOG_INCOMPLETE_STATUS is set, which triggers retry - verify this behavior change is intentional since previously it would return the incomplete status without error. was the previous behavior (returning incomplete status without error) a bug, or is this new retry behavior a breaking change?
Prompt To Fix With AI
This is a comment left during a code review.
Path: db/conn.go
Line: 146:146
Comment:
**style:** incomplete status now returns error when `DEBUG_LOG_INCOMPLETE_STATUS` is set, which triggers retry - verify this behavior change is intentional since previously it would return the incomplete status without error. was the previous behavior (returning incomplete status without error) a bug, or is this new retry behavior a breaking change?
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Pull request overview
This PR adds simple retry logic for status requests to handle transient failures when fetching database status information. The changes extract the status fetching logic into a separate function and implement a single retry attempt with a 1-second delay between attempts.
Changes:
- Reduced metric collection interval from 10 seconds to 4 seconds
- Added retry logic for status requests with one retry attempt after a 1-second delay
- Extracted status incompleteness check into a reusable helper function
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| metrics/metric_provider.go | Reduced collection interval and removed TODO comments |
| db/conn.go | Added retry logic for status requests and extracted incompleteness check into a helper function |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Str("status_json", statusString). | ||
| Msg("status json is missing cluster fields") | ||
|
|
||
| return nil, fmt.Errorf("incomplete status") |
There was a problem hiding this comment.
This early return inside the DEBUG_LOG_INCOMPLETE_STATUS check means incomplete status is only treated as an error when debug logging is enabled. When the environment variable is not set to 'true', the function proceeds to return the incomplete status without error, which is inconsistent behavior. The incomplete status check and error return should be outside the debug logging conditional block.
| status, err := getStatusOnce() | ||
| if err != nil { | ||
| log.Error().Err(err).Msg("failed to get status, retrying") | ||
| time.Sleep(1 * time.Second) |
There was a problem hiding this comment.
The retry delay of 1 second is a magic number hardcoded in the implementation. Consider extracting this as a named constant or configurable parameter to improve maintainability and make it easier to adjust the retry behavior.
| func GetStatus() (*models.FullStatus, error) { | ||
| status, err := getStatusOnce() | ||
| if err != nil { | ||
| log.Error().Err(err).Msg("failed to get status, retrying") | ||
| time.Sleep(1 * time.Second) | ||
| status, err = getStatusOnce() | ||
| if err != nil { | ||
| log.Error().Err(err).Msg("failed to get status, retry failed") | ||
| } | ||
| } | ||
| return status, err | ||
| } |
There was a problem hiding this comment.
The new retry logic in GetStatus lacks test coverage. Tests should verify the retry behavior when getStatusOnce fails on the first attempt but succeeds on retry, and when both attempts fail.
| func (mp *MetricProvider) Collect() { | ||
| // TODO make this configurable | ||
| interval := 10 * time.Second | ||
| interval := 4 * time.Second |
There was a problem hiding this comment.
The collection interval of 4 seconds is a magic number. While this replaces a previous TODO about making it configurable, consider implementing that configurability or at least extracting this as a named constant to document why this specific interval was chosen.
No description provided.