Add exponential backoff for HTTP 429 rate limiting in scrapers#6731
priv-r8s wants to merge 2 commits into stashapp:develop
Conversation
- Backoff delay = Retry-After + exponential (2s, 4s, 8s, ...)
- If Retry-After exceeds the 60s max, give up immediately
- Respects the Retry-After header as a floor, adds incremental backoff
- Comprehensive unit tests for all backoff paths

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tness

- rateLimitBackoff returns (time.Duration, bool) instead of a sentinel -1
- Use errors.As instead of a direct type assertion for HTTPError
- TestLoadURL_429ExhaustsRetries now actually tests the retry exhaustion path (asserts *HTTPError with status 429 and the correct attempt count)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
So, after a brief look, the biggest thing nagging me is the hard-coded values. Each site has more or less strict rules around scraping, and hard-coding these values could increase wait times dramatically depending on the situation. I see two possible options and could use some feedback on the best solution:
- Change this up to allow individual scrapers to pass the values (e.g. rate_limit_retries) per scraper. It would require updating all existing scrapers to pass the new values, but it would give users flexibility depending on the site.
- Set them as variables and expose them in the app's scraper settings so users can adjust them as they see fit. We would need to find sane defaults, but it would let users do their own testing against sites and find optimal settings.
It would be best to wait for some feedback from others before implementing either of these.
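For concreteness, the first option could look something like the sketch below. The field names (`rate_limit_retries`, `rate_limit_max_wait`), the struct, and the fallback default of 3 are all hypothetical; nothing here matches an existing stash schema.

```go
package main

import "time"

// scraperConfig sketches hypothetical per-scraper overrides for the
// rate-limit parameters, falling back to globals when unset.
type scraperConfig struct {
	RateLimitRetries int           `yaml:"rate_limit_retries"`
	RateLimitMaxWait time.Duration `yaml:"rate_limit_max_wait"`
}

// retries returns the per-scraper value, or a sane global default
// (hypothetically 3) when the scraper doesn't set one.
func (c scraperConfig) retries() int {
	if c.RateLimitRetries > 0 {
		return c.RateLimitRetries
	}
	return 3 // assumed global default
}

func main() {
	_ = scraperConfig{RateLimitRetries: 5}.retries()
}
```

A zero-value-means-default convention like this would let the two options coexist: app-level settings supply the defaults, and individual scraper YAMLs override them only where needed.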
Off the top of my head, I'm not sure how helpful this would be. I haven't encountered a single yml/json scraper that properly returns a 429 as a rate-limit backoff mechanism.
Other than that, I'm not sure how this would apply to bulk scrapes. Would we need a global cooldown that freezes all HTTP requests, or is this better suited as #2914?
Summary
Test plan
Replaces #6722 (closed due to fork rebuild).