
Add exponential backoff for HTTP 429 rate limiting in scrapers#6731

Open
priv-r8s wants to merge 2 commits into stashapp:develop from priv-r8s:develop

Conversation

@priv-r8s

Summary

  • Add retry logic with exponential backoff when scrapers receive HTTP 429 (Too Many Requests)
  • Backoff delay = Retry-After + exponential (2s, 4s, 8s, ...) so the server's requested wait is always respected
  • If Retry-After exceeds 60s max, give up immediately instead of waiting excessively
  • Parse Retry-After header (both seconds and HTTP-date formats) per RFC 9110
  • 5-minute total timeout prevents rate-limit retries from running indefinitely
  • Comprehensive unit tests for all backoff paths

Test plan

  • Unit tests: exponential backoff without Retry-After header
  • Unit tests: Retry-After + exponential backoff (additive)
  • Unit tests: Retry-After via HTTP-date format
  • Unit tests: Retry-After exceeds max → immediate give-up
  • Unit tests: loadURL retry-and-succeed after 429s
  • Unit tests: loadURL retry exhaustion
  • Unit tests: context cancellation during retries
  • Live tested against r18.dev API — confirmed backoff works with real 429 responses (Retry-After: 10)

Replaces #6722 (closed due to fork rebuild).

priv-r8s and others added 2 commits March 21, 2026 15:23
- Backoff delay = Retry-After + exponential (2s, 4s, 8s, ...)
- If Retry-After exceeds 60s max, give up immediately
- Respects Retry-After header as floor, adds incremental backoff
- Comprehensive unit tests for all backoff paths

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tness

- rateLimitBackoff returns (time.Duration, bool) instead of sentinel -1
- Use errors.As instead of direct type assertion for HTTPError
- TestLoadURL_429ExhaustsRetries now actually tests retry exhaustion path
  (asserts *HTTPError with status 429 and correct attempt count)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Collaborator

@Gykes Gykes left a comment


So, after a brief look over, the biggest thing that's nagging me is the hard-coded values. Each site is different and has more or less strict rules around scraping, so hard-coding these could increase wait times dramatically depending on the situation. I see two possible options and could use some feedback on the best solution.

  1. Change this up to allow individual scrapers to pass the values:
  • rate_limit_retries: allow each scraper to set this individually. It would require updating all existing scrapers to pass the new values, but would give users flexibility depending on the site.
  2. Set them as variables and expose them in the app's scraper settings so users can adjust them as they see fit. We would need to find sane defaults, but it would let users do their own testing with sites and find optimal settings.

Would be best to wait for some feedback from others before implementing any of these.

@Gykes Gykes added the deferred Good feature that can be looked at for a later release. label Mar 22, 2026
@feederbox826
Copy link
Copy Markdown
Collaborator

Off the top of my head, I'm not sure how helpful this would be. I haven't encountered a single yml/json scraper that returns a 429 properly as a rate-limit backoff mechanism.

  • redgifs (needs anon auth token so need to be py)
  • nubiles (fake 429)
  • TPDB (real 429, handled through stash-box concurrency limit)

Other than that, I'm not sure how this would apply to bulk scrapes. Would we need a global cooldown that freezes all HTTP requests, or is this better suited as #2914?

