fix: Prevent accidental request dropping with maxRequestsPerCrawl#3531
Open
barjin (Member) reviewed on Mar 27, 2026:
Thank you @janbuchar! Here are some late-night ideas (otherwise it looks pretty solid):
```ts
 *
 * This is useful in combination with `maxRequestsPerCrawl` to avoid duplicate URLs consuming the budget.
 */
maxNewRequests?: number;
```
barjin (Member):
Let's maybe mention that this effectively forces `waitForAllRequestsToBeAdded`, as per this line:
```ts
waitForAllRequestsToBeAdded: Promise<ProcessedRequest[]>,
): Promise<AddRequestsBatchedResult> => {
    if (maxNewRequests !== undefined) {
        for await (const request of generateRequests()) {
```
barjin (Member):
I feel like this `generateRequests()` call will create a new generator instance (and therefore push all the requests to `requestsOverLimit`, regardless of which ones were processed).
```ts
const result = await rq.addRequestsBatched([/* ...e.g. 5 urls... */], {
    maxNewRequests: 1,
});
// result.addedRequests.length === 1
// result.requestsOverLimit.length === 5 (should be 4?)
```

Can we have a test for this?
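A minimal standalone sketch (not Crawlee code) of why re-invoking the generator function causes this: each call produces a fresh generator, so the leftover collection forgets which items were already consumed, while a single shared iterator keeps its position.

```typescript
// Illustrative generator standing in for generateRequests().
function* generateRequests(): Generator<string> {
    for (const url of ['a', 'b', 'c', 'd', 'e']) yield url;
}

// First instance: consume one item (the "budget").
const first = generateRequests();
const processed = [first.next().value]; // ['a']

// Calling generateRequests() again creates a *fresh* generator,
// so the leftovers contain all five items, not four.
const leftovers = [...generateRequests()];

// Reusing the single iterator instead preserves the position:
const shared = generateRequests();
shared.next(); // consume 'a'
const correctLeftovers = [...shared]; // ['b', 'c', 'd', 'e']
```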
barjin (Member):
On a side note, it took me some time to understand the intent of sharing state through a common closure variable (the iterator). Can we pass the half-eaten iterator explicitly as a param?
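A sketch of the suggestion above, with hypothetical helper names (`takeUpTo`, `collectRest` are illustrative, not Crawlee API): the partially consumed iterator is passed as an explicit parameter, so both the budget-limited consumer and the overflow collector visibly operate on the same object.

```typescript
// Turn an array into an async iterable, standing in for the request source.
async function* toAsync<T>(items: T[]): AsyncGenerator<T> {
    for (const item of items) yield item;
}

// Drain at most `limit` items from the given iterator.
async function takeUpTo<T>(iterator: AsyncIterator<T>, limit: number): Promise<T[]> {
    const taken: T[] = [];
    while (taken.length < limit) {
        const { done, value } = await iterator.next();
        if (done) break;
        taken.push(value);
    }
    return taken;
}

// Collect whatever the same (half-eaten) iterator has left.
async function collectRest<T>(iterator: AsyncIterator<T>): Promise<T[]> {
    const rest: T[] = [];
    while (true) {
        const { done, value } = await iterator.next();
        if (done) break;
        rest.push(value);
    }
    return rest;
}
```

Because the iterator travels as an argument rather than a shared closure variable, the hand-off between "processed" and "over limit" is explicit at the call site.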
Closes: `maxRequestsPerCrawl` with RQ optimizations drops requests (#3153)

Core issue: `maxRequestsPerCrawl` and `enqueueLinks({ limit })` pre-truncated request lists before sending them to the request queue. Duplicate URLs consumed budget slots, starving actual new URLs.

Fix: instead of guessing upfront which requests will be new, let the request queue tell us. The new `maxNewRequests` option on `addRequestsBatched` feeds requests to the queue in budget-capped chunks, counts how many were actually new from the `wasAlreadyPresent` results, and stops pulling once the budget is exhausted. `chunkedAsyncIterable` was extended to accept a dynamic size callback, and leftover requests are returned as `requestsOverLimit` for callers to report.
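The description above can be sketched as follows. This is a hedged, self-contained model of the approach, not the actual Crawlee implementation: `addWithBudget` and `addBatch` are illustrative stand-ins, and only the `wasAlreadyPresent` field mirrors the real result shape. Note how duplicates are added but do not consume budget slots.

```typescript
interface ProcessedRequest {
    uniqueKey: string;
    wasAlreadyPresent: boolean;
}

async function addWithBudget(
    requests: string[],
    maxNewRequests: number,
    // Stand-in for the request queue: reports whether each URL was a duplicate.
    addBatch: (chunk: string[]) => Promise<ProcessedRequest[]>,
): Promise<{ added: ProcessedRequest[]; requestsOverLimit: string[] }> {
    const added: ProcessedRequest[] = [];
    let newCount = 0;
    let i = 0;
    while (i < requests.length && newCount < maxNewRequests) {
        // Dynamic chunk size: never send more than the remaining budget.
        const chunk = requests.slice(i, i + (maxNewRequests - newCount));
        i += chunk.length;
        for (const result of await addBatch(chunk)) {
            added.push(result);
            // Only genuinely new requests consume the budget.
            if (!result.wasAlreadyPresent) newCount++;
        }
    }
    // Anything never sent to the queue is reported back to the caller.
    return { added, requestsOverLimit: requests.slice(i) };
}
```

With a budget of 1 and the input `['dup', 'a', 'b', 'c']` where `'dup'` is already in the queue, both `'dup'` and `'a'` get added (the duplicate costs nothing), and `['b', 'c']` come back as `requestsOverLimit`.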