
fix: Prevent accidental request dropping with maxRequestsPerCrawl #3531

Open

janbuchar wants to merge 3 commits into master from fix-accidental-request-dropping

Conversation

@janbuchar
Contributor

@janbuchar janbuchar commented Mar 26, 2026

  • Closes #3153: "maxRequestsPerCrawl with RQ optimizations drops requests"

  • Core issue: maxRequestsPerCrawl and enqueueLinks({ limit }) pre-truncated request lists before sending them to the request queue. Duplicate URLs consumed budget slots, starving actual new URLs.

  • Fix: Instead of guessing upfront which requests will be new, let the request queue tell us.

    • A new maxNewRequests option on addRequestsBatched feeds requests to the queue in budget-capped chunks, counts how many were actually new from the wasAlreadyPresent results, and stops pulling once the budget is exhausted.
    • chunkedAsyncIterable was extended to accept a dynamic chunk-size callback.
    • Leftovers from the source iterator are returned as requestsOverLimit for callers to report.
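The budget-capped feeding loop described above can be sketched as follows. This is a minimal illustration, not the actual Crawlee implementation: `FakeQueue`, `addBatch`, and `addRequestsBudgeted` are hypothetical stand-ins, and only the `wasAlreadyPresent` flag mirrors the real `ProcessedRequest` result shape.

```ts
interface ProcessedRequest {
    url: string;
    wasAlreadyPresent: boolean;
}

// Stand-in for the request queue: remembers which URLs it has seen.
class FakeQueue {
    private seen = new Set<string>();
    addBatch(urls: string[]): ProcessedRequest[] {
        return urls.map((url) => {
            const wasAlreadyPresent = this.seen.has(url);
            this.seen.add(url);
            return { url, wasAlreadyPresent };
        });
    }
}

// Feed requests in chunks no larger than the remaining budget, and count
// only genuinely new requests (wasAlreadyPresent === false) against it.
function addRequestsBudgeted(
    queue: FakeQueue,
    urls: Iterable<string>,
    maxNewRequests: number,
): { added: string[]; requestsOverLimit: string[] } {
    const added: string[] = [];
    const requestsOverLimit: string[] = [];
    const iterator = urls[Symbol.iterator]();
    let budget = maxNewRequests;

    while (budget > 0) {
        // Pull at most `budget` items — the dynamic chunk size.
        const chunk: string[] = [];
        while (chunk.length < budget) {
            const next = iterator.next();
            if (next.done) break;
            chunk.push(next.value);
        }
        if (chunk.length === 0) break;

        for (const result of queue.addBatch(chunk)) {
            if (!result.wasAlreadyPresent) {
                budget -= 1;
                added.push(result.url);
            }
        }
    }

    // Whatever the iterator still holds was never sent to the queue.
    for (let next = iterator.next(); !next.done; next = iterator.next()) {
        requestsOverLimit.push(next.value);
    }
    return { added, requestsOverLimit };
}
```

Note how a duplicate URL does not decrement the budget: the queue's own deduplication result decides what counts, instead of truncating the list upfront.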

@janbuchar janbuchar added the t-tooling Issues with this label are in the ownership of the tooling team. label Mar 26, 2026
@janbuchar janbuchar requested review from B4nan, barjin and l2ysho March 26, 2026 17:25
@github-actions github-actions bot added this to the 137th sprint - Tooling team milestone Mar 26, 2026
@github-actions github-actions bot added the tested Temporary label used only programatically for some analytics. label Mar 26, 2026
@janbuchar janbuchar marked this pull request as ready for review March 27, 2026 19:49
Member

@barjin barjin left a comment


Thank you @janbuchar !

Here are some late-night ideas I had (otherwise it looks pretty solid):

```ts
 *
 * This is useful in combination with `maxRequestsPerCrawl` to avoid duplicate URLs consuming the budget.
 */
maxNewRequests?: number;
```
Member

Let's maybe mention that this effectively forces waitForAllRequestsToBeAdded as per this line?

```ts
    waitForAllRequestsToBeAdded: Promise<ProcessedRequest[]>,
): Promise<AddRequestsBatchedResult> => {
```
```ts
if (maxNewRequests !== undefined) {
    for await (const request of generateRequests()) {
```
Member

I feel like this generateRequests() call will create a new generator instance (and therefore push all the requests to requestsOverLimit, regardless of which ones were processed).

```ts
const result = await rq.addRequestsBatched([/* ...e.g. 5 urls... */], {
    maxNewRequests: 1,
});

// result.addedRequests.length === 1
// result.requestsOverLimit.length === 5 (should be 4?)
```

Can we have a test for this?
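The suspected bug can be reproduced with a plain generator, independent of the PR's code: calling a generator function a second time returns a fresh iterator that starts from the beginning, so collecting leftovers from a new `generateRequests()` instance would include already-processed items. A minimal, self-contained illustration (names are hypothetical):

```ts
// Calling a generator function returns a brand-new iterator each time.
function* generateUrls(): Generator<string> {
    yield* ["a", "b", "c", "d", "e"];
}

// Consume one item from a first instance...
const first = generateUrls();
first.next(); // consumes "a"

// ...but a second call starts over from the beginning.
const freshDrain = [...generateUrls()];
// freshDrain has all 5 items, not the remaining 4.

// Draining the *same* instance yields only what is left.
const remaining = [...first];
// remaining is ["b", "c", "d", "e"]
```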

Member

On a side note, the intent of sharing the state through a common closure variable (the iterator) took me some time to understand. Can we pass the half-eaten iterator explicitly as a param?
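One way to make that explicit, sketched under the assumption that the chunking helper can simply receive the iterator as an argument (`takeUpTo` and the surrounding names are illustrative, not the PR's actual API):

```ts
// Pull at most `n` items from an iterator, leaving the rest in place.
function takeUpTo<T>(iterator: Iterator<T>, n: number): T[] {
    const out: T[] = [];
    while (out.length < n) {
        const next = iterator.next();
        if (next.done) break;
        out.push(next.value);
    }
    return out;
}

// Because the same iterator instance is passed explicitly, each call
// visibly continues where the previous one stopped — no hidden closure state.
const it = ["a", "b", "c", "d"][Symbol.iterator]();
const batch = takeUpTo(it, 2);            // ["a", "b"]
const leftovers = takeUpTo(it, Infinity); // ["c", "d"]
```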



Development

Successfully merging this pull request may close these issues.

maxRequestsPerCrawl with RQ optimizations drops requests

3 participants