Delay between requests in Apify

Apify's legacy Crawler has a randomWaitBetweenRequests option:

This option forces the crawler to ensure a minimum time interval between opening two web pages, in order to prevent it from overloading the target server.

Do Apify Actors have a similar setting? If so, how does it affect the Actor Units calculation?

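There is no similar option in apify/web-scraper, the Actor that is meant to replace the legacy Crawler.

However, you can implement the delay yourself in pageFunction. Call context.waitFor() and pass it a random time in milliseconds.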

async function pageFunction(context) {
    const { request, log, jQuery } = context;

    // To be able to use jQuery as $, one needs to save it into a variable
    // and select the Inject jQuery option. We've selected it for you.
    const $ = jQuery;
    const title = $('title').text();

    log.info(`URL: ${request.url} TITLE: ${title}`);

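    // Helper (not part of Apify): random integer delay in ms.
    // The 1000-5000 ms bounds are an assumption; adjust them as needed.
    const getRandomWait = (minMs = 1000, maxMs = 5000) =>
        Math.floor(Math.random() * (maxMs - minMs + 1)) + minMs;

    // Pause for the number of milliseconds that getRandomWait returns.
    await context.waitFor(getRandomWait());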

    // To save data just return an object with the requested properties.
    return {
        url: request.url,
        title
    };
}
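
To try the timing logic outside of an Actor run, here is a minimal sketch in plain Node.js. It is only an illustration: getRandomWait, sleep and simulateCrawl are hypothetical helpers, not Apify APIs, and sleep stands in for context.waitFor(ms). The default bounds of 1000 to 5000 ms are an assumption.

```javascript
// Hypothetical helper: random integer delay in the range [minMs, maxMs].
function getRandomWait(minMs = 1000, maxMs = 5000) {
    return Math.floor(Math.random() * (maxMs - minMs + 1)) + minMs;
}

// Stand-in for context.waitFor(ms) so the sketch runs without Apify.
function sleep(ms) {
    return new Promise((resolve) => setTimeout(resolve, ms));
}

// Simulates handling several pages with a random pause after each one
// and records which delay was used for which URL.
async function simulateCrawl(urls, minMs, maxMs) {
    const waits = [];
    for (const url of urls) {
        const wait = getRandomWait(minMs, maxMs);
        await sleep(wait);
        waits.push({ url, wait });
    }
    return waits;
}
```

For example, simulateCrawl(['https://example.com/a', 'https://example.com/b'], 500, 1500) resolves to one record per URL, each with a wait between 500 and 1500 ms. Inside pageFunction the same getRandomWait result would be passed to context.waitFor() instead of sleep.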

If you would like this option in apify/web-scraper, you can submit an issue on the GitHub repo.