Delay between requests in Apify
Apify's legacy Crawler had a randomWaitBetweenRequests option:
This option forces the crawler to ensure a minimum time interval between opening two web pages, in order to prevent it from overloading the target server.
Do Apify Actors have a similar setting? If so, how does it affect the Actor Units calculation?
There is no equivalent option in apify/web-scraper, the recommended replacement for the legacy Crawler.
However, you can implement it yourself in the pageFunction: simply call the context.waitFor() function with a random delay in milliseconds.
async function pageFunction(context) {
    const { request, log, jQuery } = context;
    // To be able to use jQuery as $, one needs to save it into a variable
    // and select the "Inject jQuery" option. We've selected it for you.
    const $ = jQuery;
    const title = $('title').text();
    log.info(`URL: ${request.url} TITLE: ${title}`);
    // Wait for the random number of milliseconds returned by getRandomWait().
    await context.waitFor(getRandomWait());
    // To save data, just return an object with the requested properties.
    return {
        url: request.url,
        title
    };
}
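Note that getRandomWait() is not part of the web-scraper context API; it is a helper you define yourself inside the pageFunction (or in the "Pre goto function"). A minimal sketch, assuming you want delays between 1 and 5 seconds (the minMs/maxMs bounds are illustrative, not anything prescribed by Apify):

```javascript
// Returns a random integer delay in milliseconds.
// minMs/maxMs are example bounds; tune them to how gentle
// you need to be with the target server.
function getRandomWait() {
    const minMs = 1000; // shortest delay: 1 second
    const maxMs = 5000; // longest delay: 5 seconds
    return minMs + Math.floor(Math.random() * (maxMs - minMs));
}
```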
Keep in mind that the waits keep the Actor running longer, and since platform usage is billed by run duration and memory, they will increase the Actor's compute unit consumption accordingly. If you would like this option built into apify/web-scraper, you can open an issue in its GitHub repo.