Relation between fetcher.server.min.delay and fetcher.threads.fetch in Nutch 1.13

Question

I am running Nutch in local mode on a server with 64 GB RAM and 32 processors. I have a single URL in the seed list and the following settings in my configuration:

fetcher.threads.fetch=16
fetcher.threads.per.queue=2
fetcher.max.crawl.delay=120
fetcher.queue.depth.multiplier=150
fetcher.queue.mode=byHost

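If -topN is set to 1000, how many requests will be made to the URL during the fetch phase? Will multiple map tasks be created for the Fetcher? My understanding is that a single map task is created, regardless of the number of URLs that need to be fetched from the fetchlist. I tried searching for the relationship between fetcher.threads.fetch and fetcher.threads.per.queue but did not find anything clear.

Here are the logs from the fetch phase: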

FetcherThread INFO  fetcher.FetcherThread (277) - fetching http://investors.te.com/news-releases/press-release-details/2018/TE-Connectivity-announces-fourth-quarter-and-full-year-results-for-fiscal-year-2018/default.aspx (queue crawl delay=2000ms)
FetcherThread INFO  fetcher.FetcherThread (277) - fetching http://investors.te.com/shareholder-info/default.aspx (queue crawl delay=2000ms)
FetcherThread INFO  fetcher.FetcherThread (277) - fetching https://investors.te.com/news-releases/press-release-details/2019/TE-Connectivity-to-hold-annual-general-meeting-of-shareholders-on-March-13-2019/default.aspx (queue crawl delay=2000ms)
FetcherThread INFO  fetcher.FetcherThread (277) - fetching https://investors.te.com/investor-resources/request-information/default.aspx (queue crawl delay=2000ms)
FetcherThread INFO  fetcher.FetcherThread (277) - fetching https://investors.te.com/investor-resources/email-alerts/default.aspx (queue crawl delay=10000ms)
FetcherThread INFO  fetcher.FetcherThread (277) - fetching https://investors.te.com/site-map/default.aspx (queue crawl delay=10000ms)
FetcherThread INFO  fetcher.FetcherThread (277) - fetching https://investors.te.com/rss/PressRelease.aspx?LanguageId=1&CategoryWorkflowId=00000000-0000-0000-0000-000000000000&tags= (queue crawl delay=10000ms)
FetcherThread INFO  fetcher.FetcherThread (277) - fetching https://investors.te.com/stock-information/quote-and-chart/default.aspx (queue crawl delay=10000ms)
FetcherThread INFO  fetcher.FetcherThread (277) - fetching https://investors.te.com/investor-resources/overview/default.aspx (queue crawl delay=10000ms)
FetcherThread INFO  fetcher.FetcherThread (277) - fetching https://investors.te.com/investor-resources/investor-contacts/default.aspx (queue crawl delay=10000ms)
FetcherThread INFO  fetcher.FetcherThread (277) - fetching https://investors.te.com/js/mobileRedirect.js (queue crawl delay=10000ms)

Answer 1

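Only one request, because there is only one URL. If there were two URLs from a single host and fetcher.threads.per.queue=2, two requests could be sent to that host at the same time.

A large fetcher.threads.fetch only makes sense if you are crawling a large number of hosts, or if you are crawling your own local, fast-responding web server. In the latter case, fetcher.threads.per.queue should be equal or close to fetcher.threads.fetch.

If it is not your own server and you have not been explicitly allowed to do otherwise, you should always keep the default value of fetcher.threads.per.queue, which is a single thread (=1): no parallel connections to the same host, and a guaranteed delay between successive requests.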
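To make the relationship between the two thread settings concrete, here is a minimal sketch in Python. It is only a simplified model of the reasoning in this answer, not Nutch code: the function name `max_parallel_requests` and the `urls_per_queue` mapping are hypothetical, and crawl delays are ignored entirely. It assumes each queue (one per host in byHost mode) can be served by at most fetcher.threads.per.queue threads, and that the total is capped by fetcher.threads.fetch.

```python
def max_parallel_requests(threads_fetch, threads_per_queue, urls_per_queue):
    """Upper bound on simultaneous requests under a simplified model.

    threads_fetch     -- value of fetcher.threads.fetch
    threads_per_queue -- value of fetcher.threads.per.queue
    urls_per_queue    -- hypothetical mapping: queue id (e.g. host) -> number of URLs
    """
    # A queue cannot use more threads than it has URLs waiting,
    # nor more than the per-queue limit allows.
    per_queue = {
        queue: min(threads_per_queue, count)
        for queue, count in urls_per_queue.items()
    }
    # The global thread pool caps the sum over all queues.
    total = min(threads_fetch, sum(per_queue.values()))
    return per_queue, total


# One URL on one host: only one request, whatever the thread settings are.
print(max_parallel_requests(16, 2, {"investors.te.com": 1}))

# Many URLs on one host: at most 2 in parallel, so 14 of the 16 threads stay idle.
print(max_parallel_requests(16, 2, {"investors.te.com": 1000}))
```

Under this model, the 16 fetcher threads are only all busy once there are at least 8 queues with two or more URLs each, which matches the point that a large fetcher.threads.fetch pays off mainly when many hosts are crawled.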
