Why is forward DNS needed to verify crawlers?

From Google's support site:

To verify Googlebot as the caller:

1. Run a reverse DNS lookup on the accessing IP address from your logs, using the host command.
2. Verify that the domain name is in either googlebot.com or google.com.
3. Run a forward DNS lookup on the domain name retrieved in step 1 using the host command on the retrieved domain name.
4. Verify that it is the same as the original accessing IP address from your logs.
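The four steps can be sketched in Python roughly as follows. This is a minimal illustration, not Google's tooling: the helper names (`in_domain`, `is_verified_crawler`, `_reverse_lookup`, `_forward_lookup`) are hypothetical, and the standard-library calls go through the system resolver (which may also consult /etc/hosts), so they only approximate what the `host` command does. The resolver functions are parameters so they can be swapped out for testing.

```python
import ipaddress
import socket
from typing import Callable, Iterable, List

# Domains named in Google's instructions; other crawlers use other domains.
GOOGLE_DOMAINS = ("googlebot.com", "google.com")


def _reverse_lookup(ip: str) -> str:
    """Step 1: PTR lookup via the system resolver (roughly `host <ip>`)."""
    return socket.gethostbyaddr(ip)[0]


def _forward_lookup(name: str) -> List[str]:
    """Step 3: A/AAAA lookup via the system resolver (roughly `host <name>`)."""
    return [info[4][0] for info in socket.getaddrinfo(name, None)]


def in_domain(hostname: str, domains: Iterable[str]) -> bool:
    """Step 2: the name must equal a domain or end in '.' + domain."""
    host = hostname.rstrip(".").lower()
    return any(host == d or host.endswith("." + d) for d in domains)


def is_verified_crawler(
    ip: str,
    domains: Iterable[str] = GOOGLE_DOMAINS,
    reverse_lookup: Callable[[str], str] = _reverse_lookup,
    forward_lookup: Callable[[str], List[str]] = _forward_lookup,
) -> bool:
    target = ipaddress.ip_address(ip)  # raises ValueError on malformed input
    try:
        hostname = reverse_lookup(ip)
    except OSError:  # socket.herror / socket.gaierror: no usable PTR record
        return False
    if not in_domain(hostname, domains):
        return False
    try:
        addresses = forward_lookup(hostname)
    except OSError:  # the name does not resolve forward at all
        return False
    # Step 4: the forward answer must contain the original IP.
    return any(ipaddress.ip_address(a) == target for a in addresses)
```

Call `is_verified_crawler` with an IP taken from your logs; with the default resolvers it performs live DNS queries, so the result depends on the network. Matching on `"." + domain` rather than a bare suffix keeps names like `evilgooglebot.com` from passing step 2.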

My question is: why is the forward DNS lookup needed? Couldn't an attacker create a DNS record of the form crawl-xx-xx-xx-xx.googlebot.com?

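I have actually seen this in my logs, from other crawlers: a reverse DNS lookup on the IP returns a name in the correct domain, but the forward lookup of that name does not return the IP. I wonder how that is possible.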

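Anyone can serve a reverse zone. If you own IP space and have your ISP delegate its reverse lookups to you, you can serve a reverse zone that points wherever you want.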

As an attacker, I could buy any IP block and serve my own reverse zone, e.g. 4.3.2.1.in-addr.arpa, whose PTR records all say crawl-xx-xx-xx-xx.googlebot.com.
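The reverse name in that example is derived mechanically from the address: the IPv4 octets are reversed and placed under in-addr.arpa. A minimal Python sketch using the standard `ipaddress` module shows the mapping (`ptr_query_name` is a hypothetical helper). The point is that this name lives in the reverse tree delegated to whoever holds the address block, not under googlebot.com.

```python
import ipaddress


def ptr_query_name(ip: str) -> str:
    """Hypothetical helper: the DNS name queried for the PTR record of `ip`."""
    # IPv4 octets are reversed under in-addr.arpa; IPv6 uses nibbles under ip6.arpa.
    return ipaddress.ip_address(ip).reverse_pointer


print(ptr_query_name("1.2.3.4"))  # 4.3.2.1.in-addr.arpa
```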

What I cannot control is the forward DNS for Google's zone. So even if I can get the reverse lookup of 1.2.3.4 to return crawl-12-34-56-78.googlebot.com, I cannot get the forward lookup of crawl-12-34-56-78.googlebot.com to return 1.2.3.4.
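The ownership argument can be modelled as a toy in Python. This is a deliberate simplification, not real DNS: the `Zone` class and `forward_confirmed` function are hypothetical, and each zone simply refuses records from anyone but its owner, standing in for delegation. The attacker can publish the spoofed PTR record but not the matching forward record, so the forward-confirmed check fails.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class Zone:
    """Toy DNS zone (hypothetical): only its owner may publish records."""
    owner: str
    records: dict[str, str] = field(default_factory=dict)

    def publish(self, who: str, name: str, value: str) -> None:
        if who != self.owner:
            raise PermissionError(f"{who!r} cannot edit a zone owned by {self.owner!r}")
        self.records[name] = value


# The attacker holds 1.2.3.4, so its reverse zone is delegated to them;
# googlebot.com is served only by Google's nameservers.
reverse_zone = Zone(owner="attacker")
forward_zone = Zone(owner="google")

# Allowed: the attacker points its own PTR record at a Google-looking name.
reverse_zone.publish("attacker", "4.3.2.1.in-addr.arpa", "crawl-12-34-56-78.googlebot.com")

# Not allowed: the matching forward record lives in Google's zone.
try:
    forward_zone.publish("attacker", "crawl-12-34-56-78.googlebot.com", "1.2.3.4")
except PermissionError as err:
    print(err)


def forward_confirmed(ip: str) -> bool:
    """Reverse lookup, then require the forward answer to match the IP."""
    name = reverse_zone.records.get(ipaddress.ip_address(ip).reverse_pointer)
    return name is not None and forward_zone.records.get(name) == ip


print(forward_confirmed("1.2.3.4"))  # False
```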

The inconsistent entries in your logs are almost certainly third-party bots trying (fairly well) to impersonate Google.


dns

search-engine

web-crawler