Google robots.txt for http site after redirection to https

Google's Robots.txt Specification states that a robots.txt URL such as http://example.com/robots.txt is not valid for the domain https://example.com. Presumably the reverse is also true.

It also has this to say about following redirects when requesting robots.txt:

3xx (redirection)

Redirects will generally be followed until a valid result can be found (or a loop is recognized). We will follow a limited number of redirect hops (RFC 1945 for HTTP/1.0 allows up to 5 hops) and then stop and treat it as a 404. Handling of robots.txt redirects to disallowed URLs is undefined and discouraged. Handling of logical redirects for the robots.txt file based on HTML content that returns 2xx (frames, JavaScript, or meta refresh-type redirects) is undefined and discouraged.
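The hop limit and loop handling described in the quoted passage can be sketched as a minimal simulation. This is not Google's actual implementation; the function name and the `responses` map standing in for real HTTP responses are illustrative assumptions:

```python
MAX_HOPS = 5  # RFC 1945 (HTTP/1.0) allows up to 5 redirect hops

def fetch_robots(url, responses, max_hops=MAX_HOPS):
    """Return the robots.txt body, or None (treated like a 404).

    `responses` maps URL -> (status, payload); for 3xx the payload is
    the Location target, for 200 it is the file body.
    """
    seen = set()
    for _ in range(max_hops + 1):
        if url in seen:              # redirect loop recognized: give up
            return None
        seen.add(url)
        status, payload = responses.get(url, (404, None))
        if 300 <= status < 400:      # 3xx: follow the redirect target
            url = payload
            continue
        if status == 200:
            return payload
        return None                  # 4xx/5xx: no usable robots.txt
    return None                      # too many hops: treat as a 404

# The scenario from the question: http permanently redirects to https.
responses = {
    "http://example.com/robots.txt": (301, "https://example.com/robots.txt"),
    "https://example.com/robots.txt": (200, "User-agent: *\nDisallow: /private/"),
}
print(fetch_robots("http://example.com/robots.txt", responses))
# -> "User-agent: *\nDisallow: /private/"
```

A chain longer than five hops, or a loop (a redirects to b, b back to a), falls through to `None` and is treated as if no robots.txt exists.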

Say that I set up a site so that all requests over http are permanently redirected to their equivalents on https. Google will request http://example.com/robots.txt and follow the redirect to https://example.com/robots.txt. Is that file a valid robots.txt for the http site because that was the original request, or does Google consider the http site to have no valid robots.txt?

Using the robots.txt tester in Google Search Console confirms that the redirected robots.txt is used as the robots file for the http (original) domain.

Answer provided by Barry Hunter on the Webmaster Central Help Forum: https://productforums.google.com/forum/#!topic/webmasters/LLDVaso5QP8
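The confirmed behavior can be sketched with Python's standard urllib.robotparser: the rules fetched after the redirect are applied against the original http URL. The robots.txt body and the paths below are hypothetical examples:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical body as served at https://example.com/robots.txt,
# reached by following the 301 from the http URL.
robots_body = "User-agent: *\nDisallow: /private/"

# The parser is keyed to the ORIGINAL request URL (http), so the
# rules fetched via the redirect govern the http site as well.
rp = RobotFileParser("http://example.com/robots.txt")
rp.parse(robots_body.splitlines())

print(rp.can_fetch("*", "http://example.com/private/page.html"))  # False
print(rp.can_fetch("*", "http://example.com/public/page.html"))   # True
```

In other words, the http site does not end up without a robots.txt: the file reached through the redirect is treated as its robots file, matching what the Search Console tester reports.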