I have a 302 redirect pointing to www, but Googlebot keeps crawling non-www URLs

Question

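Do you know whether it is possible to force crawlers to crawl www.domaine.com instead of domaine.com? In my case, I have a web app with prerender.io cached URLs enabled (so that crawlers can see the HTML), but only on www.

So when a bot crawls domaine.com, it gets no data.

The redirect happens automatically in Nginx (domaine.com > http://www.domaine.com), but it has had no effect.

I should add that all the URLs in my sitemap use www.

My Nginx redirect: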

server {
  listen                *:80;

  server_name           stephane-richin.fr;

  location / {

    if ($http_host ~ "^([^\.]+)\.([^\.]+)$"){
      rewrite ^/(.*) http://www.stephane-richin.fr/ redirect;
    }

  }
}

Do you have any ideas?

Thanks!

Answer 1

Could you have a robots.txt file containing

User-agent: *
Disallow: /

on domaine.com, and another one containing

User-agent: *
Disallow:

on www.domaine.com?

Answer 2

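If you submitted a sitemap with the correct URLs a week ago, it seems odd that Google keeps requesting the old ones.

Either way, you are sending the wrong status code in your non-www to www redirect. You are sending a 302, but it should be a 301. Philippe explains the difference in this answer: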

Status 301 means that the resource (page) is moved permanently to a new location. The client/browser should not attempt to request the original location but use the new location from now on.

Status 302 means that the resource is temporarily located somewhere else, and the client/browser should continue requesting the original url.
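
For illustration, here is a minimal sketch of what the non-www server block could look like with a permanent redirect. It reuses the host name from the question. Keeping the original path through `$request_uri` is an assumption on my part, since the rewrite in the question sends every request to the home page.

```nginx
# Server block for the bare domain only: every request is answered
# with a permanent (301) redirect to the www host.
server {
  listen       *:80;
  server_name  stephane-richin.fr;

  # $request_uri carries the original path and query string, so
  # /some/page?x=1 is redirected to the same path on the www host.
  # (Assumption: the question's rewrite redirects everything to "/".)
  return 301 http://www.stephane-richin.fr$request_uri;
}
```

Because this block only matches the bare host name, the `if` test on `$http_host` is no longer needed. You can check the result with `curl -I http://stephane-richin.fr/`, which should show a `301` status and a `Location` header pointing at the www host.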


seo

web-crawler

google-crawlers

domcrawler