Googlebot not recognizing dynamic robots.txt

I've created a dynamic route in Laravel that serves a txt response.

It works in the browser, but Googlebot says there is no robots.txt file.

These are the response headers I get:

Cache-Control → no-cache
Connection → keep-alive
Content-Disposition → inline; filename="robots.txt"
Content-Encoding → gzip
Content-Type → text/plain; charset=UTF-8
Date → Wed, 23 Mar 2016 11:36:44 GMT
Server → nginx/1.9.12
Transfer-Encoding → chunked
Vary → Accept-Encoding

This is my Laravel route:

Route::get('robots.txt', 'TxtController@robots');

And this is the method:

public function robots(){
    return response()->view('txt.robots')
        ->header('Content-Type', 'text/plain')
        ->header('Content-Disposition', 'inline; filename="robots.txt"');
}

I tried Content-Disposition → attachment; filename="robots.txt", but Google keeps saying there is no robots.txt file.

I have also tried removing Content-Disposition, but it still doesn't work from Google Webmaster Tools (it does work in the browser).

Here's my nginx config, maybe the problem is somewhere in here:

```

server {
    listen 80 default_server;
    listen [::]:80 default_server;
    server_name mydomain.com;
    root /home/forge/mydomain.com/public;

    # FORGE SSL (DO NOT REMOVE!)
    # ssl_certificate;
    # ssl_certificate_key;

    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;

    index index.html index.htm index.php;

    charset utf-8;

    location / {
        try_files $uri $uri/ /index.php?$query_string;
    }

    location = /favicon.ico { access_log off; log_not_found off; }
    #location = /robots.txt  { access_log off; log_not_found off; }

    #location = /robots.txt {
    #    try_files $uri $uri/ /index.php?$args;
    #    access_log off;
    #    log_not_found off;
    #}

    access_log off;
    error_log  /var/log/nginx/mydomain.com-error.log error;

    error_page 404 /index.php;

    location ~ \.php$ {
        fastcgi_split_path_info ^(.+\.php)(/.+)$;
        fastcgi_pass unix:/var/run/php5-fpm.sock;
        fastcgi_index index.php;
        include fastcgi_params;
    }

    location ~ /\.ht {
        deny all;
    }

    # Expire rules for static content

    # cache.appcache, your document html and data
    location ~* \.(?:manifest|appcache|html?|xml|json)$ {
        expires -1;
        # access_log logs/static.log; # I don't usually include a static log
    }

    # Feed
    location ~* \.(?:rss|atom)$ {
        expires 1h;
        add_header Cache-Control "public";
    }

    # Media: images, icons, video, audio, HTC
    location ~* \.(?:jpg|jpeg|gif|png|ico|cur|gz|svg|svgz|mp4|ogg|ogv|webm|htc)$ {
        expires 1M;
        access_log off;
        add_header Cache-Control "public";
    }

    # CSS, Javascript and Fonts
    location ~* \.(?:css|js|woff|ttf|eot)$ {
        expires 1y;
        access_log off;
        add_header Cache-Control "public";
    }
}
```

Thanks.

When I check http://www.google.com/robots.txt, the HTTP response headers are:

Cache-Control:private, max-age=0
Content-Encoding:gzip
Content-Length:1574
Content-Type:text/plain
Date:Wed, 23 Mar 2016 12:07:44 GMT
Expires:Wed, 23 Mar 2016 12:07:44 GMT
Last-Modified:Fri, 04 Mar 2016 19:02:51 GMT
Server:sffe
Vary:Accept-Encoding
X-Content-Type-Options:nosniff
X-XSS-Protection:1; mode=block

Why not skip the Content-Disposition header and simply output the text with a Content-Type: text/plain header?

Also...

  • Are you sure your robots.txt URL is reachable from the outside? Maybe use a proxy to double-check.
  • Is your output UTF-8 encoded? (See the sketch after the link below.)

For more details, see https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
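If the charset is the concern, a minimal sketch of the same controller method with Content-Disposition dropped and the charset made explicit (assuming your existing txt.robots view) could look like this:

```
// Sketch only: same method, no Content-Disposition,
// charset stated explicitly in the Content-Type header.
public function robots()
{
    return response()
        ->view('txt.robots')
        ->header('Content-Type', 'text/plain; charset=UTF-8');
}
```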

The Content-Disposition header is used to force a file download in the browser. It may be confusing the Google bot - try serving the file without it:

public function robots(){
    return response()->view('txt.robots')->header('Content-Type', 'text/plain');
}

I added a Content-Length header and that fixed it. The code ends up looking like this:

    $response = response()->view('txt.robots')->header('Content-Type', 'text/plain');
    $response->header('Content-Length',strlen($response->getOriginalContent()));

    return $response;

Hope it helps someone. Thanks for your replies.
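For reference, the whole working method can be written as the sketch below; it assumes the same txt.robots view and takes the byte length from the already-rendered response body via getContent() instead of re-rendering the view:

```
// Sketch of the combined fix: plain-text Content-Type plus an explicit Content-Length.
public function robots()
{
    $response = response()->view('txt.robots')
        ->header('Content-Type', 'text/plain');

    // getContent() holds the rendered view body, so strlen() measures the exact
    // bytes that will be sent; Content-Length counts bytes, not characters.
    $response->header('Content-Length', strlen($response->getContent()));

    return $response;
}
```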