Googlebot not recognizing dynamic robots.txt

I've created a dynamic route in Laravel that serves a txt response.

It works in the browser, but Googlebot says there is no robots.txt file.

These are the response headers I get:

Cache-Control → no-cache
Connection → keep-alive
Content-Disposition → inline; filename="robots.txt"
Content-Encoding → gzip
Content-Type → text/plain; charset=UTF-8
Date → Wed, 23 Mar 2016 11:36:44 GMT
Server → nginx/1.9.12
Transfer-Encoding → chunked
Vary → Accept-Encoding

This is my Laravel route:

Route::get('robots.txt', 'TxtController@robots');

And this is the method:

public function robots(){
    return response()->view('txt.robots')
        ->header('Content-Type', 'text/plain')
        ->header('Content-Disposition', 'inline; filename="robots.txt"');
}

I tried Content-Disposition → attachment; filename="robots.txt", but Google keeps saying there is no robots.txt file.

I have also tried removing Content-Disposition, but it still doesn't work from Google Webmaster Tools (it does work in the browser).

Here's my nginx config, maybe the problem is somewhere in here:

```

server {
    listen 80 default_server;
    listen [::]:80 default_server;
    server_name mydomain.com;
    root /home/forge/mydomain.com/public;

    # FORGE SSL (DO NOT REMOVE!)
    # ssl_certificate;
    # ssl_certificate_key;

    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;

    index index.html index.htm index.php;

    charset utf-8;

    location / {
        try_files $uri $uri/ /index.php?$query_string;
    }

    location = /favicon.ico { access_log off; log_not_found off; }
    #location = /robots.txt  { access_log off; log_not_found off; }

    #location = /robots.txt {
    #    try_files $uri $uri/ /index.php?$args;
    #    access_log off;
    #    log_not_found off;
    #}

    access_log off;
    error_log  /var/log/nginx/mydomain.com-error.log error;

    error_page 404 /index.php;

    location ~ \.php$ {
        fastcgi_split_path_info ^(.+\.php)(/.+)$;
        fastcgi_pass unix:/var/run/php5-fpm.sock;
        fastcgi_index index.php;
        include fastcgi_params;
    }

    location ~ /\.ht {
        deny all;
    }

    # Expire rules for static content

    # cache.appcache, your document html and data
    location ~* \.(?:manifest|appcache|html?|xml|json)$ {
        expires -1;
        # access_log logs/static.log; # I don't usually include a static log
    }

    # Feed
    location ~* \.(?:rss|atom)$ {
        expires 1h;
        add_header Cache-Control "public";
    }

    # Media: images, icons, video, audio, HTC
    location ~* \.(?:jpg|jpeg|gif|png|ico|cur|gz|svg|svgz|mp4|ogg|ogv|webm|htc)$ {
        expires 1M;
        access_log off;
        add_header Cache-Control "public";
    }

    # CSS, Javascript and Fonts
    location ~* \.(?:css|js|woff|ttf|eot)$ {
        expires 1y;
        access_log off;
        add_header Cache-Control "public";
    }
}
```

Thanks.

When I check http://www.google.com/robots.txt, the HTTP response headers are:

Cache-Control:private, max-age=0
Content-Encoding:gzip
Content-Length:1574
Content-Type:text/plain
Date:Wed, 23 Mar 2016 12:07:44 GMT
Expires:Wed, 23 Mar 2016 12:07:44 GMT
Last-Modified:Fri, 04 Mar 2016 19:02:51 GMT
Server:sffe
Vary:Accept-Encoding
X-Content-Type-Options:nosniff
X-XSS-Protection:1; mode=block

Why not skip the Content-Disposition header and simply output the text with a Content-Type: text/plain header?

Also...

  • Are you sure your robots.txt URL is reachable from the outside? Maybe use a proxy to double-check.
  • Is your output UTF-8 encoded? (See the sketch after the link below.)

For more details, see https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
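If the charset is the concern, a minimal sketch of the same controller method with Content-Disposition dropped and the charset made explicit (assuming your existing txt.robots view) could look like this:

```
// Sketch only: same method, no Content-Disposition,
// charset stated explicitly in the Content-Type header.
public function robots()
{
    return response()
        ->view('txt.robots')
        ->header('Content-Type', 'text/plain; charset=UTF-8');
}
```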

The Content-Disposition header is used to force a file download in the browser. It may be confusing the Google bot - try serving the file without it:

public function robots(){
    return response()->view('txt.robots')->header('Content-Type', 'text/plain');
}

I added a Content-Length header and that fixed it. The code ends up looking like this:

    $response = response()->view('txt.robots')->header('Content-Type', 'text/plain');
    $response->header('Content-Length',strlen($response->getOriginalContent()));

    return $response;

Hope it helps someone. Thanks for your replies.
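For reference, the whole working method can be written as the sketch below; it assumes the same txt.robots view and takes the byte length from the already-rendered response body via getContent() instead of re-rendering the view:

```
// Sketch of the combined fix: plain-text Content-Type plus an explicit Content-Length.
public function robots()
{
    $response = response()->view('txt.robots')
        ->header('Content-Type', 'text/plain');

    // getContent() holds the rendered view body, so strlen() measures the exact
    // bytes that will be sent; Content-Length counts bytes, not characters.
    $response->header('Content-Length', strlen($response->getContent()));

    return $response;
}
```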