Googlebot not recognizing dynamic robots.txt
I have created a dynamic route in Laravel that serves a plain-text response. It works in the browser, but Googlebot says there is no robots.txt file.
These are the response headers I get:
```
Cache-Control: no-cache
Connection: keep-alive
Content-Disposition: inline; filename="robots.txt"
Content-Encoding: gzip
Content-Type: text/plain; charset=UTF-8
Date: Wed, 23 Mar 2016 11:36:44 GMT
Server: nginx/1.9.12
Transfer-Encoding: chunked
Vary: Accept-Encoding
```
This is my Laravel route:
```
Route::get('robots.txt', 'TxtController@robots');
```
And this is the method:
```
public function robots(){
    return response()->view('txt.robots')
        ->header('Content-Type', 'text/plain')
        ->header('Content-Disposition', 'inline; filename="robots.txt"');
}
```
I have tried Content-Disposition: attachment; filename="robots.txt", but Google keeps saying there is no robots.txt file.
I have also tried removing the Content-Disposition header entirely, but Google Webmaster Tools still cannot fetch the file (it keeps working in the browser).
This is my nginx configuration, in case the problem is there:
```
server {
    listen 80 default_server;
    listen [::]:80 default_server;
    server_name mydomain.com;
    root /home/forge/mydomain.com/public;

    # FORGE SSL (DO NOT REMOVE!)
    # ssl_certificate;
    # ssl_certificate_key;

    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;

    index index.html index.htm index.php;

    charset utf-8;

    location / {
        try_files $uri $uri/ /index.php?$query_string;
    }

    location = /favicon.ico { access_log off; log_not_found off; }
    #location = /robots.txt { access_log off; log_not_found off; }

    #location = /robots.txt {
    #    try_files $uri $uri/ /index.php?$args;
    #    access_log off;
    #    log_not_found off;
    #}

    access_log off;
    error_log /var/log/nginx/mydomain.com-error.log error;

    error_page 404 /index.php;

    location ~ \.php$ {
        fastcgi_split_path_info ^(.+\.php)(/.+)$;
        fastcgi_pass unix:/var/run/php5-fpm.sock;
        fastcgi_index index.php;
        include fastcgi_params;
    }

    location ~ /\.ht {
        deny all;
    }

    # Expire rules for static content

    # cache.appcache, your document html and data
    location ~* \.(?:manifest|appcache|html?|xml|json)$ {
        expires -1;
        # access_log logs/static.log; # I don't usually include a static log
    }

    # Feed
    location ~* \.(?:rss|atom)$ {
        expires 1h;
        add_header Cache-Control "public";
    }

    # Media: images, icons, video, audio, HTC
    location ~* \.(?:jpg|jpeg|gif|png|ico|cur|gz|svg|svgz|mp4|ogg|ogv|webm|htc)$ {
        expires 1M;
        access_log off;
        add_header Cache-Control "public";
    }

    # CSS, Javascript and Fonts
    location ~* \.(?:css|js|woff|ttf|eot)$ {
        expires 1y;
        access_log off;
        add_header Cache-Control "public";
    }
}
```
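For comparison, the commented-out robots.txt blocks above point in the right direction. A sketch of an explicit location that hands /robots.txt to Laravel's front controller, mirroring the try_files pattern already used in `location /`, might look like this (with `location /` already falling back to index.php, such a block should not strictly be necessary, which suggests nginx is probably not the culprit):

```
# Route /robots.txt through the Laravel front controller so the
# dynamic route can answer it; a static file on disk still wins.
location = /robots.txt {
    try_files $uri /index.php?$query_string;
    access_log off;
    log_not_found off;
}
```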
Thanks.
When I check http://www.google.com/robots.txt, the HTTP response headers are:
```
Cache-Control: private, max-age=0
Content-Encoding: gzip
Content-Length: 1574
Content-Type: text/plain
Date: Wed, 23 Mar 2016 12:07:44 GMT
Expires: Wed, 23 Mar 2016 12:07:44 GMT
Last-Modified: Fri, 04 Mar 2016 19:02:51 GMT
Server: sffe
Vary: Accept-Encoding
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
```
Why not skip the Content-Disposition header and just output the text with a Content-Type: text/plain header?
Also...
- Are you sure your robots.txt URL is reachable from the outside? Maybe double-check through a proxy.
- Is your output UTF-8 encoded?
For more details, see https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
The Content-Disposition header is used to force a file download in the browser. It may be confusing Googlebot; try serving the file without it:
```
public function robots(){
    return response()->view('txt.robots')->header('Content-Type', 'text/plain');
}
```
I solved it by adding a Content-Length header. The code ends up like this:
```
$response = response()->view('txt.robots')->header('Content-Type', 'text/plain');
$response->header('Content-Length', strlen($response->getOriginalContent()));
return $response;
```
Hope this helps someone. Thanks for the replies.
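A note on why this fix is safe even for UTF-8 content: Content-Length must carry the byte count of the body, and PHP's strlen() counts bytes, not characters. A minimal standalone sketch of the same idea (the $body value here is just an illustrative placeholder, not the original view's content):

```php
<?php
// strlen() returns the byte length of a string, which is exactly
// what the Content-Length header must carry.
$body = "User-agent: *\nDisallow: /admin\n";

header('Content-Type: text/plain');
header('Content-Length: ' . strlen($body)); // byte count, not character count

echo $body;
```

If the robots.txt body contained multi-byte UTF-8 characters, strlen() would still report the correct byte length, whereas a character-counting function like mb_strlen() would not.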