Confusing wildcards in robots.txt: *+*, *%2B*, *%2b*
What do these 3 lines in robots.txt mean (specifically, I mean the *+*, *%2B*, and *%2b* parts)?
Disallow: /collections/*+*
Disallow: /collections/*%2B*
Disallow: /collections/*%2b*
The original "standard" defines Disallow only as follows:
The value of this field specifies a partial URL that is not to be visited. This can be a full path, or a partial path; any URL that starts with this value will not be retrieved. For example, Disallow: /help disallows both /help.html and /help/index.html, whereas Disallow: /help/ would disallow /help/index.html but allow /help.html.
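This means that every path is matched literally, as a plain prefix (no character has the special meaning it would have in pattern matching).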
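The prefix rule quoted above can be sketched in a few lines of Python. This is only an illustration of the quoted wording, not any crawler's actual code; the function name `is_disallowed_prefix` is made up for this example.

```python
def is_disallowed_prefix(path: str, rule: str) -> bool:
    """Original-standard matching: a rule blocks any path that starts with it.

    Hypothetical helper for illustration. An empty rule blocks nothing.
    """
    return bool(rule) and path.startswith(rule)


# The examples from the quoted text:
print(is_disallowed_prefix("/help.html", "/help"))         # blocked
print(is_disallowed_prefix("/help/index.html", "/help"))   # blocked
print(is_disallowed_prefix("/help/index.html", "/help/"))  # blocked
print(is_disallowed_prefix("/help.html", "/help/"))        # allowed
```

Under this reading a * in a rule would be just another literal character, which is why the wildcard rules in the question only make sense for crawlers that go beyond the original text.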
But the same document also states:
It is not an official standard backed by a standards body, or owned by any commercial organisation. It is not enforced by anybody...
The more modern Google documentation explains:
Google, Bing, Yahoo, and Ask support a limited form of "wildcards" for path values. These are:
* designates 0 or more instances of any valid character.
$ designates the end of the URL.
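So
Disallow: /collections/*+*
Disallow: /collections/*%2B*
Disallow: /collections/*%2b*
will disallow every path that starts with /collections/ and is followed by anything containing +, %2B, or %2b. Only the * acts as a wildcard here; +, %2B, and %2b are matched literally, because these characters have no special meaning in path patterns. %2B and %2b are the percent-encoded forms of +, so the three rules together cover the same character in each of its spellings.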
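To make the wildcard behaviour concrete, here is a minimal Python sketch that translates a rule into a regular expression using only the two special characters described in the Google documentation (* and a trailing $). The helper names `rule_to_regex` and `is_disallowed` are invented for this example, and the sketch compares raw strings; real crawlers may normalise percent-encoding or apply Allow/Disallow precedence rules that are not modelled here.

```python
import re


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate a robots.txt path rule into a compiled regex (illustrative only)."""
    # A trailing "$" anchors the rule to the end of the URL.
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape every literal piece, then join the pieces with ".*" for each "*".
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path: str, rule: str) -> bool:
    # re.match anchors at the start of the path, so rules remain prefix matches.
    return rule_to_regex(rule).match(path) is not None


rules = ["/collections/*+*", "/collections/*%2B*", "/collections/*%2b*"]
for path in ["/collections/shoes+red", "/collections/shoes%2Bred",
             "/collections/shoes%2bred", "/collections/shoes",
             "/products/shoes+red"]:
    blocked = any(is_disallowed(path, rule) for rule in rules)
    print(path, "blocked" if blocked else "allowed")
```

In this sketch the first three paths are blocked (one rule each), while /collections/shoes and /products/shoes+red are allowed, since the former contains none of the three sequences and the latter does not start with /collections/.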