robots.txt file syntax: can I disallow all, then only allow some sites?

Question

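Can you disallow all sites and then allow only specific ones? I know that one approach is to disallow specific sites and allow everything else. Is it valid to do the reverse? E.g.: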

User-agent: *
Disallow: /
Allow: /siteOne/
Allow: /siteTwo/
Allow: /siteThree/

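Simply disallowing everything and then allowing specific sites seems much safer than allowing everything and then having to think of every place you don't want crawled.

Could the approach above be the reason the site's description reads "A description for this result is not available because of this site's robots.txt – learn more." in the organic results on Google's home page?

UPDATE - I went into Google Webmaster Tools > Crawl > robots.txt Tester. At first, when I entered siteTwo/default.asp, it said Blocked and highlighted the 'Disallow: /' line. After leaving and revisiting the tool, it now says Allowed. Very weird. So if this says Allowed, I wonder why it gave the message above in the description for the site?

UPDATE2 - The robots.txt example above should have said dirOne, dirTwo rather than siteOne, siteTwo. Two great resources for learning about robots.txt are the robots.txt specification linked in unor's accepted answer below and the robots exclusion standard, which is also a must-read. Both pages explain all of this. In summary: yes, you can disallow and then allow, but always place the Disallow last.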

Answer 1

(Note: in robots.txt you don't disallow/allow crawling of "sites", but of URLs. The value of Disallow/Allow is always the beginning of a URL path.)

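The robots.txt specification does not define Allow.
Consumers that follow this specification simply ignore any Allow fields. Some consumers, such as Google, extend the specification and understand Allow.

For consumers that don't know Allow: everything is disallowed.
For consumers that know Allow: yes, your robots.txt should work for them. Everything is disallowed except the URLs matched by the Allow fields.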
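
The difference between the two kinds of consumers can be sketched in Python. This is only an illustration, not any crawler's actual code: `RULES` and `is_allowed` are hypothetical names, and the Allow-aware branch assumes the precedence Google documents (the longest matching rule wins, and Allow wins a tie).

```python
# Rules from the question's robots.txt, as (field, path-prefix) pairs.
RULES = [
    ("Disallow", "/"),
    ("Allow", "/siteOne/"),
    ("Allow", "/siteTwo/"),
    ("Allow", "/siteThree/"),
]


def is_allowed(path, rules, understands_allow):
    """Return True if a consumer may crawl `path` (hypothetical helper)."""
    if not understands_allow:
        # An Allow-unaware consumer ignores Allow lines and only sees Disallow.
        rules = [(f, p) for f, p in rules if f == "Disallow"]
    # Keep rules whose value is a case-sensitive prefix of the URL path.
    # An empty value matches nothing.
    matches = [(f, p) for f, p in rules if p and path.startswith(p)]
    if not matches:
        return True  # no rule applies, so crawling is permitted
    # Assumed precedence: longest prefix wins; on a tie, Allow beats Disallow.
    field, _ = max(matches, key=lambda m: (len(m[1]), m[0] == "Allow"))
    return field == "Allow"


print(is_allowed("/siteOne/foo.html", RULES, understands_allow=True))   # True
print(is_allowed("/siteOne/foo.html", RULES, understands_allow=False))  # False
print(is_allowed("/other/", RULES, understands_allow=True))             # False
```

With this model, the order of the lines does not matter to an Allow-aware consumer, while an Allow-unaware consumer sees only `Disallow: /` and skips the whole site.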

Assuming your robots.txt is hosted at http://example.org/robots.txt, Google would be allowed to crawl the following URLs:

http://example.org/siteOne/
http://example.org/siteOne/foo
http://example.org/siteOne/foo/
http://example.org/siteOne/foo.html

Google would not be allowed to crawl the following URLs:

http://example.org/siteone/ (case-sensitive)
http://example.org/siteOne (missing the trailing slash)
http://example.org/foo/siteOne/ (does not match the beginning of the path)
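
Both lists follow from plain, case-sensitive prefix matching on the URL path. A small Python sketch to check this, assuming an Allow-aware consumer and the question's robots.txt (the names `ALLOW_PREFIXES` and `crawlable` are made up for this illustration, and `Disallow: /` is modelled simply as "everything else is blocked"):

```python
from urllib.parse import urlsplit

# Allow values from the question's robots.txt.
ALLOW_PREFIXES = ["/siteOne/", "/siteTwo/", "/siteThree/"]


def crawlable(url):
    """True if the URL's path starts with one of the Allow values."""
    path = urlsplit(url).path or "/"  # an empty path is treated as "/"
    return any(path.startswith(prefix) for prefix in ALLOW_PREFIXES)


urls = [
    "http://example.org/siteOne/",
    "http://example.org/siteOne/foo.html",
    "http://example.org/siteone/",      # different case
    "http://example.org/siteOne",       # no trailing slash
    "http://example.org/foo/siteOne/",  # prefix is not at the start of the path
]
for url in urls:
    print(url, "->", "allowed" if crawlable(url) else "blocked")
```

The first two URLs come out as allowed and the last three as blocked, matching the lists above.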

seo

robots.txt