What does /*.php$ mean in robots.txt?

Question

I came across a site that uses the following in its robots.txt file:

User-agent: *
Disallow: /*.php$

So what does it do? Will it prevent web crawlers from crawling the following URLs?

https://example.com/index.php
https://example.com/index.php?page=Events&action=Upcoming

Will it block subdomains too?

https://subdomain.example.com/index.php

Answer 1

So what does it do?

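According to the specification, it means "URLs starting with /*.php$", which isn't very useful. There might be engines that support some custom syntax for it. I know some support wildcards, but this looks like regular-expression syntax, and I haven't heard of anything supporting that in robots.txt.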

Will it prevent web crawlers from crawling the following URLs?

By the specification: no.
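Under the original robots.txt convention, a Disallow value is compared as a plain path prefix, so * and $ are just ordinary characters. A minimal sketch of that literal comparison (the function name blocked_by_prefix is hypothetical, and real parsers also normalise percent-encoding before comparing):

```python
def blocked_by_prefix(rule: str, path: str) -> bool:
    """Original-spec semantics (simplified): the rule is a literal path prefix."""
    # '*' and '$' get no special treatment here; they must appear verbatim.
    return path.startswith(rule)


for path in ["/index.php", "/index.php?page=Events&action=Upcoming"]:
    print(path, blocked_by_prefix("/*.php$", path))  # False for both
```

Neither URL literally begins with the characters "/*.php$", so neither is blocked under this reading.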

If anything supports regular expressions, it would block the first URL but not the second.
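To see why a regex-capable engine would treat the two URLs differently, here is a sketch that (hypothetically) feeds the Disallow value straight to Python's re module and searches anywhere in the path plus query string. The outcome depends on that search assumption: with re.match, which anchors at the start of the string, even /index.php would not match.

```python
import re

# Hypothetical: treat the Disallow value as a Python regular expression.
# "/*" = zero or more slashes, "." = any character, "php$" = "php" at the end.
pattern = re.compile(r"/*.php$")


def regex_blocks(url: str) -> bool:
    """True if the pattern is found anywhere in the path + query string."""
    return pattern.search(url) is not None


for url in ["/index.php", "/index.php?page=Events&action=Upcoming"]:
    print(url, "blocked" if regex_blocks(url) else "allowed")
# prints:
# /index.php blocked
# /index.php?page=Events&action=Upcoming allowed
```

The second URL ends in "Upcoming" rather than "php", so the end-of-string anchor fails.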

Will it block subdomains too?

No. When it comes to robots.txt, each origin is independent. The subdomain site needs its own copy of the file.
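Because each origin is independent, a crawler derives the robots.txt location from the scheme, host and port of the page it wants to fetch. A small sketch using Python's urllib.parse (the helper name robots_txt_url is hypothetical, and it ignores edge cases such as credentials embedded in the URL):

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the origin (scheme + host + port) of page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and network location; replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


for url in ["https://example.com/index.php",
            "https://subdomain.example.com/index.php"]:
    print(robots_txt_url(url))
# prints:
# https://example.com/robots.txt
# https://subdomain.example.com/robots.txt
```

The two pages map to different robots.txt files, so the Disallow line on example.com says nothing about subdomain.example.com.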

Answer 2

What looks like a regular expression is actually a wildcard pattern, and wildcards are not in the spec. But Google and Bing both honour wildcards (*) and end-of-URL markers ($). You can try your robots.txt rules here.
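The wildcard behaviour described above can be sketched by translating a rule into an anchored regular expression: * becomes "any run of characters", a trailing $ pins the end of the URL, and matching starts at the beginning of the path. This is a simplified illustration, not Google's or Bing's actual parser; the names rule_to_regex and is_disallowed are hypothetical, and Allow/Disallow precedence is not modelled.

```python
import re


def rule_to_regex(rule: str) -> "re.Pattern[str]":
    """Compile a robots.txt path rule using Google/Bing-style wildcards."""
    anchored = rule.endswith("$")  # trailing '$' = end of URL
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces, then rejoin them with '.*' where '*' appeared.
    body = ".*".join(re.escape(piece) for piece in rule.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_disallowed(rule: str, path_and_query: str) -> bool:
    """True if the rule matches from the start of the path (plus query)."""
    return rule_to_regex(rule).match(path_and_query) is not None


for url in ["/index.php", "/blog/post.php",
            "/index.php?page=Events&action=Upcoming"]:
    print(url, is_disallowed("/*.php$", url))
# prints:
# /index.php True
# /blog/post.php True
# /index.php?page=Events&action=Upcoming False
```

Under this reading, /*.php$ blocks the first URL in the question but not the one with a query string; dropping the $ (Disallow: /*.php) would block both.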
