Is a .htaccess password-protected site hidden from search engines?

Question

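We have an instance of a website on a domain that is blocked by an .htaccess password. A few IP addresses, such as the company's network, are allowed through.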

There are no inbound links (although obviously this can't be guaranteed 100%)
The site has no robots.txt
The robots meta tag is set to follow and index

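Under all these conditions, could search engines still index the site? I don't think so, but I want to make sure there isn't a loophole I'm not aware of.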
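To make the setup concrete, here is a minimal Python model of the access decision. It assumes the gate lets a request through if it comes from an allowlisted IP or carries valid Basic credentials (the either-or behaviour of Apache's Satisfy Any or a <RequireAny> block); the question does not show the actual configuration, and the network, username and password below are hypothetical placeholders.

```python
import base64
import ipaddress

# Hypothetical allowlist standing in for "the company's network".
ALLOWED_NETWORKS = [ipaddress.ip_network("198.51.100.0/24")]
# Hypothetical credentials; a real setup keeps these in an .htpasswd file.
VALID_USER, VALID_PASSWORD = "staff", "s3cret"


def is_allowed(client_ip, authorization_header=None):
    """Return True if the request would be let through.

    Assumes either-or semantics: allowlisted IP OR valid Basic credentials.
    """
    addr = ipaddress.ip_address(client_ip)
    if any(addr in net for net in ALLOWED_NETWORKS):
        return True
    if not authorization_header or not authorization_header.startswith("Basic "):
        return False  # no credentials sent, which is what a crawler does
    try:
        decoded = base64.b64decode(authorization_header[len("Basic "):]).decode("utf-8")
    except ValueError:  # covers bad base64 padding and non-UTF-8 bytes
        return False
    user, _, password = decoded.partition(":")
    return user == VALID_USER and password == VALID_PASSWORD


crawler_ip = "203.0.113.7"  # hypothetical address outside the allowlist
print(is_allowed(crawler_ip))       # False
print(is_allowed("198.51.100.23"))  # True
good = "Basic " + base64.b64encode(b"staff:s3cret").decode("ascii")
print(is_allowed(crawler_ip, good))  # True
```

Under this model, a search engine crawler matters only through the last branch: it comes from its own IP range and sends no Authorization header, so it is refused.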

Answer 1

Pages that are password-protected will not be accessible to the search engines.

Search engine robots typically can’t log in to crawl pages, so content behind a login will not make it into the search index.

(source: http://www.yourseoplan.com/is-password-protected-content-indexable-by-search-engines/)

See also this post from a Google employee:

No, our crawlers can't access login protected pages.

(source: Gary Illyes, Google, https://productforums.google.com/forum/#!topic/news/2SdcGEWht1o)
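A crawler is just an HTTP client that has no credentials for your site. The sketch below reproduces that locally with Python's standard library, assuming HTTP Basic authentication like an AuthType Basic .htaccess gate. The handler, credentials and page text are hypothetical stand-ins, not Apache itself.

```python
import base64
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical credentials for the demo server.
EXPECTED = "Basic " + base64.b64encode(b"staff:s3cret").decode("ascii")


class BasicAuthHandler(BaseHTTPRequestHandler):
    """Mimics a Basic-auth gate: no valid credentials, no content."""

    def do_GET(self):
        if self.headers.get("Authorization") == EXPECTED:
            body = b"secret page content"
            self.send_response(200)
        else:
            body = b"Invalid Credentials"
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="Restricted Area"')
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


# Opener without proxy handling, so requests really go to 127.0.0.1.
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch(url, authorization=None):
    """Return (status, body) for a GET, with or without credentials."""
    request = urllib.request.Request(url)
    if authorization:
        request.add_header("Authorization", authorization)
    try:
        with OPENER.open(request, timeout=5) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:  # raised for the 401
        return err.code, err.read()


server = HTTPServer(("127.0.0.1", 0), BasicAuthHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/"

anonymous = fetch(url)            # what a crawler without credentials sees
logged_in = fetch(url, EXPECTED)  # what an authorized user sees
server.shutdown()
server.server_close()
print(anonymous)  # (401, b'Invalid Credentials')
print(logged_in)  # (200, b'secret page content')
```

The anonymous request only ever receives the 401 status and the short error body, so there is no page content for an indexer to store.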

Answer 2

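I'm fairly sure any crawler stops at the point where .htaccess asks for a password, before it reaches any content, since that is the whole point of having an .htaccess password.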

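If, for educational purposes, you want extra certainty, you can test from several browsers in private tabs, and you can send a raw request over a socket to see exactly what comes back. Here is a page that describes how to send a raw HTTP request: https://www3.ntu.edu.sg/home/ehchua/programming/webprogramming/HTTP_Basics.html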

Here is an excerpt from that page describing how you would request the page at http://nowhere123.com/docs/index.html:

GET /docs/index.html HTTP/1.1
Host: www.nowhere123.com
Accept: image/gif, image/jpeg, */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
(blank line)

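You can send a raw request with telnet (for example, run telnet www.nowhere123.com 80 and paste the request), which is available on most Linux distributions and may also be available on Windows.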
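Python's standard socket module can do the same job as telnet. This is an illustration, not the method from the linked page: it builds a request like the excerpt above (CRLF line endings, then the terminating empty line) and reads whatever the server sends back. It adds a Connection: close header that the excerpt does not have, so the server closes the connection after replying and the read loop ends; without it an HTTP/1.1 server may hold the connection open until the timeout. The User-Agent value and the host in the usage comment are placeholders, and this only speaks plain HTTP (port 80), not HTTPS.

```python
import socket


def build_raw_request(host, path="/"):
    """Build a minimal HTTP/1.1 GET request as bytes.

    Each line ends in CRLF, and an empty line (the "(blank line)" in the
    excerpt) marks the end of the headers.
    """
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "User-Agent: raw-request-test",  # placeholder User-Agent
        "Connection: close",             # ask the server to close after replying
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def send_raw_request(host, port, request, timeout=10.0):
    """Send raw bytes over TCP and return everything the server sends back."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        while True:
            data = sock.recv(4096)
            if not data:  # empty read: the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


# Hypothetical usage against your own protected host (not run here):
# raw = send_raw_request("www.example.com", 80,
#                        build_raw_request("www.example.com", "/protected/"))
# print(raw.decode("iso-8859-1"))
```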

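I went ahead and sent this request (with the path and host changed) to one of my own servers that I know sits behind an .htaccess password gate, and got this response: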

HTTP/1.0 401 Unauthorized
Date: Fri, 24 Jun 2016 15:08:26 GMT
WWW-Authenticate: Basic realm="Restricted Area"
Content-Type: text/plain
Content-Length: 19

Invalid Credentials
Connection closed by foreign host.
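Reading that response programmatically makes the point explicit: the status is 401, the WWW-Authenticate header is the server asking for credentials, and the only body is the 19-byte error text (matching Content-Length: 19). The trailing "Connection closed by foreign host." is printed by telnet and is not part of the response, so it is left out of the sample below. This is a simple illustrative parser, not a complete HTTP implementation (no chunked encoding, no repeated headers).

```python
def parse_response(raw):
    """Split a raw HTTP response into (status_code, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line, e.g. "HTTP/1.0 401 Unauthorized"
    status = int(lines[0].split(" ", 2)[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")  # split on the first colon only
        headers[name.strip().lower()] = value.strip()
    return status, headers, body


sample = (
    b"HTTP/1.0 401 Unauthorized\r\n"
    b"Date: Fri, 24 Jun 2016 15:08:26 GMT\r\n"
    b'WWW-Authenticate: Basic realm="Restricted Area"\r\n'
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 19\r\n"
    b"\r\n"
    b"Invalid Credentials"
)
status, headers, body = parse_response(sample)
print(status, headers["www-authenticate"], body)
# 401 Basic realm="Restricted Area" b'Invalid Credentials'
```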

So... hopefully this helps.
