Getting hits on my web server from Google bots

I can't understand why I am seeing these entries in my nginx server logs.

66.249.79.115 - - [06/Oct/2015:18:50:17 +0000] "GET /profile/?Rohatgi.Nikhil HTTP/1.1" 404 1031 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 8_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12F70 Safari/600.1.4 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.79.115 - - [06/Oct/2015:18:50:49 +0000] "GET /profile/?Mukherjee.PankajKumar HTTP/1.1" 404 1038 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 8_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12F70 Safari/600.1.4 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.79.115 - - [06/Oct/2015:18:51:21 +0000] "GET /profile/?Khorana.Ashish HTTP/1.1" 404 1031 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 8_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12F70 Safari/600.1.4 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.79.117 - - [06/Oct/2015:18:51:52 +0000] "GET /profile/?Mittal.AshokKumar HTTP/1.1" 404 1034 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 8_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12F70 Safari/600.1.4 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.79.117 - - [06/Oct/2015:18:52:24 +0000] "GET /profile/?Suri.Divya HTTP/1.1" 404 1029 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 8_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12F70 Safari/600.1.4 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.79.117 - - [06/Oct/2015:18:52:56 +0000] "GET /profile/?gupta.member) HTTP/1.1" 404 1030 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 8_3 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12F70 Safari/600.1.4 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
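To see at a glance which URLs the crawler is requesting and what the server answers, the log can be parsed. The sketch below assumes nginx's default `combined` log format (the lines above match it) and groups requests whose user agent mentions "Googlebot" by status code and path. `parse_line` and `summarize_googlebot` are hypothetical helper names, and the string check only looks at the User-Agent header, which any client can set.

```python
import re
from collections import Counter

# nginx "combined" format (assumed, since the lines above match it):
# $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED_RE = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the fields of one combined-format line as a dict, or None if it does not match."""
    m = COMBINED_RE.match(line)
    return m.groupdict() if m else None


def summarize_googlebot(lines):
    """Count (status, path without query string) pairs for requests claiming to be Googlebot."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        # Skip unparseable lines and anything whose User-Agent does not mention Googlebot
        if rec is None or "Googlebot" not in rec["agent"]:
            continue
        # Drop the query string so /profile/?Suri.Divya and /profile/?Khorana.Ashish share one bucket
        path = rec["path"].split("?", 1)[0]
        counts[(rec["status"], path)] += 1
    return counts
```

Fed the six lines above, this yields a single bucket, `("404", "/profile/")` with a count of 6: every request is for a profile URL that the server answers with 404.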

I only recently took my site live.

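From the logs I can tell that these requests come from Google's bot, but I would like to understand why I am getting them and how to block them. Will my site be harmed if I block them?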

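If your site went live recently, this behavior is normal. Googlebot is crawling the web and indexing new pages so that they can be found through Google Search. As the log file shows, Googlebot presents itself as an iPhone (iPhone; CPU iPhone OS 8_3 like Mac OS X), i.e. it is crawling your pages the way a smartphone would see them; the trailing "(compatible; Googlebot/2.1; +http://www.google.com/bot.html)" identifies it as Google's crawler.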
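The User-Agent header is supplied by the client, so a request that merely says "Googlebot" is not proof that it came from Google. Google's documented way to confirm a crawler is a reverse DNS lookup on the IP (the name should end in googlebot.com or google.com) followed by a forward lookup of that name, which must resolve back to the same IP. A minimal standard-library sketch of that check follows; `is_real_googlebot` is a hypothetical helper, and the domain list is an assumption limited to those two domains, so check Google's current documentation before relying on it.

```python
import socket

# Assumed list of crawler domains; extend it if Google documents others.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_real_googlebot(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Verify a claimed-Googlebot IP with a reverse lookup plus a confirming forward lookup.

    `reverse` and `forward` default to the standard-library resolvers; they are
    parameters only so the logic can be exercised without network access.
    """
    try:
        host, _aliases, _addrs = reverse(ip)
    except (socket.herror, socket.gaierror):
        return False  # no PTR record, so the claim cannot be confirmed
    host = host.rstrip(".").lower()
    if not host.endswith(GOOGLE_SUFFIXES):
        return False  # PTR name is outside the expected crawler domains
    try:
        _name, _aliases, addrs = forward(host)
    except (socket.herror, socket.gaierror):
        return False
    # The forward lookup must lead back to the original IP; a PTR record alone can be faked
    return ip in addrs
```

Since each call performs two DNS lookups, it is cheaper to run it once per distinct IP in the log (here 66.249.79.115 and 66.249.79.117) and cache the result.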

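If you want to block Googlebot, you can follow this guide: Nginx + CDN + GoogleBot or how to avoid many useless Googlebot hits. Be aware that if you block Google's crawler, your pages will not be findable through Google Search. If you want to block a wider range of spider/crawler bots, see the post by user GD-hussle.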
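The linked guide works at the nginx level. A simpler, widely used alternative (not necessarily what that guide recommends) is a robots.txt file at the site root with a rule addressed to Googlebot. The rules below are an illustrative example, and Python's standard `urllib.robotparser` is used only to show how a compliant crawler would interpret them. Note that robots.txt is advisory: Googlebot honours it, but clients that ignore it are not stopped.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out Googlebot entirely, leave every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow:
"""


def build_parser(text):
    """Parse robots.txt content held in memory (no network fetch)."""
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    return rp


rp = build_parser(ROBOTS_TXT)
for agent in ("Googlebot", "Bingbot"):
    # Pass the crawler's product token, not the full User-Agent string:
    # robotparser matches on the part before the first "/".
    print(agent, rp.can_fetch(agent, "/profile/?Suri.Divya"))
# prints:
# Googlebot False
# Bingbot True
```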

For general information about Google's crawler, see Crawling, indexing & ranking.