Why can't RCurl::url.exists test servers with a permanent redirect?

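I use the RCurl::url.exists function to test remote hosts before opening a connection. It works for most of my hosts, but I have a problem with https://eidoo.io/, which sends the function into an infinite loop.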

The function uses RCurl::curlPerform to determine whether a request to a given URL gets a proper response, asking the server not to return the body. Only the headers are processed.

library(RCurl)
url <- "https://eidoo.io/"
url.exists(url) # This will crash your RStudio
curlPerform(url = url, followlocation = TRUE, nobody = TRUE) # This will crash your RStudio as well

How can I test this website before opening a connection to it?

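Here is an excerpt of the verbose output from running curlPerform() on https://eidoo.io/. Unfortunately the beginning of the log is missing, but there appears to be a permanent redirect, indicated by HTTP/1.1 301 Moved Permanently. Note that the request for /404 is itself redirected to https://eidoo.io/404, so the same request is issued again and again.
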
< HTTP/1.1 301 Moved Permanently
< Date: Mon, 06 Nov 2017 10:53:36 GMT
< Content-Type: text/html
< Connection: keep-alive
< Set-Cookie: __cfduid=d41359f0407b83fef208fca7ea017c5d61509965616; expires=Tue, 06-Nov-18 10:53:36 GMT; path=/; domain=.eidoo.io; HttpOnly; Secure
< Location: https://eidoo.io/404
< X-Frame-Options: SAMEORIGIN
< Allow: GET, POST
< Strict-Transport-Security: max-age=0
< Server: cloudflare-nginx
< CF-RAY: 3b97828fcc8f090e-CDG
< 
* Connection #7 to host eidoo.io left intact
* Issue another request to this URL: 'https://eidoo.io/404'
* Found bundle for host eidoo.io: 0x7dde110 [can pipeline]
* Re-using existing connection! (#7) with host eidoo.io
* Connected to eidoo.io (104.25.57.118) port 443 (#7)
> HEAD /404 HTTP/1.1
Host: eidoo.io
Accept: */*

< HTTP/1.1 301 Moved Permanently
< Date: Mon, 06 Nov 2017 10:53:36 GMT
< Content-Type: text/html
< Connection: keep-alive
< Set-Cookie: __cfduid=d41359f0407b83fef208fca7ea017c5d61509965616; expires=Tue, 06-Nov-18 10:53:36 GMT; path=/; domain=.eidoo.io; HttpOnly; Secure
< Location: https://eidoo.io/404
< X-Frame-Options: SAMEORIGIN
< Allow: GET, POST
< Strict-Transport-Security: max-age=0
< Server: cloudflare-nginx
< CF-RAY: 3b97828ffca5090e-CDG
< 
* Connection #7 to host eidoo.io left intact
* Issue another request to this URL: 'https://eidoo.io/404'
* Found bundle for host eidoo.io: 0x7dde110 [can pipeline]
* Re-using existing connection! (#7) with host eidoo.io
* Connected to eidoo.io (104.25.57.118) port 443 (#7)
> HEAD /404 HTTP/1.1
Host: eidoo.io
Accept: */*

< HTTP/1.1 301 Moved Permanently
< Date: Mon, 06 Nov 2017 10:53:36 GMT
< Content-Type: text/html
< Connection: keep-alive
< Set-Cookie: __cfduid=d41359f0407b83fef208fca7ea017c5d61509965616; expires=Tue, 06-Nov-18 10:53:36 GMT; path=/; domain=.eidoo.io; HttpOnly; Secure
< Location: https://eidoo.io/404
< X-Frame-Options: SAMEORIGIN
< Allow: GET, POST
< Strict-Transport-Security: max-age=0
< Server: cloudflare-nginx
< CF-RAY: 3b9782902cb7090e-CDG
< 
* Connection #7 to host eidoo.io left intact
* Issue another request to this URL: 'https://eidoo.io/404'
* Found bundle for host eidoo.io: 0x7dde110 [can pipeline]
* Re-using existing connection! (#7) with host eidoo.io
* Connected to eidoo.io (104.25.57.118) port 443 (#7)
> HEAD /404 HTTP/1.1
Host: eidoo.io
Accept: */*

By contrast, when I run the same command against www.google.com, everything works fine:

> curlPerform(url = "www.google.com", followlocation = TRUE, nobody = TRUE, verbose = TRUE)
* Rebuilt URL to: www.google.com/
*   Trying 216.58.212.164...
* Connected to www.google.com (216.58.212.164) port 80 (#0)
> HEAD / HTTP/1.1
Host: www.google.com
Accept: */*

< HTTP/1.1 302 Found
< Cache-Control: private
< Content-Type: text/html; charset=UTF-8
< Referrer-Policy: no-referrer
< Location: http://www.google.fr/?gfe_rd=cr&dcr=0&ei=eEIAWrjmJuzG8AfrjYLgBA
< Content-Length: 268
< Date: Mon, 06 Nov 2017 11:07:36 GMT
< 
* Connection #0 to host www.google.com left intact
* Issue another request to this URL: 'http://www.google.fr/?gfe_rd=cr&dcr=0&ei=eEIAWrjmJuzG8AfrjYLgBA'
*   Trying 216.58.212.163...
* Connected to www.google.fr (216.58.212.163) port 80 (#1)
> HEAD /?gfe_rd=cr&dcr=0&ei=eEIAWrjmJuzG8AfrjYLgBA HTTP/1.1
Host: www.google.fr
Accept: */*

< HTTP/1.1 200 OK
< Date: Mon, 06 Nov 2017 11:07:36 GMT
< Expires: -1
< Cache-Control: private, max-age=0
< Content-Type: text/html; charset=ISO-8859-1
< P3P: CP="This is not a P3P policy! See g.co/p3phelp for more info."
< Server: gws
< X-XSS-Protection: 1; mode=block
< X-Frame-Options: SAMEORIGIN
< Set-Cookie: 1P_JAR=2017-11-06-11; expires=Mon, 13-Nov-2017 11:07:36 GMT; path=/; domain=.google.fr
< Set-Cookie: NID=116=2OlLs4BCZcDE1a3y6m-ZWn2Kvp0_rWGxH5XQTOw_pwZOeNn1QisFEpXkrLvxYdKAp2MX0Ff4G0ELoymvR2xVeYM0EjPeVi9LwIqX0x4LTHkPfKHaPt0itOcDXD18_vaG; expires=Tue, 08-May-2018 11:07:36 GMT; path=/; domain=.google.fr; HttpOnly
< Transfer-Encoding: chunked
< Accept-Ranges: none
< Vary: Accept-Encoding
< 
* Connection #1 to host www.google.fr left intact
OK 
 0
>

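I am answering my own question with a workaround. The problem is related to HTTP redirection, and telling curl not to follow the redirect, whether the Location is a relative or an absolute URL, works fine.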

curlPerform(url = url, followlocation = FALSE, nobody = TRUE)
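
A slightly more defensive variant of the same idea, offered as a minimal sketch rather than a definitive fix: keep following redirects, but cap them with libcurl's maxredirs option so that a loop like the one in the log ends in an error instead of running forever. The helper names url_reachable and status_ok, the limit of 5 redirects, and the 10-second timeout are my own assumptions and not part of RCurl. I also assume that any final status in the 2xx or 3xx range counts as "reachable".

```r
library(RCurl)

# Hypothetical helper: treat 2xx and 3xx status codes as "reachable".
status_ok <- function(code) {
  !is.na(code) && code >= 200 && code < 400
}

# Hypothetical helper: send a HEAD request, follow at most `max_redirects`
# redirects, and return FALSE on any curl error (for example when the
# redirect limit is exceeded or the host cannot be resolved).
url_reachable <- function(url, max_redirects = 5L, timeout = 10L) {
  h <- getCurlHandle()
  performed <- tryCatch({
    curlPerform(url = url, curl = h,
                nobody = TRUE,              # headers only, no body
                followlocation = TRUE,
                maxredirs = max_redirects,  # stop an endless redirect chain
                timeout = timeout)          # overall time limit in seconds
    TRUE
  }, error = function(e) FALSE)
  if (!performed) return(FALSE)
  # Status code of the last response received on this handle.
  status_ok(getCurlInfo(h)$response.code)
}
```

Calling url_reachable("https://eidoo.io/") should then return FALSE once the redirect limit is hit, instead of hanging. If you would rather count a redirecting host as reachable, the followlocation = FALSE call above is the simpler choice, since the first 301 is then the final response.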