无法在 .net 中下载网页
Can't download web page in .net
我做了一个批处理来解析 gearbest.com 的 html 页以提取项目的数据(示例 link link)。
直到 2-3 周前网站更新后,它才有效。
所以我无法下载要解析的页面,我也不知道为什么。
在更新之前,我确实使用 HtmlAgilityPack 请求了以下代码。
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = null;
doc = web.Load(url); //now this the point where is throw the exception
我试过没有框架,我在请求中添加了一些日期
HttpWebRequest request = (HttpWebRequest) WebRequest.Create("https://it.gearbest.com/tv-box/pp_009940949913.html");
request.Credentials = CredentialCache.DefaultCredentials;
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36";
request.ContentType = "text/html; charset=UTF-8";
request.CookieContainer = new CookieContainer();
request.Headers.Add("accept-language", "it-IT,it;q=0.9,en-US;q=0.8,en;q=0.7");
request.Headers.Add("accept-encoding", "gzip, deflate, br");
request.Headers.Add("upgrade-insecure-requests", "1");
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8";
request.CookieContainer = new CookieContainer();
Response response = request.GetResponse(); //exception
例外情况是:
- IOException: 无法从传输连接读取数据
- SocketException: 无法建立连接。
如果我尝试请求主页 (https://it.gearbest.com),它会成功。
您认为问题是什么?
可能值得一试...
HttpRequest.KeepAlive = false;
HttpRequest.ProtocolVersion = HttpVersion.Version10;
出于某种原因,它不喜欢提供的用户代理。如果省略设置 UserAgent
一切正常
HttpWebRequest request = (HttpWebRequest) WebRequest.Create("https://it.gearbest.com/tv-box/pp_009940949913.html");
request.Credentials = CredentialCache.DefaultCredentials;
//request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36";
request.ContentType = "text/html; charset=UTF-8";
另一个解决方案是将 request.Connection
设置为随机字符串(但不是 keep-alive
或 close
)
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36";
request.Connection = "random value";
它也有效,但我无法解释为什么。
我做了一个批处理来解析 gearbest.com 的 html 页以提取项目的数据(示例 link link)。 直到 2-3 周前网站更新后,它才有效。 所以我无法下载要解析的页面,我也不知道为什么。 在更新之前,我确实使用 HtmlAgilityPack 请求了以下代码。
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = null;
doc = web.Load(url); //now this the point where is throw the exception
我试过没有框架,我在请求中添加了一些日期
HttpWebRequest request = (HttpWebRequest) WebRequest.Create("https://it.gearbest.com/tv-box/pp_009940949913.html");
request.Credentials = CredentialCache.DefaultCredentials;
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36";
request.ContentType = "text/html; charset=UTF-8";
request.CookieContainer = new CookieContainer();
request.Headers.Add("accept-language", "it-IT,it;q=0.9,en-US;q=0.8,en;q=0.7");
request.Headers.Add("accept-encoding", "gzip, deflate, br");
request.Headers.Add("upgrade-insecure-requests", "1");
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8";
request.CookieContainer = new CookieContainer();
Response response = request.GetResponse(); //exception
例外情况是:
- IOException: 无法从传输连接读取数据
- SocketException: 无法建立连接。
如果我尝试请求主页 (https://it.gearbest.com),它会成功。
您认为问题是什么?
可能值得一试...
HttpRequest.KeepAlive = false;
HttpRequest.ProtocolVersion = HttpVersion.Version10;
出于某种原因,它不喜欢提供的用户代理。如果省略设置 UserAgent
一切正常
HttpWebRequest request = (HttpWebRequest) WebRequest.Create("https://it.gearbest.com/tv-box/pp_009940949913.html");
request.Credentials = CredentialCache.DefaultCredentials;
//request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36";
request.ContentType = "text/html; charset=UTF-8";
另一个解决方案是将 request.Connection
设置为随机字符串(但不是 keep-alive
或 close
)
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36";
request.Connection = "random value";
它也有效,但我无法解释为什么。