Too many redirections were attempted using WebRequest
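When I try to scrape the HTML of a web page, I occasionally get the exception "Too many redirections were attempted".
An example of such a site is http://www.magicshineuk.co.uk/
Normally I would set the timeout to about 6 seconds... but even with 30 seconds, and with the maximum number of redirections raised to something absurd like 200, it still either throws the "Too many redirections" exception or times out.
How can I get around this?
My code is as follows...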
using System;
using System.Net;

class Program
{
    static void Main()
    {
        try
        {
            var hwr = (HttpWebRequest)WebRequest.Create("http://www.magicshineuk.co.uk/");
            hwr.UserAgent = "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:42.0) Gecko/20100101 Firefox/42.0";
            hwr.Headers.Add("Accept-Language", "en-US,en;q=0.5");
            // Let the framework send Accept-Encoding and decompress the body, instead of
            // adding "Accept-Encoding: gzip, deflate" by hand and getting raw compressed bytes back.
            hwr.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
            // This list of media types belongs in Accept; ContentType describes a request body,
            // which a GET request does not have.
            hwr.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
            hwr.KeepAlive = true;
            hwr.Timeout = 30000; // 30 seconds; normally set to 6000
            hwr.Method = "GET";
            hwr.AllowAutoRedirect = true;
            // Cookies set by intermediate redirect responses are stored here and sent on the next hop.
            hwr.CookieContainer = new CookieContainer();
            // Raising this makes no difference: the "Too many redirections" exception still occurs.
            // Normally I would keep a sensible maximum, or leave the default of 50.
            hwr.MaximumAutomaticRedirections = 200;

            using (var response = (HttpWebResponse)hwr.GetResponse())
            {
                Console.WriteLine("{0} {1}", (int)response.StatusCode, response.StatusCode);
                Console.WriteLine(response.ResponseUri);
                Console.WriteLine("Last modified: {0}", response.LastModified);
                Console.WriteLine("Server: {0}", response.Server);
                Console.WriteLine("Supports Headers: {0}", response.SupportsHeaders);
                Console.WriteLine("Headers: ");
                // Do something with the response, e.g. print each header name and value.
                foreach (string name in response.Headers.AllKeys)
                {
                    Console.WriteLine(" {0} = {1}", name, response.Headers[name]);
                }
            }
        }
        catch (Exception ex)
        {
            Console.WriteLine("Exception: ");
            Console.WriteLine(ex.Message);
        }
    }
}
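To see where the redirects actually lead, one option is to switch AllowAutoRedirect off and follow the Location headers by hand, recording every URI visited. This is a diagnostic sketch, not a fix: the names Hop, RedirectTracer, Follow and HttpFetcher are hypothetical, and the loop check only catches a URI that repeats exactly. Sharing one CookieContainer across hops matters because a site that sets a cookie and redirects back to itself can keep redirecting if the cookie never comes back.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

// One hop of an HTTP exchange: the status code and the Location header (null if absent).
public struct Hop
{
    public int Status;
    public string Location;
    public Hop(int status, string location) { Status = status; Location = location; }
}

public static class RedirectTracer
{
    // Follows redirects by hand and returns every URI visited, in order.
    // Stops at the first non-3xx response, when a URI repeats (looped = true),
    // or after maxHops hops. 'fetch' must perform ONE request without auto-redirect.
    public static List<Uri> Follow(Uri start, Func<Uri, Hop> fetch, int maxHops, out bool looped)
    {
        var visited = new List<Uri> { start };
        var seen = new HashSet<string> { start.AbsoluteUri };
        looped = false;
        Uri current = start;
        for (int i = 0; i < maxHops; i++)
        {
            Hop hop = fetch(current);
            if (hop.Status < 300 || hop.Status > 399 || string.IsNullOrEmpty(hop.Location))
                break; // final (non-redirect) response reached
            // Location may be relative, so resolve it against the current URI.
            current = new Uri(current, hop.Location);
            visited.Add(current);
            if (!seen.Add(current.AbsoluteUri))
            {
                looped = true; // the same URI came back: a redirect loop
                break;
            }
        }
        return visited;
    }

    // Hypothetical real fetcher: one HttpWebRequest per hop with auto-redirect turned off.
    // The shared CookieContainer stores cookies from each hop and sends them on the next.
    // WebException still propagates for 4xx/5xx responses and timeouts.
    public static Func<Uri, Hop> HttpFetcher(CookieContainer cookies)
    {
        return uri =>
        {
            var req = (HttpWebRequest)WebRequest.Create(uri);
            req.AllowAutoRedirect = false;
            req.CookieContainer = cookies;
            req.Timeout = 6000;
            using (var resp = (HttpWebResponse)req.GetResponse())
            {
                return new Hop((int)resp.StatusCode, resp.Headers[HttpResponseHeader.Location]);
            }
        };
    }
}
```

For example, calling Follow(new Uri("http://www.magicshineuk.co.uk/"), RedirectTracer.HttpFetcher(new CookieContainer()), 50, out looped) lists the hops; a chain that bounces between the same two URLs points to a redirect loop rather than a slow server.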
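I tried your code. I had to comment out this line: // hwr.Host = Utils.GetSimpleUrl(url);
and then it worked fine. If you poll frequently, the target site or something in between (a proxy, a firewall, etc.) may treat your polling as a denial-of-service attack and block you for a while. Alternatively, if you are behind a corporate firewall, you may be getting something similar from a device on your internal network.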
How often do you run this scraper?
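If frequent polling is what gets you blocked, one mitigation is to enforce a minimum gap between requests to the same host. This is a minimal sketch assuming a single process does all the scraping; PoliteThrottle, Reserve and WaitTurn are hypothetical names, and a suitable interval depends on the site.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Enforces a minimum interval between consecutive requests to the same host.
public class PoliteThrottle
{
    private readonly TimeSpan minInterval;
    private readonly Dictionary<string, DateTime> lastRequest = new Dictionary<string, DateTime>();

    public PoliteThrottle(TimeSpan minInterval) { this.minInterval = minInterval; }

    // Returns how long a caller arriving at 'now' must wait before contacting 'host',
    // and records the request as taking place at now + wait.
    public TimeSpan Reserve(string host, DateTime now)
    {
        lock (lastRequest)
        {
            TimeSpan wait = TimeSpan.Zero;
            DateTime last;
            if (lastRequest.TryGetValue(host, out last))
            {
                DateTime earliest = last + minInterval;
                if (earliest > now) wait = earliest - now;
            }
            lastRequest[host] = now + wait;
            return wait;
        }
    }

    // Blocking wrapper for real use: call this before each request to 'uri'.
    public void WaitTurn(Uri uri)
    {
        TimeSpan wait = Reserve(uri.Host, DateTime.UtcNow);
        if (wait > TimeSpan.Zero) Thread.Sleep(wait);
    }
}
```

Reserve takes the current time as a parameter so the spacing logic can be checked without actually sleeping; WaitTurn is the part a scraper would call.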
Edit to add:
I tried it with .NET 4.5.2, Windows 7 x64, Visual Studio 2015.
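- The target site may also be unreliable (intermittently up and down)
- There may be intermittent network problems between you and the target site
- They may expose an API, which would be a more reliable way to integrate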
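For the first two cases (an unreliable site or a flaky network), retrying with exponential backoff can smooth over transient failures; it will not help with a genuine redirect loop, which fails the same way every time. A minimal sketch: Retry.WithBackoff is a hypothetical helper that retries only on WebException (which HttpWebRequest throws for timeouts and error responses), and the sleep delegate is passed in so the delays can be checked without waiting.

```csharp
using System;
using System.Net;

public static class Retry
{
    // Runs 'action' up to maxAttempts times. After failed attempt n it waits
    // baseDelay * 2^(n-1) before retrying; the final failure is rethrown unchanged.
    public static T WithBackoff<T>(Func<T> action, int maxAttempts, TimeSpan baseDelay, Action<TimeSpan> sleep)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return action();
            }
            catch (WebException) when (attempt < maxAttempts)
            {
                // Exponential backoff: 1x, 2x, 4x ... the base delay.
                TimeSpan delay = TimeSpan.FromMilliseconds(baseDelay.TotalMilliseconds * Math.Pow(2, attempt - 1));
                sleep(delay);
            }
        }
    }
}
```

In a scraper this might be called as Retry.WithBackoff(() => Fetch(url), 3, TimeSpan.FromSeconds(2), Thread.Sleep), where Fetch is your own (hypothetical) method that builds a fresh HttpWebRequest on every attempt, since a request object should not be reused after it has been sent.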