Grabbing URL contents using HtmlAgilityPack generates an error
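I am using HtmlAgilityPack to grab text from a URL. This works fine for most websites, but for some websites it started returning an error today.
The error occurs at the line doc = webGet.Load(url);
Error message: The underlying connection was closed: An unexpected error occurred on a send.
I am not sure why I am getting this error, since it used to work with this website's URL.
Sample URL: link
I tried other HTTPS URLs, such as bbc.com, and they work fine. If there is something wrong with my code, any pointers would be appreciated.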
HtmlDocument doc = new HtmlDocument();
var url = txtGrabNewsURL.Text.Trim();
var webGet = new HtmlWeb();
doc = webGet.Load(url);   // the WebException is thrown here
var baseUrl = new Uri(url);
// doc.LoadHtml(response);

// Page title
String title = (from x in doc.DocumentNode.Descendants()
                where x.Name.ToLower() == "title"
                select x.InnerText).FirstOrDefault();

// <meta name="description" content="...">
String desc = (from x in doc.DocumentNode.Descendants()
               where x.Name.ToLower() == "meta"
               && x.Attributes["name"] != null
               && x.Attributes["name"].Value.ToLower() == "description"
               select x.Attributes["content"].Value).FirstOrDefault();

// <meta property="og:image" content="...">
String ogImage = (from x in doc.DocumentNode.Descendants()
                  where x.Name.ToLower() == "meta"
                  && x.Attributes["property"] != null
                  && x.Attributes["property"].Value.ToLower() == "og:image"
                  select x.Attributes["content"].Value).FirstOrDefault();

// src of every <img> element
List<String> imgs = (from x in doc.DocumentNode.Descendants()
                     where x.Name.ToLower() == "img"
                     && x.Attributes["src"] != null
                     select x.Attributes["src"].Value).ToList<String>();

// Same list, with the src values lower-cased
List<String> imgList = (from x in doc.DocumentNode.Descendants("img")
                        where x.Attributes["src"] != null
                        select x.Attributes["src"].Value.ToLower()).ToList<String>();
Full error details
System.Net.WebException was caught
HResult=-2146233079
Message=The underlying connection was closed: An unexpected error occurred on a send.
Source=System
StackTrace:
at System.Net.HttpWebRequest.GetResponse()
at HtmlAgilityPack.HtmlWeb.Get(Uri uri, String method, String path, HtmlDocument doc, IWebProxy proxy, ICredentials creds) in D:\Source\htmlagilitypack.new\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1355
at HtmlAgilityPack.HtmlWeb.LoadUrl(Uri uri, String method, WebProxy proxy, NetworkCredential creds) in D:\Source\htmlagilitypack.new\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1479
at HtmlAgilityPack.HtmlWeb.Load(String url, String method) in D:\Source\htmlagilitypack.new\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1106
at HtmlAgilityPack.HtmlWeb.Load(String url) in D:\Source\htmlagilitypack.new\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1061
at _admin_News.btnGrabNews_Click(Object sender, EventArgs e) in c:\path\News.aspx.cs:line 361
InnerException: System.IO.IOException
HResult=-2146232800
Message=Authentication failed because the remote party has closed the transport stream.
Source=System
StackTrace:
at System.Net.Security.SslState.StartReadFrame(Byte[] buffer, Int32 readBytes, AsyncProtocolRequest asyncRequest)
at System.Net.Security.SslState.StartReceiveBlob(Byte[] buffer, AsyncProtocolRequest asyncRequest)
at System.Net.Security.SslState.CheckCompletionBeforeNextReceive(ProtocolToken message, AsyncProtocolRequest asyncRequest)
at System.Net.Security.SslState.StartSendBlob(Byte[] incoming, Int32 count, AsyncProtocolRequest asyncRequest)
at System.Net.Security.SslState.ForceAuthentication(Boolean receiveFirst, Byte[] buffer, AsyncProtocolRequest asyncRequest)
at System.Net.Security.SslState.ProcessAuthentication(LazyAsyncResult lazyResult)
at System.Net.TlsStream.CallProcessAuthentication(Object state)
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Net.TlsStream.ProcessAuthentication(LazyAsyncResult result)
at System.Net.TlsStream.Write(Byte[] buffer, Int32 offset, Int32 size)
at System.Net.PooledStream.Write(Byte[] buffer, Int32 offset, Int32 size)
at System.Net.ConnectStream.WriteHeaders(Boolean async)
InnerException:
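I ran the code on my local machine; it works fine and I got the output without any error.
I suspect the website could not be reached at that time and that it was a connection problem.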
HtmlDocument doc = new HtmlDocument();
var url = "https://m.gulfnews.com/business/sectors/banking/rebuilding-lives-10-years-after-lehman-s-fall-1.2277318";
var webGet = new HtmlWeb();
doc = webGet.Load(url);
String title = (from x in doc.DocumentNode.Descendants()
                where x.Name.ToLower() == "title"
                select x.InnerText).FirstOrDefault();
Output: Rebuilding lives, 10 . . . etc.
If it only happens with HTTPS resources and you are targeting .NET 4, it may be related to the default SSL/TLS support. Try the following:
using System.Net;

static void Main()
{
    // Place this anywhere in your code prior to invoking the web request
    ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls | SecurityProtocolType.Tls11 | SecurityProtocolType.Tls12 | SecurityProtocolType.Ssl3;
}
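A slightly more conservative variant of the snippet above, offered only as a sketch: it ORs TLS 1.1 and TLS 1.2 into whatever protocols are already enabled instead of overwriting the setting, and it leaves out SSL 3.0, which is generally considered insecure. It assumes the project is compiled against .NET Framework 4.5 or later, where the SecurityProtocolType.Tls11 and Tls12 members exist. The class name TlsSetup and the method name EnableTls12 are hypothetical and not part of any library.

```csharp
using System;
using System.Net;

// Hypothetical helper class, named here for illustration only.
public static class TlsSetup
{
    // Call once at startup (for example from Application_Start or at the
    // top of the click handler), before the first HTTPS request is made.
    public static void EnableTls12()
    {
        // Add TLS 1.1 and TLS 1.2 to the currently enabled protocols
        // rather than replacing the existing value.
        ServicePointManager.SecurityProtocol |=
            SecurityProtocolType.Tls11 | SecurityProtocolType.Tls12;
    }
}
```

With this in place, calling TlsSetup.EnableTls12() before webGet.Load(url) should let the handshake negotiate TLS 1.2 if the remote server has stopped accepting older protocol versions, which is one plausible reason for a site that used to work to start closing the connection.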