Two methods for loading HTML from a URL?
To load HTML from a URL, I use the following method:
public HtmlDocument DownloadSource(string url)
{
    try
    {
        HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(DownloadString(url));
        return doc;
    }
    catch (Exception e)
    {
        if (Task.Error == null)
            Task.Error = e;
        Task.Status = TaskStatuses.Error;
        Done = true;
        return null;
    }
}
But today the code above suddenly stopped working. I found another method, and it works fine:
HtmlWeb web = new HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load(url.ToString());
Now I just want to know the difference between these two methods.
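It seems the User-Agent header is now mandatory for your site. HtmlAgilityPack itself is working fine, but you should change the DownloadString(url) method. If you inspect the request with Fiddler, you will see that the server returns 403 Forbidden.
The solution is to add any User-Agent header to the request: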
using HtmlAgilityPack;
using System;
using System.Net;

class Program
{
    static void Main()
    {
        var doc = DownloadSource("https://videohive.net/item/inspired-slideshow/21544630");
        Console.ReadKey();
    }

    public static HtmlDocument DownloadSource(string url)
    {
        try
        {
            HtmlDocument doc = new HtmlDocument();
            doc.LoadHtml(DownloadString(url));
            return doc;
        }
        catch (Exception e)
        {
            // exception handling here; at minimum, report the failure
            Console.Error.WriteLine(e.Message);
        }
        return null;
    }

    static String DownloadString(String url)
    {
        // Dispose the client once the download has finished.
        using (WebClient client = new WebClient())
        {
            // Without a User-Agent header the site answers 403 Forbidden.
            client.Headers.Add("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:x.x.x) Gecko/20041107 Firefox/x.x");
            return client.DownloadString(url);
        }
    }
}
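As for the difference between the two methods: as far as I know, HtmlWeb sends a browser-like User-Agent by default, which would explain why web.Load(url) kept working while a bare WebClient was rejected. Below is a minimal sketch of the HtmlWeb variant with the header set explicitly. The user-agent string and the class name are placeholders I made up for illustration, and I have not checked which values this particular site accepts.

```csharp
using System;
using HtmlAgilityPack;

class HtmlWebExample
{
    static void Main()
    {
        var web = new HtmlWeb();

        // Assumption: any non-empty User-Agent satisfies the site.
        // This string is a made-up placeholder, not a value the site requires.
        web.UserAgent = "Mozilla/5.0 (compatible; ExampleScraper/1.0)";

        // HtmlWeb downloads and parses in one step, so no separate
        // DownloadString helper is needed.
        HtmlDocument doc = web.Load("https://videohive.net/item/inspired-slideshow/21544630");

        // StatusCode holds the HTTP status of the last request made by this HtmlWeb.
        Console.WriteLine(web.StatusCode);

        // SelectSingleNode returns null when nothing matches, hence the "?.".
        HtmlNode title = doc.DocumentNode.SelectSingleNode("//title");
        Console.WriteLine(title?.InnerText);
    }
}
```

With this approach the download and the parsing both go through HtmlAgilityPack, so the only thing left to maintain is the header value.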