Error when trying to load HTML with HtmlAgilityPack


I am trying to run this code:

string path = "http://warisons.rssing.com/chan1729325/all_p43.html";
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.LoadHtml(path);
var div = htmlDoc.DocumentNode.Descendants("div");
foreach (var x in div)
{
    Console.WriteLine(x.Attributes["class"].Value);
}

When I debug this code, I get the following error at htmlDoc.LoadHtml(path);:

Locating source for 'd:\SVN_CHECKOUT\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs'. Checksum: MD5 {4e 14 d3 b d5 30 6e 2c bf 84 ab 8a 96 82 4a 8f} The file 'd:\SVN_CHECKOUT\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs' does not exist. Looking in script documents for 'd:\SVN_CHECKOUT\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs'... Looking in the projects for 'd:\SVN_CHECKOUT\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs'. The file was not found in a project. Looking in directory 'C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\crt\src\'... Looking in directory 'C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\crt\src\vccorlib\'... Looking in directory 'C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\atlmfc\src\mfc\'... Looking in directory 'C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\atlmfc\src\atl\'... Looking in directory 'C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\atlmfc\include'... The debug source files settings for the active solution indicate that the debugger will not ask the user to find the file: d:\SVN_CHECKOUT\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs. The debugger could not locate the source file 'd:\SVN_CHECKOUT\htmlagilitypack\Trunk\HtmlAgilityPack\HtmlDocument.cs'.

Your attempt to load the HTML document from a URI is incorrect.

The method HtmlDocument.LoadHtml loads HTML from the string you pass it, so its argument must be the HTML text itself, not a URI.
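To illustrate the point, here is a minimal sketch (the class LoadHtmlDemo and method CountDivs are hypothetical names, not part of HtmlAgilityPack). Passing the URL string to LoadHtml simply produces a document whose only content is that text, so no div elements are found; passing real markup yields parsed elements.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class LoadHtmlDemo
{
    // Hypothetical helper: parses the given string as HTML markup
    // and counts the <div> elements in the result.
    public static int CountDivs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // the argument is treated as markup; nothing is downloaded
        return doc.DocumentNode.Descendants("div").Count();
    }

    public static void Main()
    {
        // A URL passed to LoadHtml is just plain text, so there are no <div> elements.
        Console.WriteLine(CountDivs("http://warisons.rssing.com/chan1729325/all_p43.html"));

        // Actual markup is parsed into element nodes.
        Console.WriteLine(CountDivs("<div class=\"a\"></div><div></div>"));
    }
}
```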

To load HTML from a given URI, you need something like this:

string path = "http://warisons.rssing.com/chan1729325/all_p43.html";
// HtmlWeb.Load downloads the page and parses it into an HtmlDocument
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlWeb().Load(path);

Also note that you can get a NullReferenceException here:

x.Attributes["class"].Value

because you do not check whether the div actually has a class attribute (x.Attributes["class"] != null) before accessing its value.
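Putting both fixes together, a sketch of the corrected loop might look like the following. DivClassLister and GetDivClasses are hypothetical names used only for illustration; the null check is the one described above, and Main uses HtmlWeb.Load as in the snippet earlier in this answer.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class DivClassLister
{
    // Hypothetical helper: returns the class attribute of every <div>,
    // skipping divs that have no class attribute at all.
    public static List<string> GetDivClasses(HtmlDocument doc)
    {
        var result = new List<string>();
        foreach (HtmlNode div in doc.DocumentNode.Descendants("div"))
        {
            // The indexer returns null when the attribute is missing,
            // so check before touching .Value.
            HtmlAttribute cls = div.Attributes["class"];
            if (cls != null)
            {
                result.Add(cls.Value);
            }
        }
        return result;
    }

    public static void Main()
    {
        string path = "http://warisons.rssing.com/chan1729325/all_p43.html";
        // Download and parse the page, then print each div's class.
        HtmlDocument htmlDoc = new HtmlWeb().Load(path);
        foreach (string cls in GetDivClasses(htmlDoc))
        {
            Console.WriteLine(cls);
        }
    }
}
```

Separating the parsing logic from the download makes it easy to try the helper on a string loaded with LoadHtml. HtmlNode.GetAttributeValue("class", "") is another way to read the attribute with a fallback instead of an explicit null check.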