Parse HTML class in individual items with htmlagilitypack

Question

I want to parse HTML. I used the following code, but I get everything in a single item instead of getting each item individually.

    var url = "https://subscene.com/subtitles/searchbytitle?query=joker&l=";
    var web = new HtmlWeb();
    var doc = web.Load(url);
    IEnumerable<HtmlNode> nodes =
        doc.DocumentNode.Descendants()
            .Where(n => n.HasClass("search-result"));

    foreach (var item in nodes)
    {
        string itemx = item.SelectSingleNode(".//a").Attributes["href"].Value;

        MessageBox.Show(itemx);
        MessageBox.Show(item.InnerText);
    }

I only get one message, for the first item, and the second message shows me all the items together.

Answer 1

I think the problem is in the way you look up and store the data. Try:

    // Visit every <a> element in the document that has an href attribute.
    foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
    {
        string hrefValue = link.GetAttributeValue("href", string.Empty);
        MessageBox.Show(hrefValue);
        MessageBox.Show(link.InnerText);
    }
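
One caveat with the loop above: by default, SelectNodes returns null rather than an empty collection when nothing matches, so iterating the result directly can throw a NullReferenceException. A minimal sketch of a guarded version follows. The class name SafeLinkReader, the method ReadHrefs, and the inline HTML are made up for illustration, and the markup is not the real page.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class SafeLinkReader
{
    // Returns every href found in the given HTML,
    // or an empty list when the document contains no links.
    public static List<string> ReadHrefs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var hrefs = new List<string>();
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");

        // SelectNodes yields null (not an empty collection) when nothing matches.
        if (links == null)
            return hrefs;

        foreach (HtmlNode link in links)
            hrefs.Add(link.GetAttributeValue("href", string.Empty));

        return hrefs;
    }

    public static void Main()
    {
        // Hypothetical inputs: one document with a link, one without.
        Console.WriteLine(ReadHrefs("<a href='/subtitles/joker'>Joker</a>").Count);
        Console.WriteLine(ReadHrefs("<div>no links here</div>").Count);
    }
}
```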

Answer 2

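When you look up data from that URL by the class 'search-result', only one node is returned. Instead of iterating over its children, you are iterating over that one div only, which is why you get a single result.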
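
To see the effect without hitting the site, here is a small sketch that runs the same lookup against inline HTML. The markup, the class name SingleContainerDemo, and the method FindContainers are assumptions for illustration; the real page may be structured differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class SingleContainerDemo
{
    // Hypothetical markup: one container div wrapping several result links.
    public const string SampleHtml =
        "<div class='search-result'><ul>" +
        "<li><a href='/subtitles/joker'>Joker</a></li>" +
        "<li><a href='/subtitles/joker-2019'>Joker (2019)</a></li>" +
        "</ul></div>";

    // Same lookup as in the question: every node carrying the class.
    public static List<HtmlNode> FindContainers()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(SampleHtml);
        return doc.DocumentNode.Descendants()
            .Where(n => n.HasClass("search-result"))
            .ToList();
    }

    public static void Main()
    {
        List<HtmlNode> containers = FindContainers();

        // Only the wrapping div matches, so a foreach over it runs once.
        Console.WriteLine(containers.Count);

        // InnerText of the container is the text of all links joined together.
        Console.WriteLine(containers[0].InnerText);

        // SelectSingleNode returns just the first <a> under the container.
        Console.WriteLine(containers[0].SelectSingleNode(".//a").GetAttributeValue("href", ""));
    }
}
```

That matches what the question describes: one href from the first link, then one message with the text of every item.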

If you want a list of all the links inside the div with the class "search-result", you can do it like this.

Code:

    string url = "https://subscene.com/subtitles/searchbytitle?query=joker&l=";
    HtmlWeb web = new HtmlWeb();
    HtmlDocument doc = web.Load(url);

    List<string> listOfUrls = new List<string>();
    HtmlNode searchResult = doc.DocumentNode.SelectSingleNode("//div[@class='search-result']");

    // Iterate through all the child nodes that have the 'a' tag.
    foreach (HtmlNode node in searchResult.SelectNodes(".//a"))
    {
        string thisUrl = node.GetAttributeValue("href", "");
        if (!string.IsNullOrEmpty(thisUrl) && !listOfUrls.Contains(thisUrl))
            listOfUrls.Add(thisUrl);
    }

What does it do?

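SelectSingleNode("//div[@class='search-result']") -> retrieves the div that holds all the search results and ignores the rest of the document.
The loop goes through only the "subnodes" of that div that are links and adds each href to the list. The subnodes are selected by the leading dot in SelectNodes(".//a"). If you use // instead of .//, it searches the entire page, which is not what you want.
The if statement makes sure only unique, non-empty values are added.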
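
The difference between .//a and //a can be checked with a short sketch. The inline HTML, the class name XPathScopeDemo, and the method CountLinks are hypothetical and only serve the comparison.

```csharp
using System;
using HtmlAgilityPack;

public static class XPathScopeDemo
{
    // Hypothetical markup: one link outside the results div, two inside it.
    public const string SampleHtml =
        "<div class='nav'><a href='/home'>Home</a></div>" +
        "<div class='search-result'>" +
        "<a href='/subtitles/joker'>Joker</a>" +
        "<a href='/subtitles/joker-2019'>Joker (2019)</a>" +
        "</div>";

    // Runs the given XPath from the search-result div and counts the matches.
    public static int CountLinks(string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(SampleHtml);

        HtmlNode searchResult =
            doc.DocumentNode.SelectSingleNode("//div[@class='search-result']");
        HtmlNodeCollection links = searchResult.SelectNodes(xpath);

        // SelectNodes yields null when nothing matches.
        return links == null ? 0 : links.Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountLinks(".//a")); // relative: links inside the div only
        Console.WriteLine(CountLinks("//a"));  // absolute: every link in the document
    }
}
```

With this sample the relative path finds the two links inside the div, while the absolute path also picks up the navigation link.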

You now have all the links.

Fiddle: https://dotnetfiddle.net/j5aQFp


c#

html-agility-pack