Scrape data with HtmlAgilityPack for a tag which doesn't have a class

Question

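This is my C# code. I am trying to scrape data from a website using HtmlAgilityPack, but it never displays anything. I don't know what I am doing wrong and I'm a bit confused.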

// Requires: using System; using System.Net; using HtmlAgilityPack;
HtmlAgilityPack.HtmlWeb webb = new HtmlAgilityPack.HtmlWeb();
// Force TLS 1.2 before the page is requested.
ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;

HtmlAgilityPack.HtmlDocument doc = webb.Load("mywebsite");

HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//ul[@class='unstyled']//li//a");

if (nodes != null)
{
    foreach (HtmlNode n in nodes)
    {
        // Decode HTML entities and trim whitespace before printing the link text.
        string q = n.InnerText;
        q = System.Net.WebUtility.HtmlDecode(q);
        q = q.Trim();
        Console.WriteLine(q);
    }
}
else
{
    Console.WriteLine("nothing found");
}

Here is a picture of the markup I am trying to capture data from. I need the data from the <a> tags.

Answer 1

The XPath used to select the tags is not correct.

HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//ul[@class='unstyled']/li/a");

This should select all the anchor nodes; you can then iterate over the nodes to get the InnerHtml.
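To make the difference between the two XPath forms concrete, here is a minimal sketch. The HTML snippet, the class name XPathAxisDemo, and the helper CountMatches are made up for illustration and do not come from the question. In XPath, a single `/` steps to direct children only, while `//` matches descendants at any depth, so the `//li//a` form selects a superset of what `/li/a` selects.

```csharp
using System;
using HtmlAgilityPack;

public static class XPathAxisDemo
{
    // Hypothetical markup: one <a> directly under <li>, one nested inside a <span>.
    public const string Html =
        "<ul class='unstyled'><li>" +
        "<a href='#direct'>direct</a>" +
        "<span><a href='#nested'>nested</a></span>" +
        "</li></ul>";

    // Hypothetical helper: number of nodes an XPath selects (0 when SelectNodes returns null).
    public static int CountMatches(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        return nodes == null ? 0 : nodes.Count;
    }

    public static void Main()
    {
        // Child steps: only the <a> that is a direct child of the <li>.
        Console.WriteLine(CountMatches(Html, "//ul[@class='unstyled']/li/a"));   // 1
        // Descendant steps: every <a> anywhere below the <li>.
        Console.WriteLine(CountMatches(Html, "//ul[@class='unstyled']//li//a")); // 2
    }
}
```

Because the descendant form is the broader of the two, an empty result from it may also point at the downloaded HTML itself, for example a list that is only added by JavaScript after the page loads. That is an assumption worth checking against the actual response, not something visible from the question.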

A working example is shown below.

// Requires: using System; using HtmlAgilityPack;
string s = "<ul class='unstyle no-overflow'><li><ul class='unstyled'><li><a href='http://www.smsconnexion.com'>SMS ConneXion</a></li></ul><ul class='unstyled'><li><a href='http://www.celusion.com'>Celusion</a></li></ul></li></ul>";

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(s);

// Select each <a> that is a child of an <li> directly under <ul class='unstyled'>.
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//ul[@class='unstyled']/li/a");

// SelectNodes returns null when nothing matches, so guard before iterating.
if (nodes != null)
{
    foreach (var node in nodes)
    {
        Console.WriteLine(node.Attributes["href"].Value);
    }
}

Console.ReadLine();
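One more thing that can make this kind of query come back empty: `@class='unstyled'` compares the whole attribute string, so a list declared with several classes will not match. The sketch below is an illustration under assumptions, not code from the question. The markup, the example.com URL, the class name ClassTokenDemo, and the helper Hrefs are all hypothetical. It uses the common XPath 1.0 idiom of padding the class attribute with spaces and testing for the token.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class ClassTokenDemo
{
    // Hypothetical markup: the <ul> carries two classes, the <a> carries none.
    public const string Html =
        "<ul class='unstyled no-overflow'><li>" +
        "<a href='http://example.com/one'>One</a>" +
        "</li></ul>";

    // Exact comparison against the full class attribute value.
    public const string ExactXPath = "//ul[@class='unstyled']/li/a";

    // Token match: true when 'unstyled' is one of the space-separated classes.
    public const string TokenXPath =
        "//ul[contains(concat(' ', normalize-space(@class), ' '), ' unstyled ')]/li/a";

    // Hypothetical helper: href values of the selected anchors (empty list when nothing matches).
    public static List<string> Hrefs(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        var result = new List<string>();
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
        {
            return result; // SelectNodes found nothing
        }
        foreach (HtmlNode node in nodes)
        {
            // GetAttributeValue falls back to "" if the anchor has no href.
            result.Add(node.GetAttributeValue("href", ""));
        }
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(Hrefs(Html, ExactXPath).Count); // 0
        Console.WriteLine(Hrefs(Html, TokenXPath).Count); // 1
    }
}
```

Whether this applies depends on the real page's markup, which is only shown as a picture in the question, so treat it as something to check and not as a diagnosis.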

c#

html-agility-pack