Cannot get the li elements that belong to a specific ul

I have this structure:

<ul>
    <li class="list-group-item px-0">
        <h2>Foo</h2>
        <ul>
            <li class="list-group-item">
                <h3>Test</h3>
            </li>
        </ul>
    </li>
     <li class="list-group-item px-0">
        <h2>Contoso</h2>
        <ul>
            <li class="list-group-item">
                <h3>Test 2</h3>
            </li>
        </ul>
    </li>
</ul>

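I am trying to get all the li elements that belong directly to the node in the current iteration, which is the first (outer) ul, so the result should be Foo and Contoso. Instead I get every li available, including the nested ones. This is my code: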

var liCollection = node.SelectNodes(".//ul/li[@class='list-group-item']");

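I can work around this by adding px-0 to the class test, but is it possible to get only the li elements associated with the first ul in the iteration?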

Full code:

https://pastebin.com/wjE2q1n2

I put together a sample to fit your needs. I think this is what you are trying to achieve:

var list = doc.DocumentNode.SelectNodes(
    "//div[@class='shadow-sm autoscroll my-1']"); 

var collection = list.Select(x => x.SelectNodes(".//ul/li[@class='list-group-item']"));

//This is for "A", "B" etc
var category = list.Select(x => x.SelectNodes(".//span[contains(@class, 'badge-light')]"));

//This is for "A01A" etc
var listTitles = list.Select(x => x.SelectNodes(".//ul/li[@class='list-group-item']//span"));

//This is for "Preparazioni stomatologiche" etc
var descriptions = list.Select(x => x.SelectNodes(".//ul/li[@class='list-group-item']//a"));

Using this as a guide, you can scrape the data you actually want.
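
As a side note on the original question: `.//ul/li` matches li elements at any depth, because `//` selects descendants. A minimal sketch of restricting the match to direct children with the child axis (`./li`) is shown below. It assumes the node you start from is the outer ul itself, as in the sample markup from the question; if your node is a container around that ul, something like `./ul/li` would be the equivalent. The helper name `DirectItemTitles` is made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper: returns the <h2> text of each <li> that is a
// direct child of the given <ul>, ignoring any nested lists.
static List<string> DirectItemTitles(HtmlNode ul)
{
    // "./li" uses the child axis, so <li> elements inside nested <ul>s are not matched.
    var items = ul.SelectNodes("./li");

    // By default HtmlAgilityPack returns null, not an empty collection, when nothing matches.
    if (items == null) return new List<string>();

    return items
        .Select(li => li.SelectSingleNode("./h2"))
        .Where(h2 => h2 != null)
        .Select(h2 => h2.InnerText.Trim())
        .ToList();
}

// Sample markup taken from the question (attributes single-quoted for brevity).
var doc = new HtmlDocument();
doc.LoadHtml(
    "<ul>" +
    "<li class='list-group-item px-0'><h2>Foo</h2>" +
    "<ul><li class='list-group-item'><h3>Test</h3></li></ul></li>" +
    "<li class='list-group-item px-0'><h2>Contoso</h2>" +
    "<ul><li class='list-group-item'><h3>Test 2</h3></li></ul></li>" +
    "</ul>");

// The outer <ul> is the only element child of the document node here.
var outerUl = doc.DocumentNode.SelectSingleNode("./ul");
var titles = DirectItemTitles(outerUl);

foreach (var title in titles)
{
    Console.WriteLine(title);
}
```

With this approach the class filter (`px-0`) is no longer needed to tell the outer items from the nested ones, since the nesting level itself does the filtering.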

Update

Putting it all together:

using System.Collections.Generic;
using System.IO;
using System.Text.Json;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.Load(Directory.GetCurrentDirectory() + "/html.txt");

var data = doc.DocumentNode.SelectNodes("//div[@class='shadow-sm autoscroll my-1']");

List<dynamic> objects = new();
foreach (var item in data)
{
    foreach (var sub in item.SelectNodes(".//ul[contains(@class, 'list-group')]//li"))
    {
        var obj = new
        {
            Category = item.SelectSingleNode(".//div[@class='mb-1']//span").InnerText.Trim(),
            Description = item.SelectSingleNode(".//div[@class='mb-1']//h2").InnerText.Trim(),
            Sub = new
            {
                SubCategories = sub.SelectSingleNode(".//span").InnerText.Trim(),
                SubDescriptions = sub.SelectSingleNode(".//a").InnerText.Trim(),
            }            
        };
        objects.Add(obj); 
    }
}

var json = JsonSerializer.Serialize(objects, new JsonSerializerOptions { WriteIndented = true });

Output: https://i.imgur.com/zvNo3US.png
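
One thing to watch for in the loop above: by default HtmlAgilityPack's `SelectNodes` returns null when the XPath matches nothing, and `SelectSingleNode` does the same, so a page section that lacks the expected span or a tag would throw a `NullReferenceException`. The following is a minimal sketch of guarding against that. The helper name `NodesOrEmpty`, the tiny inline HTML, and the `"(none)"` placeholder are assumptions made for this illustration, not part of the original page.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper: never returns null, so it is safe to use in foreach.
static IEnumerable<HtmlNode> NodesOrEmpty(HtmlNode node, string xpath) =>
    node.SelectNodes(xpath) ?? Enumerable.Empty<HtmlNode>();

// Small stand-in document: the second <li> deliberately has no <span>.
var doc = new HtmlDocument();
doc.LoadHtml(
    "<div><ul class='list-group'>" +
    "<li><span> A01 </span><a>Preparati stomatologici</a></li>" +
    "<li><a>no code here</a></li>" +
    "</ul></div>");
var root = doc.DocumentNode;

var codes = new List<string>();
foreach (var li in NodesOrEmpty(root, ".//ul[contains(@class, 'list-group')]//li"))
{
    // "?." skips InnerText/Trim when the node is missing; "??" supplies a placeholder.
    var code = li.SelectSingleNode(".//span")?.InnerText.Trim() ?? "(none)";
    codes.Add(code);
    Console.WriteLine(code);
}

// An XPath with no matches yields an empty sequence instead of null.
var tableCount = NodesOrEmpty(root, ".//table").Count();
Console.WriteLine(tableCount);
```

Whether a missing node should become a placeholder, be skipped, or be treated as an error depends on how strictly you want to validate the scraped page.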

I took a completely different approach:

var html1 = File.ReadAllText("input.html");
var htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html1);

// Children of the first div in each block that contains a span with a class attribute
var uls = htmlDoc.DocumentNode.SelectNodes("//span[@class]/../../div[1]/*");
foreach (HtmlNode ul in uls)
{
    // Collapse line breaks so the group heading prints on a single line
    var group = ul.InnerText.Replace('\r', ' ').Replace('\n', ' ').Trim();
    // Walk back up to the shared parent, then into the second div
    foreach (HtmlNode subul in ul.SelectNodes("./../../div[2]/*"))
    {
        var sub = subul.InnerText.Trim();
        if (!string.IsNullOrEmpty(sub)) Console.WriteLine($"{group}: {sub}");
    }
}

Output:

A: Apparato gastrointestinale e metabolismo
A01: Preparati stomatologici
A01A: Preparazioni stomatologiche
A02: Farmaci per malattie correlate all'acidosi
A02A: Antiacidi
A02B: Farmaci per l'ulcera peptica e la malattia da reflusso gastroesofageo (gerd)
A03: Farmaci per malattie gastrointestinali funzionali
A03A: Farmaci malattie gastrointestinali funzionali
A03B: Belladonna e derivati
A03F: Procinetici
A04: Antiemetici e antinausea
A04A: Antiemetici e antinausea
A05: Bile e terapia del fegato
A05A: Terapia per la bile
...