Filter a string from HTML Agility Pack

I fetch the HTML from a URL, select the table element, and then select all tr elements in that table whose id attribute value contains "tr". I now have about 20 elements like this:

<th class="nw">1 Jan</th><td class="nw">Friday</td><td><a href="/holidays/andorra/new-year-day">New Year&#39;s Day</a></td><td>National holiday</td>

How can I get each piece of text out of the element above separately?
Example output: 1 Jan/Friday/New Year's Day/National holiday

using System.Linq;
using System.Net.Http;

var url = "https://www.timeanddate.com/holidays/andorra/";
var client = new HttpClient();
client.DefaultRequestHeaders.Add("Accept-Language", "en-US,en;q=0.5");
var html = await client.GetStringAsync(url);

var document = new HtmlAgilityPack.HtmlDocument();
document.LoadHtml(html);

// Find the table whose id is "holidays-table"
var a1 = document.DocumentNode.Descendants("table")
    .Where(node => node.GetAttributeValue("id","").Equals("holidays-table"))
    .ToList();

// Within it, keep the rows whose id contains "tr" (the holiday rows)
var a2 = a1[0].Descendants("tr")
    .Where(node => node.GetAttributeValue("id","").Contains("tr"))
    .ToList();

This should do what you need:

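// `document` is the HtmlDocument loaded above; needs `using System.Linq;`
List<List<string>> holidays = document
    .DocumentNode
    .SelectNodes("//table[@id='holidays-table']/tbody/tr")  // returns null if nothing matches
    .Select(tr => tr.ChildNodes
                    // keep only cell nodes, skipping whitespace text nodes between them
                    .Where(n => n.Name == "th" || n.Name == "td")
                    // InnerText may leave entities such as &#39; encoded; DeEntitize decodes them
                    .Select(n => HtmlAgilityPack.HtmlEntity.DeEntitize(n.InnerText).Trim())
                    .ToList())
    .Where(row => row.Any())  // filter out empty rows
    .ToList();

foreach (var row in holidays)
{
    Console.WriteLine(string.Join(", ", row));
}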
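Depending on the HTML Agility Pack version, `InnerText` can return cell text with HTML entities still encoded, so the third cell of the sample row may come back as `New Year&#39;s Day` rather than `New Year's Day`. Below is a minimal, dependency-free sketch of the decode, trim and join step using the BCL's `System.Net.WebUtility.HtmlDecode`. The class name `HolidayRowFormatter` and the method `FormatRow` are hypothetical, and the raw cell strings are assumed to have already been extracted (for example by the query above).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;

// Hypothetical helper for illustration; not part of HTML Agility Pack.
public static class HolidayRowFormatter
{
    // Decodes HTML entities in each raw cell (e.g. "&#39;" becomes "'"),
    // trims surrounding whitespace, and joins the cells with "/"
    // to match the question's example output format.
    public static string FormatRow(IEnumerable<string> rawCells) =>
        string.Join("/", rawCells.Select(c => WebUtility.HtmlDecode(c).Trim()));

    public static void Main()
    {
        // Assumed raw InnerText values for the sample <tr>, entity still encoded.
        var rawCells = new List<string> { "1 Jan", "Friday", "New Year&#39;s Day", "National holiday" };

        // prints: 1 Jan/Friday/New Year's Day/National holiday
        Console.WriteLine(FormatRow(rawCells));
    }
}
```

When the HtmlAgilityPack package is already referenced, `HtmlAgilityPack.HtmlEntity.DeEntitize` does the same decoding; `WebUtility.HtmlDecode` simply avoids the dependency for this step.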

Working demo here: https://dotnetfiddle.net/0SADls