Get all links inside a div and save them to a list

Question

I downloaded an HTML page using WebClient and DownloadString(), and then tried to put all the links inside the div with class "wiki" into a list.

After several attempts and two hours of work, I got all the links once; other times I got only one link, and sometimes none at all.

Here is a sample of my code. I only removed the catch block for better readability.

List<string> getLinks = new List<string>();
for (int i = 0; i < wikiUrls.Length; i++)
{
    try
    {
        string download = client.DownloadString(wikiUrls[i]);
        string searchForDiv = "<div class=\"wiki\">";
        int firstCharacter = download.IndexOf(searchForDiv);
        // if the wiki div doesn't exist, go to the next iteration of the for loop
        if (firstCharacter == -1)
            continue;
        else
        {
            HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
            document.LoadHtml(download);
            string nodes = String.Empty;
            var div = document.DocumentNode.SelectSingleNode("//div[@class=\"wiki\"]");
            if (div != null)
            {
                getLinks = div.Descendants("a").Select(node => node.GetAttributeValue("href", "Not found \n")).ToList(); 
                output.Text = string.Join(" ", getLinks);
            }
        }
    }
    // catch block omitted for readability
}

Answer 1

I figured it out. The problem was this line:

getLinks = div.Descendants("a").Select(node => node.GetAttributeValue("href", "Not found \n")).ToList();

getLinks was overwritten on every iteration, because the assignment is inside the for loop. I solved it with this:

getLinks.AddRange(div.Descendants("a").Select(node => node.GetAttributeValue("href", String.Empty)).ToList());
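
To make the difference between assigning and appending concrete, here is a minimal sketch of the accumulating loop. It is not the asker's full program: the class and method names (WikiLinkCollector, CollectLinks) are hypothetical, the pages are passed in as already-downloaded HTML strings instead of being fetched with WebClient, and it assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, for illustration only.
public static class WikiLinkCollector
{
    // Collects the href of every <a> inside <div class="wiki"> across all pages.
    public static List<string> CollectLinks(IEnumerable<string> htmlPages)
    {
        var links = new List<string>();
        foreach (string html in htmlPages)
        {
            var document = new HtmlDocument();
            document.LoadHtml(html);

            // SelectSingleNode returns null when no node matches the XPath.
            HtmlNode div = document.DocumentNode.SelectSingleNode("//div[@class='wiki']");
            if (div == null)
                continue;

            // AddRange appends to the existing list. Writing "links = ..." here
            // would discard everything gathered from the earlier pages.
            links.AddRange(div.Descendants("a")
                              .Select(a => a.GetAttributeValue("href", string.Empty))
                              .Where(href => href.Length > 0));
        }
        return links;
    }
}
```

With plain assignment, the list only ever holds the links of the last page that contained a wiki div, which would be consistent with the varying results described in the question. The sketch also filters out anchors without an href, which is a design choice of this example and not part of the original fix.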

Tags: html, c#, html-agility-pack