C# Html Agility Pack not getting unique content in a loop
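I'm trying to scrape articles from a website: the title of each article and the URL to the full article. In my loop I keep getting the same title on every iteration, but the URLs are unique and correct. What am I missing or doing wrong? Is my approach wrong?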
// Select all article nodes
var content = htmlDoc.DocumentNode.SelectNodes("/html/body/div/div/div/div/div/div/div/div/div/article").ToList();
foreach (var article in content)
{
// Get url to full content
string articleUrl = article.SelectSingleNode("a").Attributes["href"].Value;
// Get news title
string articleTitle = article.SelectSingleNode("//div[@class='dre-item__alt-title--md']/div").InnerText;
Console.WriteLine("Title: {0}", articleTitle);
Console.WriteLine("Url: {0}", articleUrl);
Console.WriteLine("--------");
}
Output:
Title: Røkkes krillselskap guider 2,5-gangeren
Url: /nyheter/teknologi/2021/02/14/7625196/intel-utfordreren-ascenium-har-steget-36-pa-tre-uker
--------
Title: Røkkes krillselskap guider 2,5-gangeren
Url: /nyheter/bors/2021/02/16/7626652/dnb-markets-aksjemarkedet-responderer-positivt
--------
Title: Røkkes krillselskap guider 2,5-gangeren
Url: /nyheter/shipping/2021/02/15/7625988/kepler-cheuvreux-oker-kursmalet-pa-golden-ocean-group
--------
Title: Røkkes krillselskap guider 2,5-gangeren
Url: /leder/2021/02/09/7622385/elon-musk-og-andre-spekulanter
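An XPath expression that starts with / is evaluated from the document root, even when you call it on a child node. Your title expression starts with //, so it searches the whole document and SelectSingleNode returns the first match every time, which is why every iteration prints the same title. The URL expression "a" is relative, so it works as expected.
Start the expression with .// instead, so that the search begins at the current article node.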
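As a rough illustration of that change, here is a minimal sketch of the loop with a relative title expression. The sample markup, the ArticleScraper class, and the ExtractArticles helper are all hypothetical and only stand in for the real page; the shorter //article selector is also an assumption made to keep the example small. It assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class ArticleScraper
{
    // Hypothetical helper: returns one (Title, Url) pair per <article> element.
    public static List<(string Title, string Url)> ExtractArticles(string html)
    {
        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);

        var results = new List<(string Title, string Url)>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var articles = htmlDoc.DocumentNode.SelectNodes("//article");
        if (articles == null)
        {
            return results;
        }

        foreach (var article in articles)
        {
            // "a" is a relative path: a direct child <a> of this article.
            var link = article.SelectSingleNode("a");

            // The leading "." keeps the search inside this article node.
            // Without it, "//" would search the whole document again.
            var titleNode = article.SelectSingleNode(
                ".//div[@class='dre-item__alt-title--md']/div");

            if (link == null || titleNode == null)
            {
                continue; // skip articles that lack a link or a title
            }

            results.Add((titleNode.InnerText.Trim(),
                         link.GetAttributeValue("href", "")));
        }

        return results;
    }

    public static void Main()
    {
        // Hypothetical markup, much simpler than the real site.
        const string html = @"
<html><body>
  <article>
    <a href='/news/1'></a>
    <div class='dre-item__alt-title--md'><div>First title</div></div>
  </article>
  <article>
    <a href='/news/2'></a>
    <div class='dre-item__alt-title--md'><div>Second title</div></div>
  </article>
</body></html>";

        foreach (var (title, url) in ExtractArticles(html))
        {
            Console.WriteLine("Title: {0}", title);
            Console.WriteLine("Url: {0}", url);
            Console.WriteLine("--------");
        }
    }
}
```

With the relative expression, each article should yield its own title rather than the first title in the document. The null checks are optional, but SelectNodes and SelectSingleNode both return null when nothing matches, so they avoid a NullReferenceException on articles with a different layout.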