Parsing content with the HTML Agility Pack and LINQ

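I am trying to extract the relevant content around searched keywords in an HTML document.

The code below filters the HtmlNodeCollection returned by SelectNodes: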

var findclasses = doc.DocumentNode.SelectNodes("//body//*[not(self::script)]")
    .Where(x => x.InnerHtml.Contains("SearchedKeywordText") && x.InnerHtml.Contains("SearchedKeyword1Text"))
    .OrderBy(x => x.Name);
string FirstContent = findclasses.First().InnerText;

I get this result:

  • Results View Expanding the Results View will enumerate the IEnumerable
  • [0] {Name: "div"} HtmlAgilityPack.HtmlNode
  • [1] {Name: "div"} HtmlAgilityPack.HtmlNode
  • [2] {Name: "div"} HtmlAgilityPack.HtmlNode
  • [3] {Name: "ul"} HtmlAgilityPack.HtmlNode
  • [4] {Name: "li"} HtmlAgilityPack.HtmlNode
  • [5] {Name: "span"} HtmlAgilityPack.HtmlNode
  • [6] {Name: "span"} HtmlAgilityPack.HtmlNode
  • [7] {Name: "div"} HtmlAgilityPack.HtmlNode
  • [8] {Name: "span"} HtmlAgilityPack.HtmlNode
  • [9] {Name: "span"} HtmlAgilityPack.HtmlNode
  • [10] {Name: "ul"} HtmlAgilityPack.HtmlNode
  • [11] {Name: "li"} HtmlAgilityPack.HtmlNode

However, when I change the code slightly so that the search strings come from variables:

string search1 = "SearchedKeywordText";
string search2 = "SearchedKeyword1Text";
..
..
var findclasses = doc.DocumentNode.SelectNodes("//body//*[not(self::script)]")
    .Where(x => x.InnerHtml.Contains(search1) && x.InnerHtml.Contains(search2))
    .OrderBy(x => x.Name);
string FirstContent = findclasses.First().InnerText;

Result:

  • Results View Expanding the Results View will enumerate the IEnumerable
    Empty "Enumeration yielded no results"

The enumeration in the first block works fine for me, but after the change it stops working. Any ideas about this simple problem?

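You are calling .First() on an empty IEnumerable, which throws an InvalidOperationException.

You can use .Any() to check that findclasses is not empty before reading the first element: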

if (findclasses.Any())
{
   string firstContent = findclasses.First().InnerText;
}
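As an alternative to the .Any() check, FirstOrDefault() avoids enumerating the query twice. The sketch below is only an illustration under stated assumptions: it uses a plain sequence of strings as a stand-in for the filtered nodes, and the FirstMatch class and FirstOrFallback helper are hypothetical names that do not come from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class FirstMatch
{
    // Hypothetical helper: returns the first element, or 'fallback'
    // when the sequence is empty (FirstOrDefault yields null there).
    public static string FirstOrFallback(IEnumerable<string> items, string fallback)
    {
        return items.FirstOrDefault() ?? fallback;
    }
}

class FirstOnEmptyDemo
{
    static void Main()
    {
        // Stand-in for a filtered node sequence in which nothing matched.
        IEnumerable<string> matches = Enumerable.Empty<string>();

        // No exception: the fallback is returned for the empty sequence.
        Console.WriteLine(FirstMatch.FirstOrFallback(matches, "no match"));

        try
        {
            // First() on an empty sequence throws InvalidOperationException.
            matches.First();
        }
        catch (InvalidOperationException)
        {
            Console.WriteLine("First() threw on an empty sequence");
        }
    }
}
```

With HtmlNode objects the same idea would be to call FirstOrDefault() on findclasses and test the result for null before reading InnerText.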
  • Why is it empty?

There may be matches that differ only in letter case, so you need to make the search case-insensitive. Instead of

x.InnerHtml.Contains(search1) 

you can do this:

x.InnerHtml.IndexOf(search1, StringComparison.InvariantCultureIgnoreCase) >= 0

This evaluates to true if the search keyword is found, regardless of letter case.
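To see the difference in isolation, here is a minimal sketch that compares the case-sensitive Contains call with the IndexOf approach on a plain string. The sample markup and the KeywordSearch.ContainsIgnoreCase helper are assumptions made for illustration; they are not part of the question or of HTML Agility Pack.

```csharp
using System;

static class KeywordSearch
{
    // Hypothetical helper: true when 'keyword' occurs in 'text', ignoring case.
    public static bool ContainsIgnoreCase(string text, string keyword)
    {
        return text.IndexOf(keyword, StringComparison.InvariantCultureIgnoreCase) >= 0;
    }
}

class CaseInsensitiveDemo
{
    static void Main()
    {
        // Assumed sample markup: the keyword appears in upper case only.
        string html = "<div><span>SEARCHEDKEYWORDTEXT</span></div>";
        string search1 = "SearchedKeywordText";

        // Case-sensitive search misses the upper-case occurrence.
        Console.WriteLine(html.Contains(search1));                          // False

        // Case-insensitive search finds it.
        Console.WriteLine(KeywordSearch.ContainsIgnoreCase(html, search1)); // True
    }
}
```

In the original query the helper would replace each x.InnerHtml.Contains(...) call inside the Where clause.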