Parsing content with the HTML Agility Pack and LINQ
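I am trying to extract the relevant content around searched keywords in an HTML document.
I use the code below to generate an HtmlNodeCollection and filter it: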
var findclasses = doc.DocumentNode.SelectNodes("//body//*[not(self::script)]").Where(x => x.InnerHtml.Contains("SearchedKeywordText") && x.InnerHtml.Contains("SearchedKeyword1Text")).OrderBy(x => x.Name);
string FirstContent = findclasses.First().InnerText;
This is the result I get:
- Results View Expanding the Results View will enumerate the IEnumerable
- [0] {Name: "div"} HtmlAgilityPack.HtmlNode
- [1] {Name: "div"} HtmlAgilityPack.HtmlNode
- [2] {Name: "div"} HtmlAgilityPack.HtmlNode
- [3] {Name: "ul"} HtmlAgilityPack.HtmlNode
- [4] {Name: "li"} HtmlAgilityPack.HtmlNode
- [5] {Name: "span"} HtmlAgilityPack.HtmlNode
- [6] {Name: "span"} HtmlAgilityPack.HtmlNode
- [7] {Name: "div"} HtmlAgilityPack.HtmlNode
- [8] {Name: "span"} HtmlAgilityPack.HtmlNode
- [9] {Name: "span"} HtmlAgilityPack.HtmlNode
- [10] {Name: "ul"} HtmlAgilityPack.HtmlNode
- [11] {Name: "li"} HtmlAgilityPack.HtmlNode
But when I make a simple change so that the search strings come from variables instead of literals:
string search1 = "SearchedKeywordText";
string search2 = "SearchedKeyword1Text";
..
..
var findclasses = doc.DocumentNode.SelectNodes("//body//*[not(self::script)]").Where(x => x.InnerHtml.Contains(search1) && x.InnerHtml.Contains(search2)).OrderBy(x => x.Name);
string FirstContent = findclasses.First().InnerText;
Result:
- Results View Expanding the Results View will enumerate the IEnumerable
Empty "Enumeration yielded no results"
The enumeration in the first block works fine for me, but after the change it no longer works. Any ideas about this simple problem?
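You are calling .First() on an empty IEnumerable, which throws an InvalidOperationException.
You can use .Any() to check that findclasses is not empty before reading the first element: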
if (findclasses.Any())
{
string firstContent = findclasses.First().InnerText;
}
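As an alternative to the Any()/First() pair, FirstOrDefault() returns null for an empty sequence of reference types instead of throwing, and it enumerates the query only once. The following is a minimal sketch using plain strings rather than HtmlNode objects; the class name FirstOrDefaultSketch and the method FirstOrFallback are made up for illustration and are not part of HTML Agility Pack.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FirstOrDefaultSketch
{
    // Hypothetical helper: returns the first item, or 'fallback' when the
    // sequence is empty.
    public static string FirstOrFallback(IEnumerable<string> items, string fallback)
    {
        // FirstOrDefault() yields null for an empty sequence of strings,
        // whereas First() would throw InvalidOperationException.
        return items.FirstOrDefault() ?? fallback;
    }

    public static void Main()
    {
        var empty = new List<string>();
        var some = new List<string> { "div", "span" };

        Console.WriteLine(FirstOrFallback(empty, "(no match)")); // prints "(no match)"
        Console.WriteLine(FirstOrFallback(some, "(no match)"));  // prints "div"
    }
}
```

With the query from the question, the same idea would read findclasses.FirstOrDefault()?.InnerText, which leaves the result null when nothing matched.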
- Why is it empty?
There may be matches that differ only in letter case, so you need to make the search case-insensitive. Instead of
x.InnerHtml.Contains(search1)
you can do this:
x.InnerHtml.IndexOf(search1,StringComparison.InvariantCultureIgnoreCase)>=0
This evaluates to true if the search keyword is found regardless of letter case.
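To make the difference concrete, here is a small sketch that compares the case-sensitive Contains call with the IndexOf form above. It assumes case mismatch really is the cause, which the question does not confirm. The helper name ContainsIgnoreCase and the sample markup are invented for this example.

```csharp
using System;

public static class CaseInsensitiveSketch
{
    // Hypothetical helper: true when 'keyword' occurs in 'text', ignoring case.
    public static bool ContainsIgnoreCase(string text, string keyword)
    {
        if (text == null || keyword == null) return false;
        return text.IndexOf(keyword, StringComparison.InvariantCultureIgnoreCase) >= 0;
    }

    public static void Main()
    {
        // Sample markup in which the keyword appears in lower case only.
        string html = "<div><span>searchedkeywordtext</span></div>";

        // string.Contains(string) is an ordinal, case-sensitive comparison.
        Console.WriteLine(html.Contains("SearchedKeywordText"));            // prints "False"
        Console.WriteLine(ContainsIgnoreCase(html, "SearchedKeywordText")); // prints "True"
    }
}
```

In the query from the question, the filter would then become .Where(x => ContainsIgnoreCase(x.InnerHtml, search1) && ContainsIgnoreCase(x.InnerHtml, search2)).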