Html Agility Pack, search through site for a specified string of words

Question

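I'm using Html Agility Pack for this task. Basically I have a URL, and my program should read through the content of the HTML page at that URL. If it finds a line of text (e.g. "John had three apples"), it should change a label's text to "Found it".

I tried doing it with contains, but I guess it only checks for one word.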

var nodeBFT = doc.DocumentNode.SelectNodes("//*[contains(text(), 'John had three apples')]");

if (nodeBFT != null && nodeBFT.Count != 0)
    myLabel.Text = "Found it";

Edit: the rest of my code, now with ako's attempt added:

if (CheckIfValidUrl(v)) // foreach var v in a list..., checks if the URL works
{
    HtmlWeb hw = new HtmlWeb();
    HtmlDocument doc = hw.Load(v);

    try
    {
        if (doc.DocumentNode.InnerHtml.ToString().Contains("string of words"))
        {
            mylabel.Text = v;
        }
    ...

Answer 1

Use this:

if (doc.DocumentNode.InnerHtml.Contains("John had three apples"))
    myLabel.Text = "Found it";
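One caveat: InnerHtml is the raw markup, so this check can miss the phrase when a tag, an entity, or a line break sits in the middle of it. Below is a minimal sketch of a text-based check, assuming you want to match what the reader sees rather than the source. The class and method names (PhraseSearch, ContainsPhrase) are hypothetical, and the case-insensitive comparison is an assumption, not part of the question.

```csharp
using System;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

static class PhraseSearch
{
    // Hypothetical helper: true when the document's text contains the phrase,
    // ignoring markup, entity encoding, and runs of whitespace.
    public static bool ContainsPhrase(HtmlDocument doc, string phrase)
    {
        // InnerText concatenates the text nodes and leaves the tags out.
        string text = HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);

        // Collapse whitespace so line breaks in the source do not split the phrase.
        text = Regex.Replace(text, @"\s+", " ");

        // Assumption: case should not matter; use Ordinal for an exact match.
        return text.IndexOf(phrase, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml("<p>John had <br/>three <strong>apples</strong></p>");

        // The markup contains a <br> inside the phrase, the text does not.
        Console.WriteLine(doc.DocumentNode.InnerHtml.Contains("John had three apples"));
        Console.WriteLine(ContainsPhrase(doc, "John had three apples"));
    }
}
```

Keep in mind that InnerText also includes the contents of script and style elements, so a phrase that only appears inside a script would still count as found.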

Answer 2

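One possible option is to use . instead of text(). As you suspected, passing text() to contains() the way you did only works when the searched text is in the first direct text node child of the current element, because XPath 1.0 converts a node-set to a string by taking only its first node: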

doc.DocumentNode.SelectNodes("//*[contains(., 'John had three apples')]");

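On the other hand, contains(., '...') evaluates the entire text content of the current element, with all descendant text nodes concatenated. So be aware that the above XPath will also consider the following element a match: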

<span>John had <br/>three <strong>apples</strong></span>
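To see the difference between the two forms, here is a small sketch that runs both XPath expressions against a fragment where the phrase is not in the first text node. The sample HTML and the helper CountMatches are made up for illustration.

```csharp
using System;
using HtmlAgilityPack;

static class XPathContainsDemo
{
    // Hypothetical helper: SelectNodes returns null when nothing matches,
    // so treat null as a count of 0.
    public static int CountMatches(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        return nodes == null ? 0 : nodes.Count;
    }

    public static void Main()
    {
        // <p> has two text node children: "Note: " and " John had three apples".
        const string html = "<p>Note: <b>really</b> John had three apples</p>";

        // Only the first text node ("Note: ") is tested, so nothing matches.
        Console.WriteLine(CountMatches(html, "//*[contains(text(), 'John had three apples')]"));

        // The whole text content of <p> is tested, so <p> matches.
        Console.WriteLine(CountMatches(html, "//*[contains(., 'John had three apples')]"));
    }
}
```

On a full page, contains(., '...') also matches every ancestor of the element that holds the phrase (body, html, and so on). That is harmless for a simple "is it there" check, but it matters if you want the innermost element.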

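If you need the XPath to consider only cases where the whole keyword is contained in a single text node, and therefore treat the case above as no match, you can try it this way: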

doc.DocumentNode.SelectNodes("//*[text()[contains(., 'John had three apples')]]");
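As a rough illustration of this stricter form, the sketch below lists which elements it selects from a fragment that contains both situations: the phrase split across child elements, and the phrase sitting whole in one text node. The sample HTML and the helper MatchingElementNames are hypothetical, and the helper assumes the phrase contains no single quote.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

static class SingleTextNodeDemo
{
    // Hypothetical helper: names of the elements that have at least one
    // direct text node child containing the whole phrase.
    public static string[] MatchingElementNames(string html, string phrase)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Assumption: phrase has no single quote, so it can be embedded as a literal.
        string xpath = "//*[text()[contains(., '" + phrase + "')]]";

        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        return nodes == null ? new string[0] : nodes.Select(n => n.Name).ToArray();
    }

    public static void Main()
    {
        const string html =
            "<div>" +
            "<span>John had <br/>three <strong>apples</strong></span>" + // phrase split across nodes
            "<p>Note: <b>really</b> John had three apples</p>" +         // phrase in one text node
            "</div>";

        // Only <p> is selected: <span> and <div> have no single text node with the phrase.
        Console.WriteLine(string.Join(", ", MatchingElementNames(html, "John had three apples")));
    }
}
```

Unlike contains(., '...'), this form does not select ancestors, because the predicate looks only at each element's own text node children.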

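If none of the above works for you, please post a minimal HTML snippet that contains the keyword but returns no match, so we can look further into what causes that behavior and how to fix it.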


c#

html-agility-pack