Ignore some TR nodes

I have HTML like this:

<body>
<tr class="sysinfoTableCategoryHeader">
    <td colspan="4">Operating System</td>
</tr>

    <tr class="sysinfoTablePropertyEven">
        <td />
        <td />
        <td><span class="sysinfoTablePropertyKey">Operating System Name</span></td>
        <td><span class="sysinfoTablePropertyValue">Linux</span></td>
    </tr>

    <tr class="sysinfoTablePropertyOdd">
        <td />
        <td />
        <td><span class="sysinfoTablePropertyKey">Kernel Version</span></td>
        <td><span class="sysinfoTablePropertyValue">4.8.0-1-amd64</span></td>
    </tr>

<tr class="sysinfoTableCategoryHeader">
    <td colspan="4">Motherboard</td>
</tr>

    <tr class="sysinfoTablePropertyEven">
        <td />
        <td />
        <td><span class="sysinfoTablePropertyKey">Manufacturer</span></td>
        <td><span class="sysinfoTablePropertyValue">Acer</span></td>
    </tr>

    <tr class="sysinfoTablePropertyOdd">
        <td />
        <td />
        <td><span class="sysinfoTablePropertyKey">Product</span></td>
        <td><span class="sysinfoTablePropertyValue">Aspire E5-531</span></td>
    </tr>
</body>

I can select the whole body from this HTML file, which works well. But there is one problem: within the body, I want to ignore the node with class name="sysinfoTableCategoryHeader" for Operating System.

Is this possible at all?

My output should look like this:

<body>
<tr class="sysinfoTableCategoryHeader">
    <td colspan="4">Motherboard</td>
</tr>

    <tr class="sysinfoTablePropertyEven">
        <td />
        <td />
        <td><span class="sysinfoTablePropertyKey">Manufacturer</span></td>
        <td><span class="sysinfoTablePropertyValue">Acer</span></td>
    </tr>

    <tr class="sysinfoTablePropertyOdd">
        <td />
        <td />
        <td><span class="sysinfoTablePropertyKey">Product</span></td>
        <td><span class="sysinfoTablePropertyValue">Aspire E5-531</span></td>
    </tr>
</body>

How can I do this with HtmlAgilityPack?

You need the XPath //tr[@class!='sysinfoTableCategoryHeader']. XPath supports comparison operators such as !=.

I only speak a little English. Example code:

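    HtmlDocument htmlDoc = new HtmlDocument();
    htmlDoc.LoadHtml(html); // html: a string holding your HTML
    // Select every <tr> directly under <body> whose class is not the category header
    HtmlNodeCollection htmlNodes = htmlDoc.DocumentNode.SelectNodes("//body/tr[@class!='sysinfoTableCategoryHeader']");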
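As a rough illustration of consuming the selected rows, the sketch below reads each key/value pair into a dictionary. The class name PropertyReader and the method ReadProperties are hypothetical, chosen only for this example. It assumes the span classes shown in the question. Note that SelectNodes returns null rather than an empty collection when nothing matches, and that @class!='...' only matches rows that actually carry a class attribute.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class PropertyReader
{
    // Returns key -> value for every non-header row found under <body>.
    public static Dictionary<string, string> ReadProperties(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new Dictionary<string, string>();

        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes(
            "//body/tr[@class!='sysinfoTableCategoryHeader']");
        if (rows == null)
        {
            return result; // no matching rows
        }

        foreach (HtmlNode row in rows)
        {
            // Look up the key and value spans relative to the current row.
            HtmlNode key = row.SelectSingleNode(".//span[@class='sysinfoTablePropertyKey']");
            HtmlNode value = row.SelectSingleNode(".//span[@class='sysinfoTablePropertyValue']");
            if (key != null && value != null)
            {
                result[key.InnerText.Trim()] = value.InnerText.Trim();
            }
        }
        return result;
    }
}
```

Called on the HTML from the question, this should yield entries such as "Kernel Version" and "Manufacturer", but none for the category header rows.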

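The resulting htmlNodes collection is what you need. Alternatively, select the header rows and remove them from the document with Remove(). (As far as I can tell, RemoveAllIDforNode() only clears the document's id bookkeeping for a node's children and does not detach the node itself.)

    HtmlNodeCollection htmlNodes = htmlDoc.DocumentNode.SelectNodes("//body/tr[@class='sysinfoTableCategoryHeader']");

    // SelectNodes returns null when nothing matches
    if (htmlNodes != null) {
        foreach (HtmlNode node in htmlNodes) {
            node.Remove(); // detach the header row from its parent
        }
    }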
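Filtering by class alone drops rows by type, whereas the expected output in the question drops the whole Operating System section: the header row and the property rows that follow it. A minimal sketch of that, assuming a section runs from its header row to the next header row among its siblings, could look like the following. SectionRemover and RemoveCategory are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class SectionRemover
{
    private const string HeaderClass = "sysinfoTableCategoryHeader";

    // Removes the header row whose text equals categoryName, plus every
    // following sibling up to (but not including) the next header row.
    public static string RemoveCategory(string html, string categoryName)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNodeCollection headers = doc.DocumentNode.SelectNodes(
            "//tr[@class='" + HeaderClass + "']");
        if (headers == null)
        {
            return html; // no header rows at all
        }

        foreach (HtmlNode header in headers)
        {
            if (header.InnerText.Trim() != categoryName)
            {
                continue;
            }

            // Collect first, remove afterwards, so sibling links stay intact
            // while walking. Whitespace text nodes between rows are collected too.
            var toRemove = new List<HtmlNode> { header };
            for (HtmlNode n = header.NextSibling; n != null; n = n.NextSibling)
            {
                bool isNextHeader = n.Name == "tr"
                    && n.GetAttributeValue("class", "") == HeaderClass;
                if (isNextHeader)
                {
                    break;
                }
                toRemove.Add(n);
            }

            foreach (HtmlNode n in toRemove)
            {
                n.Remove();
            }
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Calling SectionRemover.RemoveCategory(html, "Operating System") on the HTML from the question should leave only the Motherboard header and its two property rows inside the body.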