Ignore some TR nodes
I have some HTML like this:
<body>
<tr class="sysinfoTableCategoryHeader">
<td colspan="4">Operating System</td>
</tr>
<tr class="sysinfoTablePropertyEven">
<td />
<td />
<td><span class="sysinfoTablePropertyKey">Operating System Name</span></td>
<td><span class="sysinfoTablePropertyValue">Linux</span></td>
</tr>
<tr class="sysinfoTablePropertyOdd">
<td />
<td />
<td><span class="sysinfoTablePropertyKey">Kernel Version</span></td>
<td><span class="sysinfoTablePropertyValue">4.8.0-1-amd64</span></td>
</tr>
<tr class="sysinfoTableCategoryHeader">
<td colspan="4">Motherboard</td>
</tr>
<tr class="sysinfoTablePropertyEven">
<td />
<td />
<td><span class="sysinfoTablePropertyKey">Manufacturer</span></td>
<td><span class="sysinfoTablePropertyValue">Acer</span></td>
</tr>
<tr class="sysinfoTablePropertyOdd">
<td />
<td />
<td><span class="sysinfoTablePropertyKey">Product</span></td>
<td><span class="sysinfoTablePropertyValue">Aspire E5-531</span></td>
</tr>
</body>
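I can select the whole body from this HTML file, which is great. But there is one problem: within body, I want to ignore the node with class
name="sysinfoTableCategoryHeader"
for Operating System.
Is that possible at all?
My output should look like this: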
<body>
<tr class="sysinfoTableCategoryHeader">
<td colspan="4">Motherboard</td>
</tr>
<tr class="sysinfoTablePropertyEven">
<td />
<td />
<td><span class="sysinfoTablePropertyKey">Manufacturer</span></td>
<td><span class="sysinfoTablePropertyValue">Acer</span></td>
</tr>
<tr class="sysinfoTablePropertyOdd">
<td />
<td />
<td><span class="sysinfoTablePropertyKey">Product</span></td>
<td><span class="sysinfoTablePropertyValue">Aspire E5-531</span></td>
</tr>
</body>
How can I do this with HtmlAgilityPack?
You need the XPath //tr[@class!='sysinfoTableCategoryHeader']
XPath supports comparison operators such as !=.
My English is limited.
Example code:
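// requires: using HtmlAgilityPack;
HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html); // html is a string holding your HTML source
// Select every tr under body whose class is not the category header class.
HtmlNodeCollection htmlNodes = htmlDoc.DocumentNode.SelectNodes("//body/tr[@class!='sysinfoTableCategoryHeader']");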
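htmlNodes is what you need (note that SelectNodes returns null when nothing matches).
Or remove the header rows from the document with Remove():
HtmlNodeCollection headerNodes = htmlDoc.DocumentNode.SelectNodes("//body/tr[@class='sysinfoTableCategoryHeader']");
if (headerNodes != null) {
    foreach (HtmlNode node in headerNodes) {
        node.Remove(); // detach this header row from its parent
    }
}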
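The expected output in the question keeps the Motherboard header and drops the whole Operating System section, so filtering on the class alone may not be enough. Below is a minimal sketch of one way to do that, assuming the rows are siblings as in the sample HTML. The helper name RemoveCategory and the shortened sample markup are made up for illustration; they are not part of HtmlAgilityPack or the original post.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

class Program
{
    const string HeaderClass = "sysinfoTableCategoryHeader";

    // Hypothetical helper: removes the category header whose text equals
    // `category`, plus every sibling that follows it up to (but not
    // including) the next category header.
    static void RemoveCategory(HtmlDocument doc, string category)
    {
        HtmlNodeCollection headers =
            doc.DocumentNode.SelectNodes("//tr[@class='" + HeaderClass + "']");
        if (headers == null) return; // SelectNodes returns null when nothing matches

        foreach (HtmlNode header in headers)
        {
            if (header.InnerText.Trim() != category) continue;

            // Collect first, remove afterwards, so sibling links stay valid
            // while walking.
            var toRemove = new List<HtmlNode> { header };
            for (HtmlNode n = header.NextSibling; n != null; n = n.NextSibling)
            {
                if (n.GetAttributeValue("class", "") == HeaderClass) break;
                toRemove.Add(n); // property rows and whitespace text nodes
            }
            foreach (HtmlNode n in toRemove) n.Remove();
        }
    }

    static void Main()
    {
        // Shortened version of the HTML from the question (assumption:
        // the real input has the same sibling layout).
        string html =
            "<body>" +
            "<tr class='sysinfoTableCategoryHeader'><td colspan='4'>Operating System</td></tr>" +
            "<tr class='sysinfoTablePropertyEven'><td>Operating System Name</td><td>Linux</td></tr>" +
            "<tr class='sysinfoTableCategoryHeader'><td colspan='4'>Motherboard</td></tr>" +
            "<tr class='sysinfoTablePropertyEven'><td>Manufacturer</td><td>Acer</td></tr>" +
            "</body>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        RemoveCategory(doc, "Operating System");
        Console.WriteLine(doc.DocumentNode.OuterHtml);
    }
}
```

The text comparison is done in C# rather than in the XPath expression to keep the query simple; if header text can vary in case or spacing, the comparison would need to be loosened accordingly.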