HtmlAgilityPack multiple tbody in table
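The table contains multiple tbody elements, and I am trying to parse them with HtmlAgilityPack. Normally the code below works, but in this case it does not: it prints only the first tbody and ignores the second.
Code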
var tableOffense = doc.DocumentNode.SelectSingleNode("//table[@id='OFF']");
var tbody = tableOffense.SelectNodes("tbody");
foreach (var bodies in tbody)
{
    Console.WriteLine("id " + offender.offenderId + " " + Utilities.RemoveHtmlCharacters(bodies.InnerText));
}
HTML
<table id="OFF" class="centerTable" cols="2" style="margin-top:0; width:100%;" cellpadding="0" cellspacing="0">
<tbody>
<!-- %%$SPLIT -->
<tr> <th id="offenseCodeColHdr" scope="row" style="width:25%;" class="uline">Offense Code</th> <td headers="offenseCodeColHdr" class="uline">288(a)</td> </tr> <tr> <th id="descriptionColHdr" scope="row" style="width:25%;" class="uline">Description</th> <td headers="descriptionColHdr" class="uline">LEWD OR LASCIVIOUS ACTS WITH A CHILD UNDER 14 YEARS OF AGE</td> </tr> <tr> <th id="lastConvictionColHdr" scope="row" style="width:25%;" class="uline">Year of Last Conviction</th> <td headers="lastConvictionColHdr" class="uline"> </td> </tr> <tr> <th id="lastReleaseColHdr" scope="row" style="width:25%;" class="uline">Year of Last Release</th> <td headers="lastReleaseColHdr" class="uline"> </td> </tr>
<tr><th colspan="2"><hr style="height:2px;background-color:#000;"></th></tr> </tbody>
<!-- %%$SPLIT -->
<tbody><tr> <th id="offenseCodeColHdr" scope="row" style="width:25%;" class="uline">Offense Code</th> <td headers="offenseCodeColHdr" class="uline">261(a)(2)</td> </tr> <tr> <th id="descriptionColHdr" scope="row" style="width:25%;" class="uline">Description</th> <td headers="descriptionColHdr" class="uline">RAPE BY FORCE OR FEAR</td> </tr> <tr> <th id="lastConvictionColHdr" scope="row" style="width:25%;" class="uline">Year of Last Conviction</th> <td headers="lastConvictionColHdr" class="uline"> </td> </tr> <tr> <th id="lastReleaseColHdr" scope="row" style="width:25%;" class="uline">Year of Last Release</th> <td headers="lastReleaseColHdr" class="uline"> </td> </tr>
<tr><th colspan="2"><hr style="height:2px;background-color:#000;"></th></tr> </tbody>
<!-- %%$SPLIT -->
</table>
I printed the tableOffense node on its own to make sure the second tbody is present after loading, and it is.
Question
Why does the code print only the first tbody instead of both?
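I haven't figured out why your code gives you only one tbody, but may I suggest an alternative way to select all of your <tbody> elements?
Personally, I would select all the tbody elements in one go with a single XPath expression, without the extra SelectNodes() call: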
var tbody = doc.DocumentNode.SelectNodes("//table[@id='OFF']//tbody");
foreach (var elem in tbody)
{
    // Dump() only works in LINQPad
    elem.InnerText.Dump();
}
Edit:
The following code (yours, with the XPath changed to //tbody) also produces the same result:
var tableOffense = doc.DocumentNode.SelectSingleNode("//table[@id='OFF']");
var tbody = tableOffense.SelectNodes("//tbody");
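One caveat with the snippet above: in XPath, an expression that starts with // is evaluated from the document root, even when SelectNodes() is called on the table node, so it would also pick up tbody elements from any other table on the page. A minimal sketch of the scoped variant follows. It assumes the HtmlAgilityPack package is referenced, and the markup is a hypothetical, trimmed-down stand-in for the page in the question, not the real page. It does not explain why the original "tbody" expression returned a single node.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical sample markup: two tbody elements in the target table,
        // plus an unrelated table to show the effect of scoping.
        const string html = @"<table id='OFF'>
<tbody><tr><th>Offense Code</th><td>288(a)</td></tr></tbody>
<tbody><tr><th>Offense Code</th><td>261(a)(2)</td></tr></tbody>
</table>
<table id='OTHER'><tbody><tr><td>unrelated</td></tr></tbody></table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var table = doc.DocumentNode.SelectSingleNode("//table[@id='OFF']");
        if (table == null)
        {
            Console.WriteLine("table not found");
            return;
        }

        // ".//tbody" is relative to the table node;
        // "//tbody" would restart the search at the document root.
        var bodies = table.SelectNodes(".//tbody");
        if (bodies == null)
        {
            // SelectNodes returns null, not an empty collection, when nothing matches.
            Console.WriteLine("no tbody found");
            return;
        }

        foreach (var body in bodies)
        {
            Console.WriteLine(body.InnerText.Trim());
        }
    }
}
```

Comparing bodies.Count for ".//tbody", "tbody" and "//tbody" against the real page is a quick way to see whether the parser produced the tree you expect.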