How to read the 'n' tags that follow each selected node

I am trying to use VB.NET and the HTML Agility Pack (HAP) to get each player's stats from the following HTML table, but I don't know how to select the tags that follow each player row.

    <table class="stats" cellspacing="0">
   <tr class="statsgreen">
      <td colspan="10" class="estverdel">Team A</td>
      <td colspan="2">REB</td>
      <td colspan="4">&nbsp;</td>
      <td colspan="2">BLK</td>
      <td>&nbsp;</td>
      <td colspan="2">PF</td>
      <td>&nbsp;</td>
      <td>&nbsp;</td>
   </tr>
   <tr class="statsgreen">
      <td>Num</td>
      <td>Name</td>
      <td>Min</td>
      <td>GS</td>
      <td>T2</td>
      <td>T2 %</td>
      <td>T3</td>
      <td>T3 %</td>
      <td>T1</td>
      <td>T1 %</td>
      <td>T</td>
      <td>D+O</td>
      <td>A</td>
      <td>ST</td>
      <td>LO</td>
      <td>C</td>
      <td>R</td>
      <td>C</td>
      <td>M</td>
      <td>R</td>
      <td>C</td>
      <td>+/-</td>
      <td>PIE</td>
   </tr>   
    <tr>
      <td>6</td>
      <td><a href="/player.php?id=001">Player 1</a></td>
      <td>30:22</td>
      <td>18</td>
      <td>4/10</td>
      <td>40%</td>
      <td>2/6</td>
      <td>33%</td>
      <td>4/4</td>
      <td>100%</td>
      <td>9</td>
      <td>5+4</td>
      <td>1</td>
      <td>1</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>3</td>
      <td>4</td>
      <td>10</td>
      <td>20</td>
   </tr>
   <tr>
      <td>6</td>
      <td><a href="/player.php?id=002">Player 2</a></td>
      <td>30:22</td>
      <td>18</td>
      <td>4/10</td>
      <td>40%</td>
      <td>2/6</td>
      <td>33%</td>
      <td>4/4</td>
      <td>100%</td>
      <td>9</td>
      <td>5+4</td>
      <td>1</td>
      <td>1</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>3</td>
      <td>4</td>
      <td>10</td>
      <td>20</td>
   </tr>
   ...
   ...
   <tr class="statsgreen">
      <td colspan="10" class="estverdel">Team B</td>
      <td colspan="2">REB</td>
      <td colspan="4">&nbsp;</td>
      <td colspan="2">BLK</td>
      <td>&nbsp;</td>
      <td colspan="2">PF</td>
      <td>&nbsp;</td>
      <td>&nbsp;</td>
   </tr>
   <tr class="statsgreen">
      <td>Num</td>
      <td>Name</td>
      <td>Min</td>
      <td>GS</td>
      <td>T2</td>
      <td>T2 %</td>
      <td>T3</td>
      <td>T3 %</td>
      <td>T1</td>
      <td>T1 %</td>
      <td>T</td>
      <td>D+O</td>
      <td>A</td>
      <td>ST</td>
      <td>LO</td>
      <td>C</td>
      <td>R</td>
      <td>C</td>
      <td>M</td>
      <td>R</td>
      <td>C</td>
      <td>+/-</td>
      <td>PIE</td>
   </tr>   
    <tr>
      <td>6</td>
      <td><a href="/player.php?id=013">Player 13</a></td>
      <td>30:22</td>
      <td>18</td>
      <td>4/10</td>
      <td>40%</td>
      <td>2/6</td>
      <td>33%</td>
      <td>4/4</td>
      <td>100%</td>
      <td>9</td>
      <td>5+4</td>
      <td>1</td>
      <td>1</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>0</td>
      <td>3</td>
      <td>4</td>
      <td>10</td>
      <td>20</td>
   </tr>
</table>

Here is my incomplete VB.NET code, which only gets the team and player names:

Private Sub btnGetStats_Click(sender As Object, e As EventArgs) Handles btnGetStats.Click
    Dim doc As New HtmlDocument                    
    doc.Load("C:[=12=]1.html")

    'Get team names  
    For Each nodeteams As HtmlNode In doc.DocumentNode.SelectNodes("//td[@class=""estverdel""]")                    
        MessageBox.Show("Team: " + nodeteams.InnerText)                
    Next

    'Get player names
    For Each nodeplayers As HtmlNode In doc.DocumentNode.SelectNodes("//a[contains(@href, '/player')]")
        MessageBox.Show(nodeplayers.InnerText)    
    Next
End Sub

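Is there an XPath expression I can use to select each player node and then iterate over each of the 21 stat fields that follow it?

As an alternative, I suppose I could get nodeplayers.Line and then use System.IO.StreamReader to read the following 21 lines, but maybe HAP can do this in a smarter way.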

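One possibility is to use the ParentNode property of the player HtmlNode object:

  • Take the parent of the parent of the player node you found (the tr node in <tr><td><a player>...)
  • Take all of its child nodes (all the td nodes)
  • Use LINQ Skip to skip the first two child nodes (the number and the player link)
  • Take the remaining child nodes

Modify your second loop like this: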

   'Get player names (Skip needs Imports System.Linq)
   For Each nodeplayers As HtmlNode In doc.DocumentNode.SelectNodes("//a[contains(@href, '/player')]")
        Console.WriteLine("Player: " + nodeplayers.InnerText)
        ' Go up from the player link (a) to its cell (td) and on to the row (tr), skip the first two children and take the rest
        For Each node As HtmlNode In nodeplayers.ParentNode.ParentNode.ChildNodes.Skip(2).ToList()
            Console.WriteLine(node.InnerText)
        Next
   Next

This returns all the values for each player:

Team: Team A
Team: Team B
Player: Player 1   
30:22  
18
4/10 
40% 
2/6
33% 
4/4
100% 
9 
5+4  
1  
1 
0    
0   
0  
0    
0    
3    
4    
10    
20
...
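
Since the question asks for an XPath expression specifically, here is a minimal sketch of an XPath-only variant. It is not taken from the thread, so treat it as one possible approach: from each player link it steps up to the enclosing td and selects every td that follows it in the same row. The module name and the shortened inline sample row are made up for illustration; the real rows have 21 stat cells. As far as I know, SelectNodes returns Nothing when nothing matches, so the sketch checks for that before looping.

```vbnet
Imports System
Imports HtmlAgilityPack

Module PlayerStatsSketch ' hypothetical name, for illustration only
    Sub Main()
        ' Shortened, made-up sample row; the real table has 21 stat cells per player.
        Dim html As String =
            "<table class=""stats"">" &
            "<tr><td>6</td><td><a href=""/player.php?id=001"">Player 1</a></td>" &
            "<td>30:22</td><td>18</td></tr>" &
            "</table>"

        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        Dim players = doc.DocumentNode.SelectNodes("//a[contains(@href, '/player')]")
        If players Is Nothing Then Return ' no player links found

        For Each player As HtmlNode In players
            Console.WriteLine("Player: " & player.InnerText)

            ' Relative XPath: up to the link's td, then every later td in the same row.
            Dim stats = player.SelectNodes("parent::td/following-sibling::td")
            If stats Is Nothing Then Continue For ' row has no cells after the name

            For Each cell As HtmlNode In stats
                Console.WriteLine(cell.InnerText.Trim())
            Next
        Next
    End Sub
End Module
```

Selecting td elements explicitly has a side benefit: if the parser keeps whitespace between cells as text nodes, those nodes never enter the result, whereas ChildNodes would include them and shift what Skip(2) skips. For the sample row above this should print the player name followed by 30:22 and 18.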