Retrieving <TD> tags from a table on a website using VBA and putting them into Excel

I am trying to retrieve information from <TD> tags on a website.

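It works, but I can't seem to get the text from the second <td> tag in a <TR>. I use a conditional statement to pick it out, since that is the only way I can see this working. The code extracts information fine; I just can't figure out how to access the second <td> on the condition that the first <td> matches.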

The actual HTML table looks like this.

<html>
<head></head>
<body>
<table id="Table2">
<tr>
  <td class="tSystemRight">System Name: -if this matches</td>
  <td class="tSystemLeft breakword">Windows3756 -I need this</td>
</tr>
<tr>
  <td class="tSystemRight">System Acronym: -if this matches</td>
  <td class="tSystemLeft breakword">WIN37  -I need this</td>
</tr>
</table>
</body>
</html>

My VBA script is:

excelRow = 2

For Each tr In msxml.tableRows
cellCount = 1
   For Each TD In tr.getElementsByTagName("TD")
    If ((cellCount = 1) And (TD.innerText = "System Acronym:")) Then
       Worksheets("Data").Cells(excelRow, 2).value = Cells(1, 2)
    ElseIf ((cellCount = 1) And (TD.innerText = "System Name:")) Then
       Worksheets("Data").Cells(excelRow, 3).value = Cells(1, 2)
    cellCount = cellCount + 1
    End If
   Next
Next

This only displays "System Name:" and "System Acronym:" in the Excel sheet.

I developed the following against a public website whose structure is almost the same as yours (https://www.federalreserve.gov/releases/h3/current/).

It requires references to Microsoft Internet Controls and Microsoft HTML Object Library.

Option Explicit

Sub Test()

Dim ie As New InternetExplorer
Dim doc As New HTMLDocument

With ie

    .Visible = True
    .Navigate "https://www.federalreserve.gov/releases/h3/current/"

    'wait for IE to finish loading before touching the document
    Do While .Busy Or .readyState <> READYSTATE_COMPLETE
        DoEvents
    Loop

    Set doc = .Document

    Dim t As HTMLTable
    Dim r As HTMLTableRow
    Dim c As HTMLTableCell

    Set t = doc.getElementById("t1tg1")

    'loop through each row
    For Each r In t.Rows

        If r.Cells(0).innerText = "Mar. 2016" Then Debug.Print r.Cells(1).innerText

        'loop through each column in the row
        'For Each c In r.Cells

        '    Debug.Print c.innerText

        'Next

    Next

End With

End Sub

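With all that said, once you have set up your specific table the way I did above, I would suggest the following edit to your code (I have left out the cell count check and other details):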

For Each r In t.Rows

    'Cells is zero-based: in the two-cell rows from the question, Cells(0) is the label and Cells(1) is the value; adjust the indexes if your table differs
    If r.Cells(0).innerText = "System Acronym:" Then Worksheets("Data").Cells(excelRow, 2).Value = r.Cells(1).innerText

Next
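To make the Cells(n) indexing concrete, here is a minimal sketch that can be run without a browser. It assumes the sample markup from the question with the "-if this matches" and "-I need this" annotations removed, and it loads that markup into an in-memory HTMLDocument instead of navigating to a page. The procedure name PrintSecondCells is made up for illustration.

```vba
Option Explicit

' Requires a reference to Microsoft HTML Object Library.
Sub PrintSecondCells()

    Dim doc As New HTMLDocument
    Dim t As HTMLTable
    Dim r As HTMLTableRow

    ' Assumption: sample markup modelled on the table in the question.
    doc.body.innerHTML = _
        "<table id=""Table2"">" & _
        "<tr><td>System Name:</td><td>Windows3756</td></tr>" & _
        "<tr><td>System Acronym:</td><td>WIN37</td></tr>" & _
        "</table>"

    Set t = doc.getElementById("Table2")

    For Each r In t.Rows
        ' Skip rows that do not have both a label cell and a value cell.
        If r.Cells.Length >= 2 Then
            ' Cells is zero-based: Cells(0) is the label, Cells(1) is the value.
            If Trim$(r.Cells(0).innerText) = "System Acronym:" Then
                Debug.Print Trim$(r.Cells(1).innerText)
            End If
        End If
    Next r

End Sub
```

The Trim$ calls are a precaution in case the real page pads the cell text with whitespace; an exact comparison would then fail silently.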

If you have a td element and want the inner text of the next td in the row, use the nextSibling property, like this:

For Each tr In msxml.tableRows
    cellCount = 1
    For Each td In tr.getElementsByTagName("TD")
        If ((cellCount = 1) And (td.innerText = "System Acronym:")) Then
            Worksheets("Data").Cells(excelRow, 2).Value = td.NextSibling.innerText
        ElseIf ((cellCount = 1) And (td.innerText = "System Name:")) Then
            Worksheets("Data").Cells(excelRow, 3).Value = td.NextSibling.innerText
        End If
        'count every cell, not only the ones that matched
        cellCount = cellCount + 1
    Next
Next
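One caveat with nextSibling: depending on the document mode the page is rendered in, the whitespace between two <td> tags may show up as a text node, in which case nextSibling would not be the next cell and reading innerText from it would fail. A rough sketch of a guard against that follows. NextElement is a hypothetical helper written for this example, not something MSHTML provides, and the markup is again a trimmed copy of the question's table.

```vba
Option Explicit

' Requires a reference to Microsoft HTML Object Library.

' Hypothetical helper: returns the first element node that follows n,
' or Nothing if there is none.
Function NextElement(ByVal n As Object) As Object
    Dim s As Object
    Set s = n.NextSibling
    Do While Not s Is Nothing
        If s.NodeType = 1 Then Exit Do   ' 1 = element node, 3 = text node
        Set s = s.NextSibling
    Loop
    Set NextElement = s
End Function

Sub DemoNextElement()

    Dim doc As New HTMLDocument
    Dim td As Object
    Dim valueCell As Object

    ' Assumption: sample markup modelled on the table in the question.
    doc.body.innerHTML = _
        "<table id=""Table2"">" & _
        "<tr><td>System Name:</td> <td>Windows3756</td></tr>" & _
        "<tr><td>System Acronym:</td> <td>WIN37</td></tr>" & _
        "</table>"

    For Each td In doc.getElementsByTagName("td")
        If Trim$(td.innerText) = "System Acronym:" Then
            Set valueCell = NextElement(td)
            ' The last cell in a row has no following element.
            If Not valueCell Is Nothing Then Debug.Print Trim$(valueCell.innerText)
        End If
    Next td

End Sub
```

If the rows always have a fixed layout, indexing with r.Cells(1) avoids the sibling walk altogether.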

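Note that nothing in the given code changes the value of excelRow, so everything will keep being written to the same row. Also note that the given HTML has "System Name" first and then "System Acronym", whereas the code appears to be structured to look for "System Acronym" first and then "System Name".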
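Both points can be handled together. The sketch below is one possible arrangement, not the asker's actual setup: it assumes each table describes one system, that a worksheet named "Data" exists, and that the values belong in columns 2 and 3 as in the question. WriteSystemRow and DemoWriteRows are hypothetical names, and the second table's values are invented placeholders.

```vba
Option Explicit

' Requires a reference to Microsoft HTML Object Library.

' Hypothetical helper: copies the values of one table into one sheet row.
Sub WriteSystemRow(ByVal t As HTMLTable, ByVal ws As Worksheet, ByVal excelRow As Long)
    Dim r As HTMLTableRow
    For Each r In t.Rows
        If r.Cells.Length >= 2 Then
            ' Matching on the label means the row order in the HTML does not matter.
            Select Case Trim$(r.Cells(0).innerText)
                Case "System Acronym:"
                    ws.Cells(excelRow, 2).Value = Trim$(r.Cells(1).innerText)
                Case "System Name:"
                    ws.Cells(excelRow, 3).Value = Trim$(r.Cells(1).innerText)
            End Select
        End If
    Next r
End Sub

Sub DemoWriteRows()

    Dim doc As New HTMLDocument
    Dim t As HTMLTable
    Dim excelRow As Long

    ' Assumption: two sample tables; the second one holds placeholder values.
    doc.body.innerHTML = _
        "<table><tr><td>System Name:</td><td>Windows3756</td></tr>" & _
        "<tr><td>System Acronym:</td><td>WIN37</td></tr></table>" & _
        "<table><tr><td>System Name:</td><td>ExampleSystem</td></tr>" & _
        "<tr><td>System Acronym:</td><td>EXS</td></tr></table>"

    excelRow = 2
    For Each t In doc.getElementsByTagName("table")
        WriteSystemRow t, Worksheets("Data"), excelRow
        ' Advance once per table so the next system lands on its own row.
        excelRow = excelRow + 1
    Next t

End Sub
```

Where exactly excelRow should advance depends on what one output row is meant to represent; here it is one table per row.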