How to pick a specific table from MSHTML.IHTMLElementCollection
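I am scraping a website, and I know I always want the 16th table in the IHTMLElementCollection. How do I pick out that specific one? While building this, I got around the problem by running a For loop over the collection and counting with a variable (skipper), but I am now at the optimization stage and this is a big sticking point for me.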
For Each HTMLTable In HTMLTables
    'Temp variable to set which table gets used - not sure how to
    'select the table i want
    skipper = skipper + 1
    Do While skipper = 16
        'Checks if worksheet exists with machine serial number
        'if not - creates it, if it is, sets as active
        For x = 1 To worksh
            If Worksheets(x).Name = MachineSerials Then
                Worksheets(MachineSerials).Activate
                worksheetexists = True
                Exit For
            End If
        Next x
        If worksheetexists = False Then
            Set ws = Sheets.Add(After:=Sheets(Sheets.Count))
            ws.Name = MachineSerials
            Range("A1") = "Last Updated : "
        End If
        'inserts time stamp for last updated at top of page
        Range("B1").Value = Now
        RowNum = 2
        'Dumps tables information into sheet
        For Each HTMLRow In HTMLTable.getElementsByTagName("tr")
            ColNum = 1
            For Each HTMLCell In HTMLRow.Children
                'Checks if the new information is the same as whats already on the screen
                If StrComp(HTMLCell.innerText, Cells(RowNum, ColNum)) = 0 Or Format(HTMLCell.innerText, "yyyy/mm/dd") = Format(Cells(RowNum, ColNum), "yyyy/mm/dd") Then
                    ColNum = ColNum + 1
                Else
                    Cells(RowNum, ColNum) = HTMLCell.innerText
                    ColNum = ColNum + 1
                    change = 1
                End If
            Next HTMLCell
            RowNum = RowNum + 1
        Next HTMLRow
        Exit For
    Loop
Next HTMLTable
Try using Item:
The getElementsByTagName() method returns a collection of all elements
in the document with the specified tag name, as a NodeList object.
The NodeList object represents a collection of nodes. The nodes can be
accessed by index numbers. The index starts at 0.
In your case, I would start with the following:
Set htmlTables = html.getElementsByTagName("table")
Debug.Print htmlTables.Length
Debug.Print htmlTables.Item(12).innerHTML
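This verifies that the code picks the 13th table correctly (Item(12), because the index is zero-based; the 16th table from your question would be Item(15)). You can then carry on with the rest of your code, for example: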
'Pick the table by index first, then walk its rows
Set htmlTable = htmlTables.Item(12)
For Each htmlRow In htmlTable.getElementsByTagName("tr")
    Debug.Print htmlRow.innerText
Next htmlRow
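Applied to the loop in the question, indexing means the skipper counter is no longer needed. The following is only a minimal sketch, not part of the original answer: TableAtPosition and DumpSixteenthTable are hypothetical names, the html argument is assumed to be an already loaded MSHTML document, and everything is late-bound (As Object) so that no library reference is required.

```vba
' Hypothetical helper (an assumption for illustration, not from the original answer).
' Returns the table at a 1-based position, or Nothing if the collection is too short.
Function TableAtPosition(ByVal tables As Object, ByVal position As Long) As Object
    If tables Is Nothing Then Exit Function
    ' Item() is zero-based, so position 16 maps to Item(15)
    If position >= 1 And position <= tables.Length Then
        Set TableAtPosition = tables.Item(position - 1)
    End If
End Function

' Hypothetical usage: html is assumed to be a loaded HTML document object.
Sub DumpSixteenthTable(ByVal html As Object)
    Dim HTMLTable As Object
    Dim HTMLRow As Object

    Set HTMLTable = TableAtPosition(html.getElementsByTagName("table"), 16)
    If HTMLTable Is Nothing Then
        Debug.Print "The page has fewer than 16 tables"
        Exit Sub
    End If

    ' Same row loop as in the question, now without the outer For Each / counter
    For Each HTMLRow In HTMLTable.getElementsByTagName("tr")
        Debug.Print HTMLRow.innerText
    Next HTMLRow
End Sub
```

The bounds check is there because the page layout may change; without it, asking for an index past the end of the collection would leave HTMLTable unset and the row loop would raise an error.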
Hope this points you in the right direction.
EDIT
To combine table and row indexes, you can use the following code (the example here uses a w3schools web page):
Set htmlTables = html.getElementsByTagName("table")
Set htmlTable = htmlTables.Item(1)
Set htmlRow = htmlTable.getElementsByTagName("tr").Item(2)
Debug.Print htmlRow.innerText
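Here Item(2) picks the row at index 2 within the table at Item(1); both indexes are zero-based.
The Immediate window prints the text of that row as expected.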