How to fix 'For Each' iteration with getElementsByTagName?
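I am using MSXML and WinHTTP in VBA/Excel. I am trying to extract the innerText from all of the <p> elements inside a single element.
How can the subroutine iterate through all of the <p> tags within a particular class and populate a worksheet?
Thanks in advance.
I am trying to apply this strategy [0] to this website [1]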
[0] https://codingislove.com/parse-html-in-excel-vba/
[1] https://www.fool.com/earnings/call-transcripts/2019/07/17/netflix-inc-nflx-q2-2019-earnings-call-transcript.aspx
Sub tryKeywordsearch()
    Dim http As Object, html As New HTMLDocument
    Dim paras As Object, titleElem As Object, detailsElem As Object, para As HTMLHtmlElement
    Dim i As Integer
    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", "https://www.fool.com/earnings/call-transcripts/2019/07/17/netflix-inc-nflx-q2-2019-earnings-call-transcript.aspx", False
    http.Send
    html.body.innerHTML = http.responseText
    Set paras = html.getElementsByClassName("article-content")
    i = 1
    For Each para In paras
        Set para = para.getElementsByTagName("p")(i)
        Sheets(1).Cells(i, 1).Value = para.innerText
        i = i + 1
    Next
End Sub
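There is actually only one element with the class name article-content, so your outer loop runs over a single element and never gets past i = 1. In addition, inside that loop you are reassigning the very variable you are looping over, which is likely to cause errors.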
For Each para In paras
Set para = para.getElementsByTagName("p")(i)
In the above, para is your loop variable.
Also, the collection returned by para.getElementsByTagName("p") is zero-based, so its first item is at index 0.
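To make the zero-based point concrete, here is a minimal sketch that runs against a small hard-coded HTML string instead of the real page. The names ParagraphTextAt and ShowZeroBasedIndexing and the sample markup are made up for illustration, and the code assumes the Microsoft HTML Object Library reference is set, as the code in the question already requires.

```vba
Option Explicit

' Hypothetical helper for illustration; the name and sample markup are not from the original post.
' Assumes a reference to Microsoft HTML Object Library (for HTMLDocument).
Public Function ParagraphTextAt(ByVal markup As String, ByVal index As Long) As String
    Dim html As New HTMLDocument
    Dim paras As Object

    html.body.innerHTML = markup
    Set paras = html.getElementsByTagName("p")

    ' Valid indexes run from 0 to paras.Length - 1.
    If index >= 0 And index < paras.Length Then
        ParagraphTextAt = paras.Item(index).innerText
    End If
End Function

Public Sub ShowZeroBasedIndexing()
    Const SAMPLE As String = "<div class=""article-content""><p>first</p><p>second</p></div>"
    Debug.Print ParagraphTextAt(SAMPLE, 0)       ' text of the first <p>
    Debug.Print ParagraphTextAt(SAMPLE, 1)       ' text of the second <p>
    Debug.Print Len(ParagraphTextAt(SAMPLE, 2))  ' index 2 is out of range, so the result is empty
End Sub
```

Starting the index at 1, as the question's code does, would therefore skip the first paragraph.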
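Here is how your code would work if you index into the initial collection returned by getElementsByClassName, then chain on getElementsByTagName, and use the result as the collection for your For Each (leaving i starting from 1, which you can then use to write to the correct row; the loop variable para gives you the current node's innerText):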
Option Explicit
Public Sub TryKeywordSearch()
    Dim http As Object, html As New HTMLDocument
    Dim paras As Object, para As Object, i As Long
    Set http = CreateObject("MSXML2.XMLHTTP")
    ' Synchronous GET (third argument False), so send returns once the response has arrived
    http.Open "GET", "https://www.fool.com/earnings/call-transcripts/2019/07/17/netflix-inc-nflx-q2-2019-earnings-call-transcript.aspx", False
    http.send
    html.body.innerHTML = http.responseText
    ' Take the single article-content element (index 0), then collect its <p> descendants
    Set paras = html.getElementsByClassName("article-content")(0).getElementsByTagName("p")
    ' i is only the worksheet row counter; para is never reassigned inside the loop
    i = 1
    For Each para In paras
        ThisWorkbook.Worksheets("Sheet1").Cells(i, 1).Value = para.innerText
        i = i + 1
    Next
End Sub
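The code above takes it on trust that the page contains an element with class article-content. If it does not, indexing (0) and chaining getElementsByTagName will raise a run-time error. A possible guard, sketched under the same Microsoft HTML Object Library assumption, is shown below; ParagraphsInClass is a hypothetical helper name and is not part of the original answer.

```vba
Option Explicit

' Hypothetical helper; not part of the original answer.
' Assumes a reference to Microsoft HTML Object Library (for HTMLDocument).
' Returns the <p> collection inside the first element with the given class,
' or Nothing when no element has that class.
Public Function ParagraphsInClass(ByVal html As HTMLDocument, ByVal className As String) As Object
    Dim matches As Object

    Set matches = html.getElementsByClassName(className)
    If matches.Length = 0 Then
        Set ParagraphsInClass = Nothing
    Else
        Set ParagraphsInClass = matches.Item(0).getElementsByTagName("p")
    End If
End Function
```

A caller could then write Set paras = ParagraphsInClass(html, "article-content") and exit early when paras Is Nothing, rather than failing halfway through the loop.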
Instead, you could use a CSS selector combination, which is faster and, IMO, more readable, to get all of the p tags inside the parent with class article-content:
Option Explicit
Public Sub GetParagraphs()
    Dim html As HTMLDocument, paragraphs As Object, i As Long
    Set html = New HTMLDocument
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.fool.com/earnings/call-transcripts/2019/07/17/netflix-inc-nflx-q2-2019-earnings-call-transcript.aspx", False
        .send
        html.body.innerHTML = .responseText
    End With
    ' Descendant selector: every <p> anywhere inside an element with class article-content
    Set paragraphs = html.querySelectorAll(".article-content p")
    ' The node list is zero-based, so item i goes to worksheet row i + 1
    For i = 0 To paragraphs.Length - 1
        ThisWorkbook.Worksheets("Sheet1").Cells(i + 1, 1).Value = paragraphs.item(i).innerText
    Next i
End Sub
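If it helps to see what the descendant selector does and does not match, here is a small sketch on made-up markup rather than the real page. CountMatches and the sample string are hypothetical, and the code assumes the Microsoft HTML Object Library reference plus an installed MSHTML version that supports querySelectorAll.

```vba
Option Explicit

' Hypothetical helper for illustration; not part of the original answer.
' Assumes a reference to Microsoft HTML Object Library (for HTMLDocument).
Public Function CountMatches(ByVal markup As String, ByVal selector As String) As Long
    Dim html As New HTMLDocument

    html.body.innerHTML = markup
    ' querySelectorAll returns a zero-based node list; Length is its item count.
    CountMatches = html.querySelectorAll(selector).Length
End Function

Public Sub CompareSelectors()
    Const SAMPLE As String = _
        "<div class=""article-content""><p>inside 1</p><p>inside 2</p></div><p>outside</p>"

    ' Only the <p> elements nested under .article-content
    Debug.Print CountMatches(SAMPLE, ".article-content p")
    ' Every <p> in the body, including the one outside the div
    Debug.Print CountMatches(SAMPLE, "p")
End Sub
```

The first count should be smaller than the second, because the selector ".article-content p" ignores paragraphs that sit outside the parent element.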