从 Morningstar 中提取特定 table 个单元格,然后循环到下一个 Morningstar 页面
Pulling specific table cells from Morningstar, then looping to next Morningstar page
我目前正在尝试从 Morningstar 的 table 中抓取某些数据,然后让它循环到下一个代码并重复直到没有更多代码。
目前,它将拉动尾随总计 Returns table 上的整个 "rank in category" 行。我只是想提取 3 个月、6 个月、年初至今、1 年、3 年和 5 年的数据。完成拉动后,它将循环到下一个自动收报机,由导航行中的 "Cells(p, 14)" 确定。
即。它检测到 "LINKX" 在单元格 1、14 中,因此它导航到 http://performance.morningstar.com/fund/performance-return.action?t=LINKX®ion=usa&culture=en_US 并从 "trailing total returns" table 中拉出所有 "Rank in Category" 行。我只想将指定的放入指定的单元格位置,然后循环到下一个自动收报机。
我浏览了很多这样的主题,使用 excel VBA 我试图从某个代码页面提取关键的特定信息,然后循环到下一个代码并重复。
Declare PtrSafe Function apiShowWindow Lib "user32" Alias "ShowWindow" _
(ByVal hwnd As LongPtr, ByVal nCmdShow As LongPtr) As LongPtr
Global Const SW_MAXIMIZE = 3
Global Const SW_SHOWNORMAL = 1
Global Const SW_SHOWMINIMIZED = 2
Sub LinkedInWebScrapeScript()
Dim objIE As InternetExplorer
Dim html As HTMLDocument
Set objIE = GetObject("new:{D5E8041D-920F-45e9-B8FB-B1DEB82C6E5E}")
objIE.Visible = 1
Dim p As Integer
p = 3
objIE.navigate ("http://performance.morningstar.com/fund/performance-return.action?t=" & Cells(p, 14) & "®ion=usa&culture=en_US")
Application.Wait Now + #12:00:02 AM#
While objIE.Busy
DoEvents
Wend
apiShowWindow objIE.hwnd, SW_MAXIMIZE
For i = 1 To 2
objIE.document.parentWindow.scrollBy 0, 100000 & i
Application.Wait Now + #12:00:01 AM#
Next i
Dim TDelements As IHTMLElementCollection
Dim htmldoc As MSHTML.IHTMLDocument 'Document object
Dim eleColtr As MSHTML.IHTMLElementCollection 'Element collection for tr tags
Dim eleColtd As MSHTML.IHTMLElementCollection 'Element collection for td tags
Dim eleColtd1 As MSHTML.IHTMLElementCollection
Dim eleRow As MSHTML.IHTMLElement 'Row elements
Dim eleCol As MSHTML.IHTMLElement 'Column elements
Set htmldoc = objIE.document 'Document webpage
Set eleColtr = htmldoc.getElementsByTagName("tr") 'Find all tr tags
Set TDelements = htmldoc.getElementsByTagName("table")
'This section populates Excel
i = 0 'start with first value in tr collection
Set eleColtd = htmldoc.getElementsByClassName("r_table3 width955px print97")(0).getElementsByClassName("last")(0).getElementsByClassName("row_data divide") 'get all the td elements in that specific tr
For Each eleCol In eleColtd 'for each element in the td collection
Sheets("Sheet2").Range("A1").Offset(i, j).Value = eleCol.innerText 'paste the inner text of the td element, and offset at the same time
j = j + 1 'move to next element in td collection
Next eleCol 'rinse and repeat
i = i + 1
p = p + 1
objIE.navigate ("http://performance.morningstar.com/fund/performance-return.action?t=" & Cells(p, 14) & "®ion=usa&culture=en_US")
Set eleColtd = htmldoc.getElementsByClassName("r_table3 width955px print97")(0).getElementsByClassName("last")(0).getElementsByClassName("row_data divide") 'get all the td elements in that specific tr
For Each eleCol In eleColtd 'for each element in the td collection
Sheets("Sheet2").Range("A1").Offset(i, j).Value = eleCol.innerText 'paste the inner text of the td element, and offset at the same time
z = z + 1
j = j + 1 'move to next element in td collection
Next eleCol 'rinse and repeat
End Sub
它将拉动尾随总计 Returns table 上的整个 "rank in category" 行。我只是想提取 3 个月、6 个月、年初至今、1 年、3 年和 5 年的数据。完成拉动后,它将循环到下一个自动收报机,由导航行中的 "Cells(p, 14)" 确定。
下面显示了一个循环以及如何 select 适当的 table,tbody 然后 table 使用 css selectors 的单元格。代码从第 1 行开始的第 N 列读入数组。它假设范围内没有空白单元格(尽管您可以添加一个测试来确定)。
数组有一个循环,其中包含每个代码,url 中的 TICKER 占位符被替换为当前代码值。
每月显示选项卡上有一行可以点击
适当的行通过
识别
Set rankings = .querySelectorAll("#tab-month-end-content .last td")
#tab-month-end-content
是一个 id selector 得到正确的选项卡,然后 .last
是 class selector 对于 class 最后一个 tbody
(即 last
)的名称,然后 td
用于指定该 tbody 中的子 td
单元格。
CSS select或:
现代浏览器针对 css 进行了优化。 Css selector 是匹配 html 文档中元素的快速方法。 Css select 或通过 querySelector 或 querySelectorAll
方法应用;在这种情况下,HTMLDocument
(ie.document)。 querySelector
returns单节点:第一个匹配为cssselector; querySelectorAll
returns 所有匹配项目的节点列表 - 然后您索引到该节点列表以获取特定项目,例如第二个 td 单元格位于索引 1。
查看我们指定的模式:
#tab-month-end-content .last td
第一部分是一个 id selector、#
,其中 select 是一个 id
的元素
#tab-month-end-content
当应用到页面时,这 returns 两个匹配,我们想要第二个
点击图片放大
下一部分
.last
是一个class selector, .
, for class name last
. This selects the tbody
tag child element shown in the image above. As only the second id matched element has this child we are now working with the right parent element to go on and select the td
type elements using type selector
td
上述每个部分之间的空格</code>称为<a href="https://developer.mozilla.org/en-US/docs/Web/CSS/Descendant_combinator" rel="nofollow noreferrer">descendant combinators</a>,它们指定与第二个select匹配的元素或select 如果它们有一个与第一个 select 匹配的祖先元素,或者即左边的 selector 是相邻检索到的 selector 匹配元素的父元素,则 selected css select或向右。</p>
<p>我们可以在下一张图片中看到这一点:</p>
<p><sub><em>点击图片放大</em></sub></p>
<p><a href="https://i.stack.imgur.com/mRaOJ.png" rel="nofollow noreferrer"><WBIMG:11861638-2.png></a></p>
<hr>
<p><strong>VBA:</strong></p>
<pre><code>Option Explicit
Public Sub GetData()
Dim ie As Object, tickers(), ws As Worksheet, lastRow As Long
Dim results(), headers(), r As Long, i As Long, url As String
headers = Array("ticker", "3m", "6m", "ytd", "1y", "3y", "6y")
Set ws = ThisWorkbook.Worksheets("Sheet1")
tickers = Application.Transpose(ws.Range("N1:N" & GetLastRow(ws, 14)).Value)
ReDim results(1 To UBound(tickers), 1 To UBound(headers) + 1)
Set ie = GetObject("new:{D5E8041D-920F-45e9-B8FB-B1DEB82C6E5E}")
With ie
.Visible = True
For i = LBound(tickers) To UBound(tickers)
r = r + 1
url = Replace$("http://performance.morningstar.com/fund/performance-return.action?t=TICKER®ion=usa&culture=en_US", "TICKER", tickers(i))
.Navigate2 url
While .Busy Or .readyState < 4: DoEvents: Wend
.document.querySelector("[tabname='#tabmonth']").Click
Dim rankings As Object
Do
Loop While .document.querySelectorAll("#tab-month-end-content .last td").Length = 0 'could add timed loop here
With .document
Set rankings = .querySelectorAll("#tab-month-end-content .last td")
On Error Resume Next
results(r, 1) = tickers(i)
results(r, 2) = rankings.item(1).innerText
results(r, 3) = rankings.item(2).innerText
results(r, 4) = rankings.item(3).innerText
results(r, 5) = rankings.item(4).innerText
results(r, 6) = rankings.item(5).innerText
results(r, 7) = rankings.item(6).innerText
On Error GoTo 0
End With
Set rankings = Nothing
Next
ws.Cells(1, 15).Resize(UBound(results, 1), UBound(results, 2)) = results
.Quit
End With
End Sub
Public Function GetLastRow(ByVal ws As Worksheet, Optional ByVal columnNumber As Long = 1) As Long
With ws
GetLastRow = .Cells(.rows.Count, columnNumber).End(xlUp).Row
End With
End Function
如@SIM 所述,您可以使用 xmlhttp 并避免使用浏览器,但不确定您的安全设置是否需要将网站列入白名单。您将需要在此处 url 中探索占位符是否有效:XNAS:TICKER
。 XNAS
前缀可能因您的代码而异,在这种情况下,您需要适当的字符串,包括 N 列中的前缀,然后用该字符串替换扩展占位符,例如.....=PLACEHOLDER®ion
.......
Option Explicit
Public Sub GetData()
Dim tickers(), ws As Worksheet, lastRow As Long
Dim results(), headers(), r As Long, i As Long, url As String, html As HTMLDocument
Set html = New HTMLDocument 'vbe > tools > references > Microsoft HTML Object Library
headers = Array("ticker", "3m", "6m", "ytd", "1y", "3y", "6y")
Set ws = ThisWorkbook.Worksheets("Sheet1")
tickers = Application.Transpose(ws.Range("N1:N" & GetLastRow(ws, 14)).Value)
ReDim results(1 To UBound(tickers), 1 To UBound(headers) + 1)
With CreateObject("MSXML2.XMLHTTP")
For i = LBound(tickers) To UBound(tickers)
r = r + 1
url = Replace$("http://performance.morningstar.com/perform/Performance/fund/trailing-total-returns.action?&t=XNAS:TICKER®ion=usa&culture=en-US&cur=&ops=clear&s=0P0000J533&ndec=2&ep=true&align=m&annlz=true&comparisonRemove=false&loccat=&taxadj=&benchmarkSecId=&benchmarktype=", "TICKER", tickers(i))
.Open "GET", url, False
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.setRequestHeader "DNT", "1"
.send
html.body.innerHTML = .responseText
Dim rankings As Object
With html
Set rankings = .querySelectorAll(".last td")
On Error Resume Next
results(r, 1) = tickers(i)
results(r, 2) = rankings.item(1).innerText
results(r, 3) = rankings.item(2).innerText
results(r, 4) = rankings.item(3).innerText
results(r, 5) = rankings.item(4).innerText
results(r, 6) = rankings.item(5).innerText
results(r, 7) = rankings.item(6).innerText
On Error GoTo 0
End With
Set rankings = Nothing
Next
ws.Cells(1, 15).Resize(UBound(results, 1), UBound(results, 2)) = results
End With
End Sub
Public Function GetLastRow(ByVal ws As Worksheet, Optional ByVal columnNumber As Long = 1) As Long
With ws
GetLastRow = .Cells(.rows.Count, columnNumber).End(xlUp).Row
End With
End Function
我目前正在尝试从 Morningstar 的 table 中抓取某些数据,然后让它循环到下一个代码并重复直到没有更多代码。
目前,它将拉动尾随总计 Returns table 上的整个 "rank in category" 行。我只是想提取 3 个月、6 个月、年初至今、1 年、3 年和 5 年的数据。完成拉动后,它将循环到下一个自动收报机,由导航行中的 "Cells(p, 14)" 确定。
即。它检测到 "LINKX" 在单元格 1、14 中,因此它导航到 http://performance.morningstar.com/fund/performance-return.action?t=LINKX®ion=usa&culture=en_US 并从 "trailing total returns" table 中拉出所有 "Rank in Category" 行。我只想将指定的放入指定的单元格位置,然后循环到下一个自动收报机。
我浏览了很多这样的主题,使用 excel VBA 我试图从某个代码页面提取关键的特定信息,然后循环到下一个代码并重复。
Declare PtrSafe Function apiShowWindow Lib "user32" Alias "ShowWindow" _
(ByVal hwnd As LongPtr, ByVal nCmdShow As LongPtr) As LongPtr
Global Const SW_MAXIMIZE = 3
Global Const SW_SHOWNORMAL = 1
Global Const SW_SHOWMINIMIZED = 2
Sub LinkedInWebScrapeScript()
Dim objIE As InternetExplorer
Dim html As HTMLDocument
Set objIE = GetObject("new:{D5E8041D-920F-45e9-B8FB-B1DEB82C6E5E}")
objIE.Visible = 1
Dim p As Integer
p = 3
objIE.navigate ("http://performance.morningstar.com/fund/performance-return.action?t=" & Cells(p, 14) & "®ion=usa&culture=en_US")
Application.Wait Now + #12:00:02 AM#
While objIE.Busy
DoEvents
Wend
apiShowWindow objIE.hwnd, SW_MAXIMIZE
For i = 1 To 2
objIE.document.parentWindow.scrollBy 0, 100000 & i
Application.Wait Now + #12:00:01 AM#
Next i
Dim TDelements As IHTMLElementCollection
Dim htmldoc As MSHTML.IHTMLDocument 'Document object
Dim eleColtr As MSHTML.IHTMLElementCollection 'Element collection for tr tags
Dim eleColtd As MSHTML.IHTMLElementCollection 'Element collection for td tags
Dim eleColtd1 As MSHTML.IHTMLElementCollection
Dim eleRow As MSHTML.IHTMLElement 'Row elements
Dim eleCol As MSHTML.IHTMLElement 'Column elements
Set htmldoc = objIE.document 'Document webpage
Set eleColtr = htmldoc.getElementsByTagName("tr") 'Find all tr tags
Set TDelements = htmldoc.getElementsByTagName("table")
'This section populates Excel
i = 0 'start with first value in tr collection
Set eleColtd = htmldoc.getElementsByClassName("r_table3 width955px print97")(0).getElementsByClassName("last")(0).getElementsByClassName("row_data divide") 'get all the td elements in that specific tr
For Each eleCol In eleColtd 'for each element in the td collection
Sheets("Sheet2").Range("A1").Offset(i, j).Value = eleCol.innerText 'paste the inner text of the td element, and offset at the same time
j = j + 1 'move to next element in td collection
Next eleCol 'rinse and repeat
i = i + 1
p = p + 1
objIE.navigate ("http://performance.morningstar.com/fund/performance-return.action?t=" & Cells(p, 14) & "®ion=usa&culture=en_US")
Set eleColtd = htmldoc.getElementsByClassName("r_table3 width955px print97")(0).getElementsByClassName("last")(0).getElementsByClassName("row_data divide") 'get all the td elements in that specific tr
For Each eleCol In eleColtd 'for each element in the td collection
Sheets("Sheet2").Range("A1").Offset(i, j).Value = eleCol.innerText 'paste the inner text of the td element, and offset at the same time
z = z + 1
j = j + 1 'move to next element in td collection
Next eleCol 'rinse and repeat
End Sub
它将拉动尾随总计 Returns table 上的整个 "rank in category" 行。我只是想提取 3 个月、6 个月、年初至今、1 年、3 年和 5 年的数据。完成拉动后,它将循环到下一个自动收报机,由导航行中的 "Cells(p, 14)" 确定。
下面显示了一个循环以及如何 select 适当的 table,tbody 然后 table 使用 css selectors 的单元格。代码从第 1 行开始的第 N 列读入数组。它假设范围内没有空白单元格(尽管您可以添加一个测试来确定)。
数组有一个循环,其中包含每个代码,url 中的 TICKER 占位符被替换为当前代码值。
每月显示选项卡上有一行可以点击
适当的行通过
识别Set rankings = .querySelectorAll("#tab-month-end-content .last td")
#tab-month-end-content
是一个 id selector 得到正确的选项卡,然后 .last
是 class selector 对于 class 最后一个 tbody
(即 last
)的名称,然后 td
用于指定该 tbody 中的子 td
单元格。
CSS select或:
现代浏览器针对 css 进行了优化。 Css selector 是匹配 html 文档中元素的快速方法。 Css select 或通过 querySelector 或 querySelectorAll
方法应用;在这种情况下,HTMLDocument
(ie.document)。 querySelector
returns单节点:第一个匹配为cssselector; querySelectorAll
returns 所有匹配项目的节点列表 - 然后您索引到该节点列表以获取特定项目,例如第二个 td 单元格位于索引 1。
查看我们指定的模式:
#tab-month-end-content .last td
第一部分是一个 id selector、#
,其中 select 是一个 id
#tab-month-end-content
当应用到页面时,这 returns 两个匹配,我们想要第二个
点击图片放大
下一部分
.last
是一个class selector, .
, for class name last
. This selects the tbody
tag child element shown in the image above. As only the second id matched element has this child we are now working with the right parent element to go on and select the td
type elements using type selector
td
上述每个部分之间的空格</code>称为<a href="https://developer.mozilla.org/en-US/docs/Web/CSS/Descendant_combinator" rel="nofollow noreferrer">descendant combinators</a>,它们指定与第二个select匹配的元素或select 如果它们有一个与第一个 select 匹配的祖先元素,或者即左边的 selector 是相邻检索到的 selector 匹配元素的父元素,则 selected css select或向右。</p>
<p>我们可以在下一张图片中看到这一点:</p>
<p><sub><em>点击图片放大</em></sub></p>
<p><a href="https://i.stack.imgur.com/mRaOJ.png" rel="nofollow noreferrer"><WBIMG:11861638-2.png></a></p>
<hr>
<p><strong>VBA:</strong></p>
<pre><code>Option Explicit
Public Sub GetData()
Dim ie As Object, tickers(), ws As Worksheet, lastRow As Long
Dim results(), headers(), r As Long, i As Long, url As String
headers = Array("ticker", "3m", "6m", "ytd", "1y", "3y", "6y")
Set ws = ThisWorkbook.Worksheets("Sheet1")
tickers = Application.Transpose(ws.Range("N1:N" & GetLastRow(ws, 14)).Value)
ReDim results(1 To UBound(tickers), 1 To UBound(headers) + 1)
Set ie = GetObject("new:{D5E8041D-920F-45e9-B8FB-B1DEB82C6E5E}")
With ie
.Visible = True
For i = LBound(tickers) To UBound(tickers)
r = r + 1
url = Replace$("http://performance.morningstar.com/fund/performance-return.action?t=TICKER®ion=usa&culture=en_US", "TICKER", tickers(i))
.Navigate2 url
While .Busy Or .readyState < 4: DoEvents: Wend
.document.querySelector("[tabname='#tabmonth']").Click
Dim rankings As Object
Do
Loop While .document.querySelectorAll("#tab-month-end-content .last td").Length = 0 'could add timed loop here
With .document
Set rankings = .querySelectorAll("#tab-month-end-content .last td")
On Error Resume Next
results(r, 1) = tickers(i)
results(r, 2) = rankings.item(1).innerText
results(r, 3) = rankings.item(2).innerText
results(r, 4) = rankings.item(3).innerText
results(r, 5) = rankings.item(4).innerText
results(r, 6) = rankings.item(5).innerText
results(r, 7) = rankings.item(6).innerText
On Error GoTo 0
End With
Set rankings = Nothing
Next
ws.Cells(1, 15).Resize(UBound(results, 1), UBound(results, 2)) = results
.Quit
End With
End Sub
Public Function GetLastRow(ByVal ws As Worksheet, Optional ByVal columnNumber As Long = 1) As Long
With ws
GetLastRow = .Cells(.rows.Count, columnNumber).End(xlUp).Row
End With
End Function
如@SIM 所述,您可以使用 xmlhttp 并避免使用浏览器,但不确定您的安全设置是否需要将网站列入白名单。您将需要在此处 url 中探索占位符是否有效:XNAS:TICKER
。 XNAS
前缀可能因您的代码而异,在这种情况下,您需要适当的字符串,包括 N 列中的前缀,然后用该字符串替换扩展占位符,例如.....=PLACEHOLDER®ion
.......
Option Explicit
Public Sub GetData()
Dim tickers(), ws As Worksheet, lastRow As Long
Dim results(), headers(), r As Long, i As Long, url As String, html As HTMLDocument
Set html = New HTMLDocument 'vbe > tools > references > Microsoft HTML Object Library
headers = Array("ticker", "3m", "6m", "ytd", "1y", "3y", "6y")
Set ws = ThisWorkbook.Worksheets("Sheet1")
tickers = Application.Transpose(ws.Range("N1:N" & GetLastRow(ws, 14)).Value)
ReDim results(1 To UBound(tickers), 1 To UBound(headers) + 1)
With CreateObject("MSXML2.XMLHTTP")
For i = LBound(tickers) To UBound(tickers)
r = r + 1
url = Replace$("http://performance.morningstar.com/perform/Performance/fund/trailing-total-returns.action?&t=XNAS:TICKER®ion=usa&culture=en-US&cur=&ops=clear&s=0P0000J533&ndec=2&ep=true&align=m&annlz=true&comparisonRemove=false&loccat=&taxadj=&benchmarkSecId=&benchmarktype=", "TICKER", tickers(i))
.Open "GET", url, False
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.setRequestHeader "DNT", "1"
.send
html.body.innerHTML = .responseText
Dim rankings As Object
With html
Set rankings = .querySelectorAll(".last td")
On Error Resume Next
results(r, 1) = tickers(i)
results(r, 2) = rankings.item(1).innerText
results(r, 3) = rankings.item(2).innerText
results(r, 4) = rankings.item(3).innerText
results(r, 5) = rankings.item(4).innerText
results(r, 6) = rankings.item(5).innerText
results(r, 7) = rankings.item(6).innerText
On Error GoTo 0
End With
Set rankings = Nothing
Next
ws.Cells(1, 15).Resize(UBound(results, 1), UBound(results, 2)) = results
End With
End Sub
Public Function GetLastRow(ByVal ws As Worksheet, Optional ByVal columnNumber As Long = 1) As Long
With ws
GetLastRow = .Cells(.rows.Count, columnNumber).End(xlUp).Row
End With
End Function