How to Web Scrape a Site in Excel/Google Sheets?
How should I scrape this web page, https://www.bseindia.com/stock-share-price/asian-paints-ltd/asianpaint/500820/ ? I specifically need the ROE figure shown in the table.
I used the following code in Excel. I don't know much about scraping in Google Sheets.
Sub FetchData()
    ' Import the whole page into the active sheet, starting at cell A1
    With ActiveSheet.QueryTables.Add( _
        Connection:="URL;https://www.bseindia.com/stock-share-price/asian-paints-ltd/asianpaint/500820/", _
        Destination:=Range("$A$1"))
        .Name = "www"
        .FieldNames = True
        .RowNumbers = False
        .FillAdjacentFormulas = False
        .PreserveFormatting = True
        .RefreshOnFileOpen = False
        .BackgroundQuery = True
        .RefreshStyle = xlInsertDeleteCells
        .SavePassword = False
        .SaveData = True
        .AdjustColumnWidth = True
        .RefreshPeriod = 0
        .WebSelectionType = xlEntirePage
        .WebFormatting = xlWebFormattingNone
        .WebPreFormattedTextToColumns = True
        .WebConsecutiveDelimitersAsOne = True
        .WebSingleBlockTextImport = False
        .WebDisableDateRecognition = False
        .WebDisableRedirections = False
        .Refresh BackgroundQuery:=False
    End With
End Sub
I can't get the data correctly.
Any suggestions or help on this? I only need the ROE figure; the rest is not required.
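Here is what I found to be an easier way to get that particular value. As soon as the For loop finds the cell containing ROE, it picks up the required value and exits the loop, because both sit under the same parent node.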
Sub FetchData()
    Dim IE As New InternetExplorer, post As Object
    Dim Html As HTMLDocument, elem As Object

    With IE
        .Visible = False
        .navigate "https://www.bseindia.com/stock-share-price/asian-paints-ltd/asianpaint/500820/"
        ' Wait until the page has finished loading
        While .Busy Or .readyState < 4: DoEvents: Wend
        Set Html = .document
    End With

    ' Find the cell labelled ROE, then take the value cell under the same parent node
    For Each post In Html.getElementsByTagName("td")
        If post.innerText = "ROE" Then
            Set elem = post.ParentNode.querySelector(".textvalue")
            Exit For
        End If
    Next post

    ' Write the value only if it was found, then close the hidden browser
    If Not elem Is Nothing Then [A1] = elem.innerText
    IE.Quit
End Sub
References to add:
Microsoft HTML Object Library
Microsoft Internet Controls
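Unfortunately, that isn't possible in Google Sheets, because the site is driven by JavaScript and Google Sheets can't understand or import JS. You can test this simply by disabling JS for the given link: you will see a blank page.
What you see is what you get: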
=ARRAY_CONSTRAIN(IMPORTDATA("https://www.bseindia.com/stock-share-price/asian-paints-ltd/asianpaint/500820/"), 5000, 15)
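It will be much faster to use the API that the page itself uses. You can process the JSON response with Power Query, with a JSON parser, or simply with Split. If you want to refresh at the press of a button, put the code in a standard module and link it to a button.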
Option Explicit

Public Sub GetInfo()
    Dim s As String, ids(), i As Long
    ' BSE scrip codes to look up; 500820 is the code from the URL in the question
    ids = Array(500820, 500312, 500325, 532540)
    With CreateObject("MSXML2.XMLHTTP")
        For i = LBound(ids) To UBound(ids)
            .Open "GET", "https://api.bseindia.com/BseIndiaAPI/api/ComHeader/w?quotetype=EQ&scripcode=" & ids(i) & "&seriesid=", False
            .send
            s = .responseText
            ' Take the text between "ROE":" and the next double quote
            ActiveSheet.Cells(i + 1, 1) = Split(Split(s, """ROE"":""")(1), Chr$(34))(0)
        Next
    End With
End Sub
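The nested Split call raises a "Subscript out of range" error if the response does not contain the ROE key, for example when a request fails. A minimal sketch of a more defensive extraction follows. JsonStringValue and DemoJsonStringValue are hypothetical names, not part of the answer above, and the sketch assumes the value is a quoted string with no space after the colon and no escaped quotes, which is the same assumption the Split pattern makes.

Public Function JsonStringValue(ByVal json As String, ByVal key As String) As String
    ' Hypothetical helper: returns the text stored under key, or "" if key is absent
    Dim marker As String, startPos As Long, endPos As Long

    marker = """" & key & """:"""           ' builds "key":" including the opening quote of the value
    startPos = InStr(1, json, marker, vbBinaryCompare)
    If startPos = 0 Then Exit Function      ' key not present: return ""

    startPos = startPos + Len(marker)       ' first character of the value
    endPos = InStr(startPos, json, """")    ' closing quote of the value
    If endPos = 0 Then Exit Function        ' malformed text: return ""

    JsonStringValue = Mid$(json, startPos, endPos - startPos)
End Function

Public Sub DemoJsonStringValue()
    ' Made-up sample text for illustration, not a real API response
    Debug.Print JsonStringValue("{""PE"":""70.10"",""ROE"":""27.50""}", "ROE")
End Sub

Inside the loop, the cell assignment could then read ActiveSheet.Cells(i + 1, 1) = JsonStringValue(s, "ROE"), which leaves the cell empty rather than stopping the macro when the key is missing.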