如何从网站提取数据并使用 VBA 填充 excel sheet?
How can I extract data from a website and fill an excel sheet using VBA?
我想从 betexplorer.com 中提取数据。我想从以下 URL:
中提取两个不同的数据
https://www.betexplorer.com/soccer/s...eague-1/stats/
我想提取已进行的比赛和剩余的比赛
我想提取主场进球和客场进球(每场比赛)
我有执行此操作的代码,如下所示:
Option Explicit
Sub GetSoccerStats()
'Set a reference (VBE > Tools > References) to the following libraries:
' 1) Microsoft XML, v6.0
' 2) Microsoft HTML Object Library
Dim xmlReq As New MSXML2.XMLHTTP60
Dim objDoc As New MSHTML.HTMLDocument
Dim objTable As MSHTML.htmlTable
Dim objTableRow As MSHTML.htmlTableRow
Dim strURL As String
Dim strResp As String
Dim strText As String
Dim rw As Long
strURL = "https://www.betexplorer.com/soccer/south-korea/k-league-1/stats/"
With xmlReq
.Open "GET", strURL, False
.send
If .Status <> 200 Then
MsgBox "Error " & .Status & ": " & .statusText
Exit Sub
End If
strResp = .responseText
End With
Worksheets.Add
objDoc.body.innerHTML = strResp
Set objTable = objDoc.getElementsByClassName("table-main leaguestats")(0)
If Not objTable Is Nothing Then
rw = 1
For Each objTableRow In objTable.Rows
strText = objTableRow.Cells(0).innerText
Select Case strText
Case "Matches played", "Matches remaining", "Home goals", "Away goals"
Cells(rw, "a").Value = objTableRow.Cells(0).innerText
Cells(rw, "b").Value = objTableRow.Cells(1).innerText
Cells(rw, "c").Value = objTableRow.Cells(2).innerText
rw = rw + 1
End Select
Next objTableRow
Columns("a").AutoFit
End If
Set xmlReq = Nothing
Set objDoc = Nothing
Set objTable = Nothing
Set objTableRow = Nothing
End Sub
此代码有效,但我想更进一步。
我实际上想 运行 这个宏用于同一站点上的许多不同 URL。我已经创建了一个作品sheet,其中包含足球联赛列表(在行中),列包含数据。
您可以在此处找到该文件:
https://www.dropbox.com/s/77sol24sty75w5z/Avg%20Goals.xlsm?dl=0
这是一个文件,我将在其中向行添加联赛。是否可以调整提取数据的代码,以便它可以填充我的 sheet 中的列?我不需要像这段代码那样提取数据名称(剩余比赛、主场进球、客场进球等),我只需要数字。提取的数字必须按照 sheet 填充列(因此每一行都包含每个联赛的数据。如您所见,有几个联赛,因此需要遍历每一行然后使用该行对应的 URL。
您会注意到有一列包含单词 CURRENT。这是为了指示它应该使用当前 URL 列中的 URL。如果我将值更改为 LAST,我希望它在 Last URL 列中使用 URL。
对于每个联赛,如果我使用 CURRENT 或 LAST,它会有所不同。
这是预期输出的图片:
非常感谢任何帮助。
与您的代码保持一致,这将在 M:T 列中输出这些项目的数据。我有一个辅助函数 GetLinks
,它根据列 K:
中的值生成要使用的最终 url 数组
inputArray = GetLinks(inputArray)
循环此数组并发出 xhr 请求以获取信息。所有的结果信息都存储在一个数组中,results
,一次性写出到最后的sheet
我自始至终都使用数组,因为你不想继续阅读 sheet;这是一项昂贵的操作,会减慢您的代码速度。出于同样的原因,如果 <> 200 发生,我会立即打印 window 消息和 url 以免减慢代码速度。你实际上有一个日志,然后你可以在最后查看。
检索到的结果是从M列写出的,但是由于数据是数组形式的,所以你可以很容易地写到任何你想写的地方;只需将用于粘贴的起始单元格从 M4
更改为您想要的最左上角的单元格。您现有的列中没有百分比,因此我可以放心地假设您希望写出的数据位于新列中(甚至可能位于不同的 sheet)。
Option Explicit
Public Sub GetSoccerStats()
Dim xmlReq As New MSXML2.XMLHTTP60, response As String
Dim objDoc As New MSHTML.HTMLDocument, text As String
Dim lastRow As Long, dataSheet As Worksheet, inputArray(), i As Long
Set dataSheet = ThisWorkbook.Worksheets("AVG GOAL DATA")
With dataSheet
lastRow = .Cells(.Rows.Count, "B").End(xlUp).Row
End With
inputArray = dataSheet.Range("J4:L" & lastRow).Value
inputArray = GetLinks(inputArray)
Dim results(), r As Long, c As Long
ReDim results(1 To UBound(inputArray, 1), 1 To 8)
With xmlReq
For i = LBound(inputArray, 1) To UBound(inputArray, 1)
r = r + 1
.Open "GET", inputArray(i, 4), False
.send
If .Status <> 200 Then
Debug.Print inputArray(i, 4), vbTab, "Error " & .Status & ": " & .statusText
Else
response = .responseText
objDoc.body.innerHTML = response
Dim objTable As MSHTML.HTMLTable, objTableRow As MSHTML.HTMLTableRow
Set objTable = objDoc.getElementsByClassName("table-main leaguestats")(0)
If Not objTable Is Nothing Then
c = 1
For Each objTableRow In objTable.Rows
text = objTableRow.Cells(0).innerText
Select Case text
Case "Matches played", "Matches remaining", "Home goals", "Away goals"
results(r, c) = objTableRow.Cells(1).innerText
results(r, c + 1) = objTableRow.Cells(2).innerText
c = c + 2
End Select
Next objTableRow
End If
End If
Set objTable = Nothing
Next
End With
dataSheet.Range("M4").Resize(UBound(results, 1), UBound(results, 2)) = results
End Sub
Public Function GetLinks(ByRef inputArray As Variant) As Variant
Dim i As Long
ReDim Preserve inputArray(1 To UBound(inputArray, 1), 1 To UBound(inputArray, 2) + 1)
For i = LBound(inputArray, 1) To UBound(inputArray, 1)
inputArray(i, 4) = IIf(inputArray(i, 1) = "CURRENT", inputArray(i, 2), inputArray(i, 3))
Next
GetLinks = inputArray
End Function
文件布局:
由于请求过多导致阻塞这里是IE版本:
'VBE > Tools > References:
'1: Microsoft HTML Object library 2: Microsoft Internet Controls
Public Sub GetSoccerStats()
Dim ie As Object, t As Date
Dim objDoc As New MSHTML.HTMLDocument, text As String
Dim lastRow As Long, dataSheet As Worksheet, inputArray(), i As Long
Const MAX_WAIT_SEC As Long = 10
Set dataSheet = ThisWorkbook.Worksheets("AVG GOAL DATA")
Set ie = CreateObject("InternetExplorer.Application")
With dataSheet
lastRow = .Cells(.Rows.Count, "B").End(xlUp).Row
End With
inputArray = dataSheet.Range("C4:E" & lastRow).Value
inputArray = GetLinks(inputArray)
Dim results(), r As Long, c As Long
ReDim results(1 To UBound(inputArray, 1), 1 To 8)
With ie
.Visible = True
For i = LBound(inputArray, 1) To UBound(inputArray, 1)
r = r + 1
.navigate2 inputArray(i, 4)
While .Busy Or .readyState < 4: DoEvents: Wend
Dim objTable As MSHTML.HTMLTable, objTableRow As MSHTML.HTMLTableRow
t = timer
Do
DoEvents
On Error Resume Next
Set objTable = .document.getElementsByClassName("table-main leaguestats")(0)
On Error GoTo 0
If Timer - t > MAX_WAIT_SEC Then Exit Do
Loop While objTable Is Nothing
If Not objTable Is Nothing Then
c = 1
For Each objTableRow In objTable.Rows
text = objTableRow.Cells(0).innerText
Select Case text
Case "Matches played", "Matches remaining", "Home goals", "Away goals"
results(r, c) = objTableRow.Cells(1).innerText
results(r, c + 1) = objTableRow.Cells(2).innerText
c = c + 2
End Select
Next objTableRow
End If
Set objTable = Nothing
Next
.Quit
End With
dataSheet.Range("F4").Resize(UBound(results, 1), UBound(results, 2)) = results
End Sub
也许这样的事情可能会奏效:
Option Explicit
Private Sub GetSoccerStats()
'Set a reference (VBE > Tools > References) to the following libraries:
' 1) Microsoft XML, v6.0
' 2) Microsoft HTML Object Library
Dim sourceSheet As Worksheet
Set sourceSheet = ThisWorkbook.Worksheets("AVG GOAL DATA")
Dim firstRowToFetchDataFor As Long
firstRowToFetchDataFor = sourceSheet.Cells(sourceSheet.Rows.Count, "C").End(xlUp).Row + 1 ' Assumes a row needs pulling if the value in column C is blank.
Dim lastRowToFetchDataFor As Long
lastRowToFetchDataFor = sourceSheet.Cells(sourceSheet.Rows.Count, "B").End(xlUp).Row
Dim xmlReq As MSXML2.XMLHTTP60
Set xmlReq = New MSXML2.XMLHTTP60
Dim htmlDoc As MSHTML.HTMLDocument
Set htmlDoc = New MSHTML.HTMLDocument
Dim rowIndex As Long
For rowIndex = firstRowToFetchDataFor To lastRowToFetchDataFor
Dim URL As String
Select Case LCase$(sourceSheet.Cells(rowIndex, "J"))
Case "current"
URL = sourceSheet.Cells(rowIndex, "K")
Case "last"
URL = sourceSheet.Cells(rowIndex, "L")
Case Else
MsgBox "Expected 'current' or 'last', instead got '" & sourceSheet.Cells(rowIndex, "J") & "' in cell '" & sourceSheet.Cells(rowIndex, "J").Address(False, False) & "'.", vbCritical
Application.Goto sourceSheet.Cells(rowIndex, "J")
Exit Sub
End Select
With xmlReq
.Open "GET", URL, False
.send
If .Status <> 200 Then
MsgBox "Request returned HTTP " & .Status & ":" & vbNewLine & vbNewLine & .statusText, vbCritical
Exit Sub
End If
htmlDoc.body.innerHTML = .responseText
End With
Dim htmlTableExtracted As MSHTML.HTMLTable
On Error Resume Next
Set htmlTableExtracted = htmlDoc.getElementsByClassName("table-main leaguestats")(0)
On Error GoTo 0
If Not (htmlTableExtracted Is Nothing) Then
Dim tableRow As MSHTML.HTMLTableRow
For Each tableRow In htmlTableExtracted.Rows
Select Case LCase$(tableRow.Cells(0).innerText)
Case "matches played"
sourceSheet.Cells(rowIndex, "G") = tableRow.Cells(1).innerText
Case "matches remaining"
sourceSheet.Cells(rowIndex, "H") = tableRow.Cells(1).innerText
Case "home goals"
sourceSheet.Cells(rowIndex, "C") = tableRow.Cells(2).innerText
Case "away goals"
sourceSheet.Cells(rowIndex, "E") = tableRow.Cells(2).innerText
End Select
Next tableRow
Set htmlTableExtracted = Nothing ' Prevent this iteration's result having effects on succeeding iterations
End If
Next rowIndex
End Sub
我可能错了,但 E
列不应该包含 "away goals" 吗?我假设 "A SCR AVG" 中的 "A" 代表 "Away"(因为 "H SCR AVG" 中的 "H" 似乎代表 "Home")。所以我将 "Away goals" 写到 E
列,尽管屏幕截图表明它们应该写到 B
列(或者我没有正确阅读)。
我想从 betexplorer.com 中提取数据。我想从以下 URL:
中提取两个不同的数据https://www.betexplorer.com/soccer/s...eague-1/stats/
我想提取已进行的比赛和剩余的比赛 我想提取主场进球和客场进球(每场比赛)
我有执行此操作的代码,如下所示:
Option Explicit
Sub GetSoccerStats()
'Set a reference (VBE > Tools > References) to the following libraries:
' 1) Microsoft XML, v6.0
' 2) Microsoft HTML Object Library
Dim xmlReq As New MSXML2.XMLHTTP60
Dim objDoc As New MSHTML.HTMLDocument
Dim objTable As MSHTML.htmlTable
Dim objTableRow As MSHTML.htmlTableRow
Dim strURL As String
Dim strResp As String
Dim strText As String
Dim rw As Long
strURL = "https://www.betexplorer.com/soccer/south-korea/k-league-1/stats/"
With xmlReq
.Open "GET", strURL, False
.send
If .Status <> 200 Then
MsgBox "Error " & .Status & ": " & .statusText
Exit Sub
End If
strResp = .responseText
End With
Worksheets.Add
objDoc.body.innerHTML = strResp
Set objTable = objDoc.getElementsByClassName("table-main leaguestats")(0)
If Not objTable Is Nothing Then
rw = 1
For Each objTableRow In objTable.Rows
strText = objTableRow.Cells(0).innerText
Select Case strText
Case "Matches played", "Matches remaining", "Home goals", "Away goals"
Cells(rw, "a").Value = objTableRow.Cells(0).innerText
Cells(rw, "b").Value = objTableRow.Cells(1).innerText
Cells(rw, "c").Value = objTableRow.Cells(2).innerText
rw = rw + 1
End Select
Next objTableRow
Columns("a").AutoFit
End If
Set xmlReq = Nothing
Set objDoc = Nothing
Set objTable = Nothing
Set objTableRow = Nothing
End Sub
此代码有效,但我想更进一步。
我实际上想 运行 这个宏用于同一站点上的许多不同 URL。我已经创建了一个作品sheet,其中包含足球联赛列表(在行中),列包含数据。
您可以在此处找到该文件: https://www.dropbox.com/s/77sol24sty75w5z/Avg%20Goals.xlsm?dl=0
这是一个文件,我将在其中向行添加联赛。是否可以调整提取数据的代码,以便它可以填充我的 sheet 中的列?我不需要像这段代码那样提取数据名称(剩余比赛、主场进球、客场进球等),我只需要数字。提取的数字必须按照 sheet 填充列(因此每一行都包含每个联赛的数据。如您所见,有几个联赛,因此需要遍历每一行然后使用该行对应的 URL。
您会注意到有一列包含单词 CURRENT。这是为了指示它应该使用当前 URL 列中的 URL。如果我将值更改为 LAST,我希望它在 Last URL 列中使用 URL。
对于每个联赛,如果我使用 CURRENT 或 LAST,它会有所不同。
这是预期输出的图片:
非常感谢任何帮助。
与您的代码保持一致,这将在 M:T 列中输出这些项目的数据。我有一个辅助函数 GetLinks
,它根据列 K:
inputArray = GetLinks(inputArray)
循环此数组并发出 xhr 请求以获取信息。所有的结果信息都存储在一个数组中,results
,一次性写出到最后的sheet
我自始至终都使用数组,因为你不想继续阅读 sheet;这是一项昂贵的操作,会减慢您的代码速度。出于同样的原因,如果 <> 200 发生,我会立即打印 window 消息和 url 以免减慢代码速度。你实际上有一个日志,然后你可以在最后查看。
检索到的结果是从M列写出的,但是由于数据是数组形式的,所以你可以很容易地写到任何你想写的地方;只需将用于粘贴的起始单元格从 M4
更改为您想要的最左上角的单元格。您现有的列中没有百分比,因此我可以放心地假设您希望写出的数据位于新列中(甚至可能位于不同的 sheet)。
Option Explicit
Public Sub GetSoccerStats()
Dim xmlReq As New MSXML2.XMLHTTP60, response As String
Dim objDoc As New MSHTML.HTMLDocument, text As String
Dim lastRow As Long, dataSheet As Worksheet, inputArray(), i As Long
Set dataSheet = ThisWorkbook.Worksheets("AVG GOAL DATA")
With dataSheet
lastRow = .Cells(.Rows.Count, "B").End(xlUp).Row
End With
inputArray = dataSheet.Range("J4:L" & lastRow).Value
inputArray = GetLinks(inputArray)
Dim results(), r As Long, c As Long
ReDim results(1 To UBound(inputArray, 1), 1 To 8)
With xmlReq
For i = LBound(inputArray, 1) To UBound(inputArray, 1)
r = r + 1
.Open "GET", inputArray(i, 4), False
.send
If .Status <> 200 Then
Debug.Print inputArray(i, 4), vbTab, "Error " & .Status & ": " & .statusText
Else
response = .responseText
objDoc.body.innerHTML = response
Dim objTable As MSHTML.HTMLTable, objTableRow As MSHTML.HTMLTableRow
Set objTable = objDoc.getElementsByClassName("table-main leaguestats")(0)
If Not objTable Is Nothing Then
c = 1
For Each objTableRow In objTable.Rows
text = objTableRow.Cells(0).innerText
Select Case text
Case "Matches played", "Matches remaining", "Home goals", "Away goals"
results(r, c) = objTableRow.Cells(1).innerText
results(r, c + 1) = objTableRow.Cells(2).innerText
c = c + 2
End Select
Next objTableRow
End If
End If
Set objTable = Nothing
Next
End With
dataSheet.Range("M4").Resize(UBound(results, 1), UBound(results, 2)) = results
End Sub
Public Function GetLinks(ByRef inputArray As Variant) As Variant
Dim i As Long
ReDim Preserve inputArray(1 To UBound(inputArray, 1), 1 To UBound(inputArray, 2) + 1)
For i = LBound(inputArray, 1) To UBound(inputArray, 1)
inputArray(i, 4) = IIf(inputArray(i, 1) = "CURRENT", inputArray(i, 2), inputArray(i, 3))
Next
GetLinks = inputArray
End Function
文件布局:
由于请求过多导致阻塞这里是IE版本:
'VBE > Tools > References:
'1: Microsoft HTML Object library 2: Microsoft Internet Controls
Public Sub GetSoccerStats()
Dim ie As Object, t As Date
Dim objDoc As New MSHTML.HTMLDocument, text As String
Dim lastRow As Long, dataSheet As Worksheet, inputArray(), i As Long
Const MAX_WAIT_SEC As Long = 10
Set dataSheet = ThisWorkbook.Worksheets("AVG GOAL DATA")
Set ie = CreateObject("InternetExplorer.Application")
With dataSheet
lastRow = .Cells(.Rows.Count, "B").End(xlUp).Row
End With
inputArray = dataSheet.Range("C4:E" & lastRow).Value
inputArray = GetLinks(inputArray)
Dim results(), r As Long, c As Long
ReDim results(1 To UBound(inputArray, 1), 1 To 8)
With ie
.Visible = True
For i = LBound(inputArray, 1) To UBound(inputArray, 1)
r = r + 1
.navigate2 inputArray(i, 4)
While .Busy Or .readyState < 4: DoEvents: Wend
Dim objTable As MSHTML.HTMLTable, objTableRow As MSHTML.HTMLTableRow
t = timer
Do
DoEvents
On Error Resume Next
Set objTable = .document.getElementsByClassName("table-main leaguestats")(0)
On Error GoTo 0
If Timer - t > MAX_WAIT_SEC Then Exit Do
Loop While objTable Is Nothing
If Not objTable Is Nothing Then
c = 1
For Each objTableRow In objTable.Rows
text = objTableRow.Cells(0).innerText
Select Case text
Case "Matches played", "Matches remaining", "Home goals", "Away goals"
results(r, c) = objTableRow.Cells(1).innerText
results(r, c + 1) = objTableRow.Cells(2).innerText
c = c + 2
End Select
Next objTableRow
End If
Set objTable = Nothing
Next
.Quit
End With
dataSheet.Range("F4").Resize(UBound(results, 1), UBound(results, 2)) = results
End Sub
也许这样的事情可能会奏效:
Option Explicit
Private Sub GetSoccerStats()
'Set a reference (VBE > Tools > References) to the following libraries:
' 1) Microsoft XML, v6.0
' 2) Microsoft HTML Object Library
Dim sourceSheet As Worksheet
Set sourceSheet = ThisWorkbook.Worksheets("AVG GOAL DATA")
Dim firstRowToFetchDataFor As Long
firstRowToFetchDataFor = sourceSheet.Cells(sourceSheet.Rows.Count, "C").End(xlUp).Row + 1 ' Assumes a row needs pulling if the value in column C is blank.
Dim lastRowToFetchDataFor As Long
lastRowToFetchDataFor = sourceSheet.Cells(sourceSheet.Rows.Count, "B").End(xlUp).Row
Dim xmlReq As MSXML2.XMLHTTP60
Set xmlReq = New MSXML2.XMLHTTP60
Dim htmlDoc As MSHTML.HTMLDocument
Set htmlDoc = New MSHTML.HTMLDocument
Dim rowIndex As Long
For rowIndex = firstRowToFetchDataFor To lastRowToFetchDataFor
Dim URL As String
Select Case LCase$(sourceSheet.Cells(rowIndex, "J"))
Case "current"
URL = sourceSheet.Cells(rowIndex, "K")
Case "last"
URL = sourceSheet.Cells(rowIndex, "L")
Case Else
MsgBox "Expected 'current' or 'last', instead got '" & sourceSheet.Cells(rowIndex, "J") & "' in cell '" & sourceSheet.Cells(rowIndex, "J").Address(False, False) & "'.", vbCritical
Application.Goto sourceSheet.Cells(rowIndex, "J")
Exit Sub
End Select
With xmlReq
.Open "GET", URL, False
.send
If .Status <> 200 Then
MsgBox "Request returned HTTP " & .Status & ":" & vbNewLine & vbNewLine & .statusText, vbCritical
Exit Sub
End If
htmlDoc.body.innerHTML = .responseText
End With
Dim htmlTableExtracted As MSHTML.HTMLTable
On Error Resume Next
Set htmlTableExtracted = htmlDoc.getElementsByClassName("table-main leaguestats")(0)
On Error GoTo 0
If Not (htmlTableExtracted Is Nothing) Then
Dim tableRow As MSHTML.HTMLTableRow
For Each tableRow In htmlTableExtracted.Rows
Select Case LCase$(tableRow.Cells(0).innerText)
Case "matches played"
sourceSheet.Cells(rowIndex, "G") = tableRow.Cells(1).innerText
Case "matches remaining"
sourceSheet.Cells(rowIndex, "H") = tableRow.Cells(1).innerText
Case "home goals"
sourceSheet.Cells(rowIndex, "C") = tableRow.Cells(2).innerText
Case "away goals"
sourceSheet.Cells(rowIndex, "E") = tableRow.Cells(2).innerText
End Select
Next tableRow
Set htmlTableExtracted = Nothing ' Prevent this iteration's result having effects on succeeding iterations
End If
Next rowIndex
End Sub
我可能错了,但 E
列不应该包含 "away goals" 吗?我假设 "A SCR AVG" 中的 "A" 代表 "Away"(因为 "H SCR AVG" 中的 "H" 似乎代表 "Home")。所以我将 "Away goals" 写到 E
列,尽管屏幕截图表明它们应该写到 B
列(或者我没有正确阅读)。