Importing/scraping a website into Excel
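I am trying to scrape some data from a database, and I have most of it set up. I find the tab in IE that has me logged into the database, then navigate that tab to the query link from VBA. But how do I extract the data it returns from the IE tab and put it into an Excel cell or an array?
This is my code for opening the query: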
Sub import()
    Dim row As Integer
    Dim strTargetFile As String
    Dim wb As Workbook
    Dim test As String
    Dim ie As Object
    Call Fill_Array_Cultivar
    For row = 3 To 4
        Sheets.Add.Name = Cultivar_Array(row, 1)
        strTargetFile = "https://www3.wipo.int/pluto/user/jsp/select.jsp?fl=app_date%2Cden_info%2Cden_final&hl=false&json.nl=map&wt=json&type=upov&start=0&qi=3-nNCXQ6etEVv184O9nnd5yg%3D%3D&q=cc%3AIT%20AND%20latin_name%3A(zea%20mays)%20AND%20den_info%3A" & Trim(Cultivar_Array(row, 1)) & "&facet=false"
        Set ie = GetIE("https://www3.wipo.int" & "*")
        If Not ie Is Nothing Then
            ie.navigate (strTargetFile)
        Else
            MsgBox "IE not found!"
        End If
    Next row
End Sub
This is the function that goes with it:
'Find an IE window with a matching (partial) URL
'Assumes no frames.
Function GetIE(sAddress As String) As Object
    Dim objShell As Object, objShellWindows As Object, o As Object
    Dim retVal As Object, sURL As String
    Set retVal = Nothing
    Set objShell = CreateObject("Shell.Application")
    Set objShellWindows = objShell.Windows
    'see if IE is already open
    For Each o In objShellWindows
        sURL = ""
        On Error Resume Next
        sURL = o.document.Location
        On Error GoTo 0
        If sURL <> "" Then
            If sURL Like sAddress & "*" Then
                Set retVal = o
                Exit For
            End If
        End If
    Next o
    Set GetIE = retVal
End Function
What the website returns is just a white page with a single line of text. Here is an example:
{"response":{"start":0,"docs":[{"den_final":"Abacus","app_date":"1998-01-13T22:59:59Z"}],"numFound":1},"qi":"3-nNCXQ6etEVv184O9nnd5yg==","sv":"bswa2.wipo.int","lastUpdated":1436333633993}
P.S. I also tried the importxml function. It imports the website, but I only get an error page, because it does not recognize that I am logged in.
I found the solution. It is fairly simple, but it was hard to find.
I can just read ie.Document.body.innerText, which contains all the text I need.
See my updated code below:
Sub import()
    Dim row As Integer
    Dim strTargetFile As String
    Dim ie As Object
    Dim pageText As String

    Call Fill_Array_Cultivar
    For row = 3 To 4
        'The newly added sheet becomes the active sheet
        Sheets.Add.Name = Cultivar_Array(row, 1)
        strTargetFile = "https://www3.wipo.int/pluto/user/jsp/select.jsp?fl=app_date%2Cden_info%2Cden_final&hl=false&json.nl=map&wt=json&type=upov&start=0&qi=3-nNCXQ6etEVv184O9nnd5yg%3D%3D&q=cc%3AIT%20AND%20latin_name%3A(zea%20mays)%20AND%20den_info%3A" & Trim(Cultivar_Array(row, 1)) & "&facet=false"
        Set ie = GetIE("https://www3.wipo.int" & "*")
        If Not ie Is Nothing Then
            ie.navigate strTargetFile
            'Wait until the new page has finished loading (READYSTATE_COMPLETE = 4).
            'Checking Busy as well avoids reading the previous page, whose state is still 4.
            Do While ie.Busy Or ie.ReadyState <> 4: DoEvents: Loop
            'The response is the whole text of the page body
            pageText = ie.Document.body.innerText
            ActiveSheet.Cells(1, 1).Value = pageText
            pageText = vbNullString
        Else
            MsgBox "IE not found!"
        End If
    Next row
End Sub
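The code above writes the whole response line into one cell. If individual values are wanted instead, one possible approach is to pull single fields out of the text with a regular expression. The sketch below is only an illustration under assumptions: JsonField and DemoJsonField are hypothetical names that do not appear in the original code, the fields are assumed to be simple quoted strings like den_final and app_date in the example response, and escaped quotes inside values are not handled. It uses the late-bound VBScript.RegExp object, so no extra reference is needed.

```vba
'Hypothetical helper: return the first value of a quoted string field,
'e.g. "den_final":"Abacus", from a JSON text. Returns "" if not found.
'Sketch only: escaped quotes inside values are not handled.
Function JsonField(ByVal jsonText As String, ByVal fieldName As String) As String
    Dim re As Object, matches As Object

    Set re = CreateObject("VBScript.RegExp")
    'Builds a pattern of the form  "fieldName"\s*:\s*"([^"]*)"
    re.Pattern = """" & fieldName & """\s*:\s*""([^""]*)"""
    re.Global = False

    Set matches = re.Execute(jsonText)
    If matches.Count > 0 Then
        JsonField = matches(0).SubMatches(0)
    Else
        JsonField = ""
    End If
End Function

'Hypothetical demo using a shortened copy of the example response
Sub DemoJsonField()
    Dim sample As String
    sample = "{""response"":{""start"":0,""docs"":[{""den_final"":""Abacus""," & _
             """app_date"":""1998-01-13T22:59:59Z""}],""numFound"":1}}"

    ActiveSheet.Cells(1, 1).Value = JsonField(sample, "den_final")
    ActiveSheet.Cells(1, 2).Value = JsonField(sample, "app_date")
End Sub
```

Inside the loop, the same helper could be called as JsonField(pageText, "den_final") before pageText is cleared. A response with several entries in docs would need re.Global = True and a loop over the matches, or a proper JSON parser.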