How to pull the image and title of the product from Amazon?
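I am trying to build a product list based on Amazon's unique product codes.
For example: https://www.amazon.in/gp/product/B00F2GPN36
where B00F2GPN36 is the unique code.
I want to pull the product's image and title into an Excel list, under the columns Product Image and Product Name.
I have tried html.getElementsById("productTitle") and html.getElementsByTagName.
I am also unsure what type of variable to declare for storing this information, since I have already tried declaring it as Object and as HtmlHtmlElement.
I tried to fetch the HTML document and search it for the data.
Code: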
Enum READYSTATE
    READYSTATE_UNINITIALIZED = 0
    READYSTATE_LOADING = 1
    READYSTATE_LOADED = 2
    READYSTATE_INTERACTIVE = 3
    READYSTATE_COMPLETE = 4
End Enum

Sub parsehtml()
    Dim ie As InternetExplorer
    Dim topics As Object
    Dim html As HTMLDocument
    Set ie = New InternetExplorer
    ie.Visible = False
    ie.navigate "https://www.amazon.in/gp/product/B00F2GPN36"
    Do While ie.READYSTATE <> READYSTATE_COMPLETE
        Application.StatusBar = "Trying to go to Amazon.in...."
        DoEvents
    Loop
    Application.StatusBar = ""
    Set html = ie.document
    Set topics = html.getElementsById("productTitle")
    Sheets(1).Cells(1, 1).Value = topics.innerText
    Set ie = Nothing
End Sub
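I expect the output in cell A1 to be:
"Milton Thermosteel Carafe Flask, 2 litres, Silver" (without the quotes), and I want to pull the image in the same way.
But I keep getting errors such as:
1. Run-time error '13': Type mismatch, when I use "Dim topics As HTMLHtmlElement"
2. Run-time error '438': Object doesn't support this property or method
Note: I have added the required libraries via Tools > References.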
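There is no such thing as html.getElementsById("productTitle") in VBA, which is the likely cause of error 438. An ID is always unique, so the method is singular: html.getElementById("productTitle"). Run the following script to get the title and the image URL: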
Sub ParseHtml()
    Dim IE As New InternetExplorer, elem As Object
    Dim Html As HTMLDocument, imgs As Object

    With IE
        .Visible = False
        .navigate "https://www.amazon.in/gp/product/B00F2GPN36"
        ' Wait until the page has finished loading (4 = READYSTATE_COMPLETE)
        While .Busy Or .readyState < 4: DoEvents: Wend
        Set Html = .document
    End With

    Set elem = Html.getElementById("productTitle")
    Set imgs = Html.getElementById("landingImage")

    ' Title goes in A1, image URL in B1
    Sheets(1).Cells(1, 1) = elem.innerText
    Sheets(1).Cells(1, 1).Offset(0, 1) = imgs.getAttribute("data-old-hires")

    IE.Quit ' close the hidden browser instance so it does not linger
End Sub
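getElementById returns Nothing when the page has no element with that id, and Amazon's markup can differ between products, so reading .innerText directly may fail with an object error. A minimal sketch of a guarded read follows; GetTextById is a hypothetical helper name, not part of any library.

```vba
' Hypothetical helper: returns the element's text, or "" when the id is not found.
' doc is expected to be an HTMLDocument (passed as Object so no reference is required).
Public Function GetTextById(ByVal doc As Object, ByVal id As String) As String
    Dim el As Object
    Set el = doc.getElementById(id)
    ' Trim$ strips leading and trailing spaces only, not line breaks
    If Not el Is Nothing Then GetTextById = Trim$(el.innerText)
End Function
```

It could then be used as Sheets(1).Cells(1, 1) = GetTextById(Html, "productTitle"), which leaves the cell empty instead of stopping the macro when the title is missing.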
A faster approach is to use XHR, which avoids the browser altogether, and to write the results out to the sheet from an array:
Option Explicit

Public Sub GetInfo()
    Dim html As HTMLDocument, results()
    Set html = New HTMLDocument

    ' Fetch the page synchronously and parse the response into an HTMLDocument
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.amazon.in/gp/product/B00F2GPN36", False
        .send
        html.body.innerHTML = .responseText
        With html
            ' results(0) = title, results(1) = image URL
            results = Array(.querySelector("#productTitle").innerText, .querySelector("#landingImage").getAttribute("data-old-hires"))
        End With
    End With

    With ThisWorkbook.Worksheets("Sheet1")
        .Cells(1, 1) = results(0)
        Dim file As String
        file = DownloadFile("C:\Users\User\Desktop\", results(1)) 'your path to download file
        ' Insert the downloaded image and position it over cell B1
        With .Pictures.Insert(file)
            .Left = ThisWorkbook.Worksheets("Sheet1").Cells(1, 2).Left
            .Top = ThisWorkbook.Worksheets("Sheet1").Cells(1, 2).Top
            .Width = 75
            .Height = 100
            .Placement = 1
        End With
    End With

    ' Delete the temporary file. Pictures.Insert may link to the file rather than
    ' embed it in some Excel versions, so check that the picture survives a reopen.
    Kill file
End Sub
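GetInfo calls a DownloadFile function whose body does not appear in this thread. The following is only a sketch of what such a helper might look like, assuming it wraps the Windows API function URLDownloadToFile from urlmon.dll and names the file after the last segment of the URL; both choices are assumptions, not the answer's confirmed implementation. The Declare statements would need to sit at the top of the module, after Option Explicit and before any procedure.

```vba
' Assumption: the download is done with the Windows API URLDownloadToFile (urlmon.dll).
#If VBA7 Then
    Private Declare PtrSafe Function URLDownloadToFile Lib "urlmon" Alias "URLDownloadToFileA" ( _
        ByVal pCaller As LongPtr, ByVal szURL As String, ByVal szFileName As String, _
        ByVal dwReserved As Long, ByVal lpfnCB As LongPtr) As Long
#Else
    Private Declare Function URLDownloadToFile Lib "urlmon" Alias "URLDownloadToFileA" ( _
        ByVal pCaller As Long, ByVal szURL As String, ByVal szFileName As String, _
        ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long
#End If

' Hypothetical helper: saves downloadURL into downloadFolder and
' returns the full path of the saved file, or "" if the download failed.
Public Function DownloadFile(ByVal downloadFolder As String, ByVal downloadURL As String) As String
    Dim target As String
    ' Use the last URL segment as the file name (assumes the URL ends in a file name)
    target = downloadFolder & Mid$(downloadURL, InStrRev(downloadURL, "/") + 1)
    ' URLDownloadToFile returns 0 (S_OK) on success
    If URLDownloadToFile(0, downloadURL, target, 0, 0) = 0 Then DownloadFile = target
End Function
```

Because this sketch returns an empty string on failure, the caller may want to check the result before passing it to Pictures.Insert or Kill.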