Etsy product scraper pulling only one row of data
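I am trying to pull some product data from Etsy.com. I am not sure whether I cannot extract the data because I have the wrong parent class or for some other reason. I have tried several classes as the parent class; the current one only lets me pull one row.
Link: Etsy.com
I wait for the page to load and scroll down the page to make sure it has loaded fully and is not held back by lazy loading. However, I can still only pull one row of data.
The code below normally works for me: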
Set Html = objIE.document
Set elements = Html.getElementsByClassName("bg-white display-block pb-xs-2 mt-xs-0") ' parent CLASS
'FOR LOOP
For Each element In elements
''' Element 1
If element.getElementsByClassName("js-merch-stash-check-listing v2-listing-card position-relative flex-xs-none ")(0).getElementsByTagName("a")(0) Is Nothing Then ' Get CLASS
wsSheet.Cells(sht.Cells(sht.Rows.Count, "A").End(xlUp).Row + 1, "A").Value = "-" 'If Nothing then Hyphen in CELL
Else
HtmlText = element.getElementsByClassName("js-merch-stash-check-listing v2-listing-card position-relative flex-xs-none ")(0).getElementsByTagName("a")(0).href 'Get CLASS
wsSheet.Cells(sht.Cells(sht.Rows.Count, "A").End(xlUp).Row + 1, "A").Value = HtmlText 'return value in column
End If
''' Element 2
If element.getElementsByClassName("text-gray text-truncate mb-xs-0 text-body")(0) Is Nothing Then ' Get CLASS
wsSheet.Cells(sht.Cells(sht.Rows.Count, "B").End(xlUp).Row + 1, "B").Value = "-" 'If Nothing then Hyphen in CELL
Else
HtmlText = element.getElementsByClassName("text-gray text-truncate mb-xs-0 text-body")(0).innerText ' Get CLASS
wsSheet.Cells(sht.Cells(sht.Rows.Count, "B").End(xlUp).Row + 1, "B").Value = HtmlText 'return value in column
End If
''' Element 3
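Second parent class
I thought I had solved the problem, so I did not post my original question above.
With the parent class below I was able to get a full page of 50+ items, with results in column A. I have not changed anything since then, but I can no longer reproduce the same result. All I get is one row, and I do not understand why. I have been trying to resolve this for some time but cannot pinpoint the problem. The class below worked once and pulled 50+ results; now it only produces 1 row. I have cleared all browser cache and restarted the PC.
Second parent class: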
Set Html = objIE.document
Set elements = Html.getElementsByClassName("wt-grid wt-grid--block wt-pl-xs-0 tab-reorder-container") ' parent CLASS
'FOR LOOP
For Each element In elements
I have tried the following classes; only the two marked with comments produced any results.
'wt-mt-xs-2 wt-text-black
'col-group pl-xs-0 search-listings-group pr-xs-1
'col-xs-12 pl-xs-1 pl-md-3
'responsive-listing-grid wt-grid wt-grid--block wt-justify-content-flex-start wt-list-unstyled pl-xs-0
'bg-white display-block pb-xs-2 mt-xs-0
'''''wt-grid wt-grid--block wt-pl-xs-0 tab-reorder-container 'Can only do 1 row
'''''wt-grid wt-grid--block wt-pl-xs-0 tab-reorder-container 'I was able to pull 50+ items, now not working
'wt-list-unstyled wt-grid__item-xs-6 wt-grid__item-md-4 wt-grid__item-lg-3 wt-grid__item-xl-3 wt-order-xs-0 wt-order-md-0 wt-order-lg-0 wt-order-xl-0 wt-show-xs wt-show-md wt-show-lg wt-show-xl tab-reorder
'js-merch-stash-check-listing v2-listing-card position-relative flex-xs-none
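Each item has its own li class; see below for more information.
Question: can someone tell me what I am doing wrong? (With the second parent class I successfully pulled 50+ results, but now it only pulls 1 row and I cannot work out why.)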
<li class="wt-list-unstyled wt-grid__item-xs-6 wt-grid__item-md-4 wt-grid__item-lg-3 wt-grid__item-xl-3 wt-order-xs-0 wt-order-md-0 wt-order-lg-0 wt-order-xl-0 wt-show-xs wt-show-md wt-show-lg wt-show-xl tab-reorder">
<div class="js-merch-stash-check-listing v2-listing-card position-relative flex-xs-none " data-palette-listing-id="973170689" data-shop-id="" data-listing-id="973170689" data-behat-listing-card="" data-listing-card-v2="">
<a class="6dd4c4354676ccda display-inline-block listing-link logged" data-listing-id="973170689" data-palette-listing-image="" href="https://www.etsy.com/uk/listing/973170689/deconstructed-iphone-5-artwork?ga_order=most_relevant&ga_search_type=all&ga_view_type=gallery&ga_search_query=phones&ref=sc_gallery-1-1&plkey=247d3e6c1599979de70c884db995d78e95827f21%3A973170689&frs=1"
data-display-loc="w.0" data-page-num="1" data-position-num="1" data-logging-key="247d3e6c1599979de70c884db995d78e95827f21:973170689" target="etsy.973170689" title="Deconstructed iPhone 5 artwork">
<div class="v2-listing-card__img position-relative">
<div data-listing-card-image="">
<div class="placeholder placeholder-landscape ">
<div class="placeholder-content ">
<div class="placeholder vertically-centered-placeholder placeholder-landscape">
<div class="height-placeholder">
<img data-listing-card-listing-image="" src="https://i.etsystatic.com/27880825/c/2250/1788/0/538/il/116587/2961533797/il_340x270.2961533797_r4pc.jpg" class="width-full wt-height-full display-block position-absolute " alt="">
</div>
</div>
</div>
</div>
</div>
</div>
<div class="v2-listing-card__info
">
<div>
<h3 class="text-gray text-truncate mb-xs-0 text-body ">
Deconstructed iPhone 5 artwork
</h3>
<p>
</p>
<div class="v2-listing-card__shop">
<p class="text-gray-lighter text-body-smaller display-inline-block" aria-hidden="true"><span class="p06299890 c968b3da8">E</span><span class="c968b3da8 s0cd3f014">A</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span>
<span
class="p06299890 c968b3da8">E</span><span class="c968b3da8 s0cd3f014">d</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="c968b3da8 s0cd3f014"> </span><span class="p06299890 c968b3da8">E</span>
<span
class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="c968b3da8 s0cd3f014">b</span>
<span
class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="c968b3da8 s0cd3f014">y</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="c968b3da8 s0cd3f014"> </span>
<span
class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span>DissectProjects</p>
<p class="screen-reader-only">Ad from shop DissectProjects</p>
<span class="v2-listing-card__rating icon-t-2 display-block">
</span>
</div>
<span class="n-listing-card__price text-gray mt-xs-0 strong display-block
text-body-larger
">
<span class="currency-symbol">£</span><span class="currency-value">120.00</span>
<span class="text-body-smaller no-wrap">
<span class="wt-badge wt-badge--small wt-badge--sale-01">
FREE UK delivery</span>
</span>
</span>
<p></p>
</div>
</div>
</a>
<div data-favorite-button-wrapper="" class="v2-listing-card__actions z-index-1 position-absolute">
<button class="inline-overlay-trigger favorite-item-action position-absolute favorite-listing-button p-xs-1 has-hover-state z-index-1 btn-transparent position-right in-search v2-listing-card__favorite" data-ui="favorite-listing-button" data-listing-id="973170689"
data-accessible-btn-fave="" data-favorite-label="Add to Favourites" data-favorited-label="Remove from Favourites">
<div data-source="search" data-btn-fave="" data-neu-fave="">
<span class="favorite-listing-button-icon-container icon-circle-container bg-white icon-group p-xs-1
" data-favorite-icon-container="">
<span class="etsy-icon icon-smaller text-gray wt-display-block
" data-not-favorited-icon=""><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" aria-hidden="true" focusable="false"><path d="M12,21C10.349,21,2,14.688,2,9,2,5.579,4.364,3,7.5,3A6.912,6.912,0,0,1,12,5.051,6.953,6.953,0,0,1,16.5,3C19.636,3,22,5.579,22,9,22,14.688,13.651,21,12,21ZM7.5,5C5.472,5,4,6.683,4,9c0,4.108,6.432,9.325,8,10,1.564-.657,8-5.832,8-10,0-2.317-1.472-4-3.5-4-1.979,0-3.7,2.105-3.721,2.127L11.991,8.1,11.216,7.12C11.186,7.083,9.5,5,7.5,5Z"></path></svg></span>
<span class="etsy-icon icon-smaller text-red wt-display-none
" data-favorited-icon=""><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" aria-hidden="true" focusable="false"><path d="M16.5,3A6.953,6.953,0,0,0,12,5.051,6.912,6.912,0,0,0,7.5,3C4.364,3,2,5.579,2,9c0,5.688,8.349,12,10,12S22,14.688,22,9C22,5.579,19.636,3,16.5,3Z"></path></svg></span>
</span>
</div>
<!--icon font and display:none; elements -->
<span aria-hidden="true" class="icon"></span>
<span class="screen-reader-only default" data-a11y-label="">
Add to Favourites
</span>
</button>
</div>
</div>
</li>
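Update from SIM's code
I use this to scroll the browser down.
objIE.document.parentWindow.Scroll 0&, 9999 ' Scrolls Down the Browser
''###################### Update today ######################
I guess the parent class is v2-listing-card__info
But if I am not mistaken, the product URL does not sit under it, so how do I get that?
Results so far; I have not yet corrected the classes of all the other elements.
''###################### Update today 19/3/2021 ######################
Many thanks to SIM for the support, and thanks to Qharr for the input. I finally solved the problem; thank you all.
Results
As always, thanks in advance.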
Try this:
Sub GetTitles()
    ' Needs references to "Microsoft Internet Controls" and "Microsoft HTML Object Library"
    Dim IE As New InternetExplorer, HTML As HTMLDocument
    Dim posts As Object, post As Object, startTime As Double
    Dim timeout As Integer, prevlen&, curlen&
    timeout = 5 ' seconds to keep scrolling without any new cards appearing
    With IE
        .Visible = True
        .navigate "https://www.etsy.com/uk/search?q=phones"
        While .Busy = True Or .readyState < 4: DoEvents: Wend
        Set HTML = .document
    End With
    prevlen = HTML.getElementsByClassName("v2-listing-card").Length
    startTime = Timer
    ' Keep scrolling until no new cards have loaded for the timeout period
    Do
        HTML.parentWindow.scrollBy 0, 99999
        Set posts = HTML.getElementsByClassName("v2-listing-card")
        curlen = posts.Length
        If curlen > prevlen Then
            startTime = Timer ' new cards appeared, so restart the countdown
            prevlen = curlen
        End If
    Loop While Round(Timer - startTime, 2) <= timeout
    ' Each v2-listing-card is one listing: print its title and link
    For Each post In posts
        Debug.Print post.getElementsByTagName("h3")(0).innerText
        Debug.Print post.getElementsByClassName("listing-link")(0).getAttribute("href")
    Next post
    IE.Quit
End Sub
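A likely reason the loops in the question stop after one row is that a class such as wt-grid wt-grid--block wt-pl-xs-0 tab-reorder-container matches only the single outer list, so For Each runs once, whereas v2-listing-card matches once per listing. If the goal is to fill a worksheet instead of printing to the Immediate window, the final loop could be adapted roughly as below. This is only a sketch: the helper name WriteListings, the ws argument and the column layout (A for the link, B for the title) are assumptions, not part of the code above. It uses one row counter for both columns so they cannot drift apart, and it tests each lookup for Nothing before reading from it, the same way the question's code does.

```vba
' Hypothetical helper: writes one row per listing card to the given sheet.
' posts is the collection returned by getElementsByClassName("v2-listing-card").
Sub WriteListings(ByVal posts As Object, ByVal ws As Worksheet)
    Dim post As Object, node As Object
    Dim r As Long

    ' First free row, judged by column A
    r = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row + 1

    For Each post In posts
        ' Column A: product link, or a hyphen when the card has no link
        Set node = post.getElementsByClassName("listing-link")(0)
        If node Is Nothing Then
            ws.Cells(r, "A").Value = "-"
        Else
            ws.Cells(r, "A").Value = node.getAttribute("href")
        End If

        ' Column B: title text, or a hyphen when the card has no h3
        Set node = post.getElementsByTagName("h3")(0)
        If node Is Nothing Then
            ws.Cells(r, "B").Value = "-"
        Else
            ws.Cells(r, "B").Value = Trim$(node.innerText)
        End If

        r = r + 1 ' one row per card, shared by both columns
    Next post
End Sub
```

It could be called just before IE.Quit, for example as WriteListings posts, ThisWorkbook.Worksheets(1), where the choice of sheet is again only an example.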
By the way, if you use v2-listing-card__info as the container, make sure to use the following line to get the product links:
post.ParentNode.ParentNode.getElementsByClassName("listing-link")(0).href
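As a rough illustration of that variant, the loop might look like the sketch below. The procedure name PrintFromInfoBlocks and its doc parameter are made up for the example. In the markup posted in the question the info block sits inside the listing-link anchor, which in turn sits inside the card, so two ParentNode steps reach the card; this assumes the live page still has that nesting.

```vba
' Hypothetical variant: loop over the info blocks and climb to the card for the link.
' doc is the loaded document, e.g. the HTML variable from GetTitles.
Sub PrintFromInfoBlocks(ByVal doc As Object)
    Dim info As Object, link As Object, title As Object

    For Each info In doc.getElementsByClassName("v2-listing-card__info")
        ' The title lives inside the info block itself
        Set title = info.getElementsByTagName("h3")(0)
        ' The link is looked up from the card, two levels up
        Set link = info.ParentNode.ParentNode.getElementsByClassName("listing-link")(0)

        If Not title Is Nothing Then Debug.Print title.innerText
        If Not link Is Nothing Then Debug.Print link.href
    Next info
End Sub
```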