Etsy product scraper pulling only one row of data

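I'm trying to scrape some product data from Etsy.com, but I can't pull the data, and I'm not sure whether that's because I have the wrong parent class or because of something else. I have tried several classes as the parent class; the current one only lets me pull down one row.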

Link Etsy.com

I wait for the page to load and scroll down the page to make sure it loads fully rather than lazily, but I can still only pull one row of data.

The code below usually works for me:

        Set Html = objIE.document
        Set elements = Html.getElementsByClassName("bg-white display-block pb-xs-2 mt-xs-0") ' parent CLASS
        'FOR LOOP
        For Each element In elements
     
''' Element 1

        If element.getElementsByClassName("js-merch-stash-check-listing  v2-listing-card position-relative flex-xs-none ")(0).getElementsByTagName("a")(0) Is Nothing Then ' Get CLASS 
            wsSheet.Cells(sht.Cells(sht.Rows.Count, "A").End(xlUp).Row + 1, "A").Value = "-" 'If Nothing then Hyphen in CELL
        Else
            HtmlText = element.getElementsByClassName("js-merch-stash-check-listing  v2-listing-card position-relative flex-xs-none ")(0).getElementsByTagName("a")(0).href 'Get CLASS 
            wsSheet.Cells(sht.Cells(sht.Rows.Count, "A").End(xlUp).Row + 1, "A").Value = HtmlText 'return value in column
        End If
''' Element 2

        If element.getElementsByClassName("text-gray text-truncate mb-xs-0 text-body")(0) Is Nothing Then ' Get CLASS 
            wsSheet.Cells(sht.Cells(sht.Rows.Count, "B").End(xlUp).Row + 1, "B").Value = "-" 'If Nothing then Hyphen in CELL
        Else
            HtmlText = element.getElementsByClassName("text-gray text-truncate mb-xs-0 text-body")(0).innerText ' Get CLASS 
            wsSheet.Cells(sht.Cells(sht.Rows.Count, "B").End(xlUp).Row + 1, "B").Value = HtmlText 'return value in column
        End If
''' Element 3

Update: the second parent class

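I thought I had solved the problem before posting my original question above, but I hadn't. With the parent class below, I was able to get the full page of 50+ items, with results in column A. I haven't changed anything since then, but I can no longer reproduce that result: all I get is one row, and I don't understand why. I've been trying to fix this for a while but can't pin down the problem. The class below worked once and pulled 50+ results; now it only produces 1 row. I have cleared all browser caches and restarted the PC.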

The second parent class:

        Set Html = objIE.document
        Set elements = Html.getElementsByClassName("wt-grid wt-grid--block wt-pl-xs-0 tab-reorder-container") ' parent CLASS
        'FOR LOOP
        For Each element In elements

I have tried the following classes; only the two marked in the comments produced any results:

'wt-mt-xs-2 wt-text-black
'col-group pl-xs-0 search-listings-group pr-xs-1
'col-xs-12 pl-xs-1 pl-md-3
'responsive-listing-grid wt-grid wt-grid--block wt-justify-content-flex-start wt-list-unstyled pl-xs-0
'bg-white display-block pb-xs-2 mt-xs-0
'''''wt-grid wt-grid--block wt-pl-xs-0 tab-reorder-container 'Can only do 1 row
'''''wt-grid wt-grid--block wt-pl-xs-0 tab-reorder-container 'I was able to pull of 50+ items now not working
'wt-list-unstyled wt-grid__item-xs-6 wt-grid__item-md-4 wt-grid__item-lg-3 wt-grid__item-xl-3 wt-order-xs-0 wt-order-md-0 wt-order-lg-0 wt-order-xl-0 wt-show-xs wt-show-md wt-show-lg wt-show-xl tab-reorder
'js-merch-stash-check-listing  v2-listing-card position-relative flex-xs-none

Each item has an li class; see the markup below for details.

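Question: can someone tell me what I'm doing wrong? (I successfully pulled 50+ results with the second parent class, but now it only pulls 1 row and I can't work out why.)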

<li class="wt-list-unstyled wt-grid__item-xs-6 wt-grid__item-md-4 wt-grid__item-lg-3 wt-grid__item-xl-3 wt-order-xs-0 wt-order-md-0 wt-order-lg-0 wt-order-xl-0 wt-show-xs wt-show-md wt-show-lg wt-show-xl tab-reorder">
  <div class="js-merch-stash-check-listing  v2-listing-card position-relative flex-xs-none " data-palette-listing-id="973170689" data-shop-id="" data-listing-id="973170689" data-behat-listing-card="" data-listing-card-v2="">
    <a class="6dd4c4354676ccda display-inline-block listing-link  logged" data-listing-id="973170689" data-palette-listing-image="" href="https://www.etsy.com/uk/listing/973170689/deconstructed-iphone-5-artwork?ga_order=most_relevant&amp;ga_search_type=all&amp;ga_view_type=gallery&amp;ga_search_query=phones&amp;ref=sc_gallery-1-1&amp;plkey=247d3e6c1599979de70c884db995d78e95827f21%3A973170689&amp;frs=1"
      data-display-loc="w.0" data-page-num="1" data-position-num="1" data-logging-key="247d3e6c1599979de70c884db995d78e95827f21:973170689" target="etsy.973170689" title="Deconstructed iPhone 5 artwork">
      <div class="v2-listing-card__img position-relative">
        <div data-listing-card-image="">
          <div class="placeholder placeholder-landscape  ">
            <div class="placeholder-content  ">
              <div class="placeholder vertically-centered-placeholder placeholder-landscape">
                <div class="height-placeholder">
                  <img data-listing-card-listing-image="" src="https://i.etsystatic.com/27880825/c/2250/1788/0/538/il/116587/2961533797/il_340x270.2961533797_r4pc.jpg" class="width-full wt-height-full display-block position-absolute " alt="">
                </div>
              </div>
            </div>
          </div>
        </div>
      </div>
      <div class="v2-listing-card__info
">
        <div>
          <h3 class="text-gray text-truncate mb-xs-0 text-body ">
            Deconstructed iPhone 5 artwork
          </h3>
          <p>
          </p>
          <div class="v2-listing-card__shop">
            <p class="text-gray-lighter text-body-smaller display-inline-block" aria-hidden="true"><span class="p06299890 c968b3da8">E</span><span class="c968b3da8 s0cd3f014">A</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span>
              <span
                class="p06299890 c968b3da8">E</span><span class="c968b3da8 s0cd3f014">d</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="c968b3da8 s0cd3f014"> </span><span class="p06299890 c968b3da8">E</span>
                <span
                  class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="c968b3da8 s0cd3f014">b</span>
                  <span
                    class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="c968b3da8 s0cd3f014">y</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="c968b3da8 s0cd3f014"> </span>
                    <span
                      class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span><span class="p06299890 c968b3da8">E</span>DissectProjects</p>
            <p class="screen-reader-only">Ad from shop DissectProjects</p>
            <span class="v2-listing-card__rating icon-t-2 display-block">
        </span>
          </div>
          <span class="n-listing-card__price text-gray mt-xs-0 strong display-block
         text-body-larger
        ">
            <span class="currency-symbol">£</span><span class="currency-value">120.00</span>
          <span class="text-body-smaller no-wrap">
             <span class="wt-badge wt-badge--small wt-badge--sale-01">
             FREE UK delivery</span>
          </span>
          </span>
          <p></p>
        </div>
      </div>
    </a>
    <div data-favorite-button-wrapper="" class="v2-listing-card__actions z-index-1 position-absolute">
      <button class="inline-overlay-trigger favorite-item-action position-absolute favorite-listing-button p-xs-1 has-hover-state z-index-1 btn-transparent position-right in-search v2-listing-card__favorite" data-ui="favorite-listing-button" data-listing-id="973170689"
        data-accessible-btn-fave="" data-favorite-label="Add to Favourites" data-favorited-label="Remove from Favourites">
            <div data-source="search" data-btn-fave="" data-neu-fave="">
                <span class="favorite-listing-button-icon-container icon-circle-container bg-white icon-group p-xs-1       
                 " data-favorite-icon-container="">
                    <span class="etsy-icon icon-smaller text-gray wt-display-block   
                        " data-not-favorited-icon=""><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" aria-hidden="true" focusable="false"><path d="M12,21C10.349,21,2,14.688,2,9,2,5.579,4.364,3,7.5,3A6.912,6.912,0,0,1,12,5.051,6.953,6.953,0,0,1,16.5,3C19.636,3,22,5.579,22,9,22,14.688,13.651,21,12,21ZM7.5,5C5.472,5,4,6.683,4,9c0,4.108,6.432,9.325,8,10,1.564-.657,8-5.832,8-10,0-2.317-1.472-4-3.5-4-1.979,0-3.7,2.105-3.721,2.127L11.991,8.1,11.216,7.12C11.186,7.083,9.5,5,7.5,5Z"></path></svg></span>
                    <span class="etsy-icon icon-smaller text-red wt-display-none     
                        " data-favorited-icon=""><svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" aria-hidden="true" focusable="false"><path d="M16.5,3A6.953,6.953,0,0,0,12,5.051,6.912,6.912,0,0,0,7.5,3C4.364,3,2,5.579,2,9c0,5.688,8.349,12,10,12S22,14.688,22,9C22,5.579,19.636,3,16.5,3Z"></path></svg></span>
                </span>
            </div>
            <!--icon font and display:none; elements -->
            <span aria-hidden="true" class="icon"></span>
            <span class="screen-reader-only default" data-a11y-label="">
                Add to Favourites
            </span>
    </button>
    </div>
  </div>
</li>
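A likely explanation for the single row: getElementsByClassName with a multi-class string matches every element that carries all of those classes, and the grid container class (wt-grid wt-grid--block wt-pl-xs-0 tab-reorder-container) presumably sits on only one element per page. The For Each loop then runs once, and each (0) inside it reads only the first listing. The Python sketch below mimics that token matching with the standard-library html.parser; SAMPLE is a hypothetical, heavily trimmed page (not Etsy's real markup) and count_class is an illustrative helper, not part of the original code.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count elements carrying every given class token, like getElementsByClassName."""

    def __init__(self, class_names):
        super().__init__()
        self.wanted = set(class_names.split())  # order-independent, all tokens required
        self.count = 0

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        if self.wanted <= classes:
            self.count += 1


def count_class(html, class_names):
    parser = ClassCounter(class_names)
    parser.feed(html)
    return parser.count


# Hypothetical page: one grid container holding three listing cards.
SAMPLE = """
<div class="wt-grid wt-grid--block wt-pl-xs-0 tab-reorder-container">
  <div class="tab-reorder"><div class="v2-listing-card"><a class="listing-link" href="/listing/1"><h3>Item 1</h3></a></div></div>
  <div class="tab-reorder"><div class="v2-listing-card"><a class="listing-link" href="/listing/2"><h3>Item 2</h3></a></div></div>
  <div class="tab-reorder"><div class="v2-listing-card"><a class="listing-link" href="/listing/3"><h3>Item 3</h3></a></div></div>
</div>
"""

if __name__ == "__main__":
    # The container matches once, so a loop over it runs once.
    print(count_class(SAMPLE, "wt-grid wt-grid--block wt-pl-xs-0 tab-reorder-container"))  # prints 1
    # A per-listing class matches once per product.
    print(count_class(SAMPLE, "v2-listing-card"))  # prints 3
```

Iterating over a per-listing class such as v2-listing-card, as the answer below does, gives one loop iteration (and one row) per product instead.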

Update based on SIM's code

I use this to scroll down the browser:

objIE.document.parentWindow.Scroll 0&, 9999 ' Scrolls Down the Browser

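''###################### UPDATE TODAY ######################

My guess is that the parent class is v2-listing-card__info, but if I'm not mistaken the product URL is not inside it, so how do I get that?

Results so far (I haven't yet corrected the classes for the other elements).

''###################### UPDATE 19/3/2021 ######################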

Many thanks to SIM for the support, and to Qharr for the input. I finally solved the problem; thanks, everyone.

Results

Thanks in advance, as always.

Try this:

' Requires references to "Microsoft Internet Controls" and "Microsoft HTML Object Library"
Sub GetTitles()
    Dim IE As New InternetExplorer, HTML As HTMLDocument
    Dim posts As Object, post As Object, startTime As Double
    Dim timeout As Integer, prevlen&, curlen&

    timeout = 5 ' seconds to wait for new listings before giving up

    With IE
        .Visible = True
        .navigate "https://www.etsy.com/uk/search?q=phones"
        While .Busy = True Or .readyState < 4: DoEvents: Wend ' wait for the initial load
        Set HTML = .document
    End With

    prevlen = HTML.getElementsByClassName("v2-listing-card").Length

    startTime = Timer

    ' Keep scrolling until no new listing cards have appeared for `timeout` seconds
    Do
        HTML.parentWindow.scrollBy 0, 99999
        DoEvents
        Set posts = HTML.getElementsByClassName("v2-listing-card")
        curlen = posts.Length

        If curlen > prevlen Then
            startTime = Timer ' new items loaded: restart the timer
            prevlen = curlen
        End If
    Loop While Round(Timer - startTime, 2) <= timeout

    ' One iteration per listing card, so one output pair per product
    For Each post In posts
        Debug.Print post.getElementsByTagName("h3")(0).innerText
        Debug.Print post.getElementsByClassName("listing-link")(0).getAttribute("href")
    Next post
    IE.Quit
End Sub
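The loop above scrolls repeatedly and stops only once the number of v2-listing-card elements has stopped growing for timeout seconds, instead of scrolling once as in the question's update. The same stop condition is sketched in Python below so it can be tested without a browser; scroll_until_stable, FakePage and FakeClock are hypothetical stand-ins for the IE calls (scrollBy, getElementsByClassName(...).Length and Timer), not a real scraping API.

```python
import time
from typing import Callable


def scroll_until_stable(scroll: Callable[[], None],
                        get_count: Callable[[], int],
                        timeout: float = 5.0,
                        clock: Callable[[], float] = time.monotonic) -> int:
    """Scroll until the item count has not grown for `timeout` seconds; return the count."""
    prev = get_count()
    start = clock()
    while clock() - start <= timeout:
        scroll()
        cur = get_count()
        if cur > prev:
            start = clock()  # new items appeared: restart the quiet-period timer
            prev = cur
    return prev


class FakePage:
    """Hypothetical lazy-loading page: each scroll reveals 10 more items, up to a maximum."""

    def __init__(self, first=10, step=10, maximum=50):
        self.count, self.step, self.maximum = first, step, maximum

    def scroll(self):
        self.count = min(self.count + self.step, self.maximum)


class FakeClock:
    """Deterministic clock that advances one 'second' per call."""

    def __init__(self):
        self.t = 0.0

    def __call__(self):
        self.t += 1.0
        return self.t


if __name__ == "__main__":
    page = FakePage()
    total = scroll_until_stable(page.scroll, lambda: page.count, timeout=5, clock=FakeClock())
    print(total)  # prints 50
```

Resetting the timer whenever the count grows means a slow page gets more time, while a page that has finished loading ends the loop after one quiet period.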

By the way, if you use

v2-listing-card__info

as the container, make sure you use the following line

post.ParentNode.ParentNode.getElementsByClassName("listing-link")(0).href

to get the product links.
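Why ParentNode.ParentNode works: in the markup posted in the question, the v2-listing-card__info div sits inside the a.listing-link anchor, which in turn sits inside the div.v2-listing-card. Two levels up from the info block is therefore the card, and its first listing-link descendant holds the product URL (one level up would already be the anchor itself). The Python sketch below walks the same path on a trimmed copy of that markup; Node, TreeBuilder and product_link are hypothetical helpers built on the standard-library html.parser, standing in for IE's DOM.

```python
from html.parser import HTMLParser

VOID = {"img", "br", "hr", "input", "meta", "link"}  # elements with no closing tag


class Node:
    def __init__(self, tag, attrs, parent):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []

    def classes(self):
        return (self.attrs.get("class") or "").split()

    def find_by_class(self, name):
        """First descendant in document order with class `name`, like getElementsByClassName(name)(0)."""
        for child in self.children:
            if name in child.classes():
                return child
            found = child.find_by_class(name)
            if found is not None:
                return found
        return None


class TreeBuilder(HTMLParser):
    """Builds a minimal element tree with parent links (text nodes are ignored)."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document", [], None)
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID:
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the matching open element; ignore unmatched end tags.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent


def product_link(html):
    builder = TreeBuilder()
    builder.feed(html)
    info = builder.root.find_by_class("v2-listing-card__info")
    card = info.parent.parent  # info -> a.listing-link -> div.v2-listing-card
    return card.find_by_class("listing-link").attrs["href"]


# Trimmed from the markup in the question (query string dropped).
SAMPLE = """
<div class="v2-listing-card">
  <a class="listing-link" href="https://www.etsy.com/uk/listing/973170689/deconstructed-iphone-5-artwork">
    <div class="v2-listing-card__img"><img src="x.jpg" alt=""></div>
    <div class="v2-listing-card__info"><h3>Deconstructed iPhone 5 artwork</h3></div>
  </a>
</div>
"""

if __name__ == "__main__":
    print(product_link(SAMPLE))
```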