Can't get image src link with XPath
I'm using Scrapy to scrape the product image src links from this site:
http://eshop.tesco.com.my/en-GB/Promotion/List?SortBy=Default
For some reason, the XPath doesn't pick up the product image src links. I tried to get all image src links from the site by testing this XPath in the Scrapy shell:
response.xpath('//img').extract()
The result shows that none of the product img tags has a src link:
[u'<img alt="Grocery Home" class="tLogoMain" src="http://assets.ap-tescoassets.com/UIAssets/MY/grocery/default/i368/tLogoMain.gif" title="Grocery Home">',
u'<img src="http://assets.ap-tescoassets.com/UIAssets/MY/grocery/default/i368/searchFor.png" alt="Search" class="searchFor">',
u'<img alt="Previous" src="http://assets.ap-tescoassets.com/UIAssets/MY/grocery/default/i368/pg-prev-disbl-btn.png">',
u'<img alt="Next" src="http://assets.ap-tescoassets.com/UIAssets/MY/grocery/default/i368/pg-nxt-btn.png">',
u'<img alt="Grid view" class="grdView" src="http://assets.ap-tescoassets.com/UIAssets/MY/grocery/default/i368/high-grd-view.png">',
u'<img alt="List view" class="lstView" src="http://assets.ap-tescoassets.com/UIAssets/MY/grocery/default/i368/unhigh-lst-view.png">',
u'<img alt="" id="productImg-7072093609">',
u'<img alt="" id="productImg-7070005656">',
u'<img alt="" id="productImg-7070005648">',
u'<img alt="" id="productImg-7000034983">',
u'<img alt="" id="productImg-7070483892">',
u'<img alt="" id="productImg-7000035009">',
u'<img alt="" id="productImg-7000801798">',
u'<img alt="" id="productImg-7072123710">',
u'<img alt="" id="productImg-7072123737">',
u'<img alt="" id="productImg-7072123702">',
u'<img alt="" id="productImg-7004102002">',
u'<img alt="" id="productImg-7001314416">',
u'<img alt="" id="productImg-7001829106">',
u'<img alt="" id="productImg-7001495593">',
u'<img alt="" id="productImg-7001812165">',
u'<img alt="" id="productImg-7001813226">',
u'<img alt="" id="productImg-7002760339">',
u'<img alt="" id="productImg-7001812157">',
u'<img alt="" id="productImg-7002800969">',
u'<img alt="" id="productImg-7002764067">',
u'<img alt="" id="productImg-7001866206">',
u'<img alt="" id="productImg-7070980683">',
u'<img alt="" id="productImg-7072086912">',
u'<img alt="" id="productImg-7001884344">',
u'<img alt="Previous" src="http://assets.ap-tescoassets.com/UIAssets/MY/grocery/default/i368/pg-prev-disbl-btn.png">',
u'<img alt="Next" src="http://assets.ap-tescoassets.com/UIAssets/MY/grocery/default/i368/pg-nxt-btn.png">',
u'<img src="http://assets.ap-tescoassets.com/UIAssets/MY/grocery/default/en-GB/i368/btn-bookslot-bskt-d.gif" class="delSlotBtn" alt="Book slot disabled">',
u'<img src="http://assets.ap-tescoassets.com/UIAssets/MY/grocery/default/en-GB/i368/btn-checkout-bskt-d.gif" class="chkOutBtn" alt="Checkout disabled">',
u'<img alt="" class="legendImg" src="http://assets.ap-tescoassets.com/UIAssets/MY/grocery/default/en-GB/i368/star.png" title="">',
u'<img alt="" class="legendImg" src="http://assets.ap-tescoassets.com/UIAssets/MY/grocery/default/en-GB/i368/star.png" title="">',
u'<img alt="Opens in a new window" src="http://assets.ap-tescoassets.com/UIAssets/MY/grocery/default/en-GB/i368/open-window.png" title="Opens in a new window">',
u'<img src="http://assets.ap-tescoassets.com/UIAssets/MY/grocery/default/en-GB/i368/btn-fulltrolley-bskt-d.gif" class="fullTrolleyBtn" alt="">',
u'<img alt="Add to list" class="slAddToListDsbld" src="http://assets.ap-tescoassets.com/UIAssets/MY/grocery/default/i368/dsbld_sl_addtolst_icn.png">',
u'<img alt="Tesco Strapline" src="http://assets.ap-tescoassets.com/UIAssets/MY/grocery/default/en-GB/i368/footer/strapline_footer_bottom_my.png" title="Tesco Strapline">']
I checked again with Chrome Inspector, and every product has a src link. Why is there no src link in the returned result?
Please help.
Thanks.
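This is because of JavaScript rendering: the raw HTML of the site you are requesting doesn't contain that information. It is filled in by JavaScript while the page loads, and Scrapy only sees the HTML as downloaded.
You can also install a JavaScript toggle extension in your browser to check, so you can inspect what is really downloaded without JavaScript.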
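One way to confirm this without a browser extension is to look at the raw markup directly and list which img tags arrive without a src. The snippet below is a minimal sketch using only the Python standard library; `ImgCollector`, `split_images_by_src`, and `RAW_HTML` are hypothetical names for illustration, and the sample markup is copied from the output in the question rather than fetched from the site.

```python
from html.parser import HTMLParser


class ImgCollector(HTMLParser):
    """Collect the attributes of every <img> tag found in a raw HTML string."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; store it as a dict per image.
        if tag == "img":
            self.images.append(dict(attrs))


def split_images_by_src(html):
    """Return (images with a src, images without a src) for the given HTML."""
    parser = ImgCollector()
    parser.feed(html)
    parser.close()
    with_src = [img for img in parser.images if img.get("src")]
    without_src = [img for img in parser.images if not img.get("src")]
    return with_src, without_src


# Sample markup taken from the output shown in the question (assumption:
# this is what the server sends before any JavaScript has run).
RAW_HTML = """
<img alt="Next" src="http://assets.ap-tescoassets.com/UIAssets/MY/grocery/default/i368/pg-nxt-btn.png">
<img alt="" id="productImg-7072093609">
<img alt="" id="productImg-7070005656">
"""

with_src, without_src = split_images_by_src(RAW_HTML)

# The ids printed here are the images whose src is filled in later by script.
print([img["id"] for img in without_src])
```

If the product images show up in the "without src" list for the downloaded page, the URLs have to come from somewhere else, for example from data embedded in a script or from a separate request the page makes.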
Maybe it's because the XPath '//img' matches multiple nodes.
Try the following XPath to get a specific node:
.//img[contains(@src,'{{specific src value}}')]
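Note that in XPath an attribute is addressed with `@`; a bare name such as `src` inside a predicate refers to a child element called `src`. The snippet below is a small sketch of that difference using the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath (it has no `contains()`), so the substring check is done in Python here. `SAMPLE` is a hypothetical well-formed fragment built from the output in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed fragment based on the img tags shown in the question.
SAMPLE = """
<div>
  <img alt="Next" src="http://assets.ap-tescoassets.com/UIAssets/MY/grocery/default/i368/pg-nxt-btn.png"/>
  <img alt="" id="productImg-7072093609"/>
</div>
"""

root = ET.fromstring(SAMPLE)

# [@src] tests for an attribute named src.
by_attribute = root.findall(".//img[@src]")

# [src] tests for a child element named <src>, which no <img> has.
by_child = root.findall(".//img[src]")

# ElementTree has no contains(), so filter on the attribute value in Python.
matching = [img.get("src") for img in by_attribute if "pg-nxt-btn" in img.get("src")]

print(len(by_attribute), len(by_child), matching)
```

A filter like this can only select images whose src is already present in the downloaded HTML; it cannot recover a src that is added later by JavaScript.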