Scraping image URLs with BeautifulSoup

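I'm trying to learn something today by doing some scraping.

I am trying to list product names and their corresponding image URLs in a spreadsheet.

I managed to store the names, but the images don't seem to work. I hope you can help!

This is the code I use to extract the text: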

results[0].find('p', {'class': 'product-card__name'}).get_text()

This is what I thought would extract the image:

results[0].find('img', {'class':'product-card__image'}).get_src()

This is clearly not working. It fails with "'NoneType' object is not callable".

Any pointers?

For reference, below is the source I am trying to scrape.

<li class="product-grid__item"><a href="/p/63818/bumbu-the-original-rum-glass-pack" class="product-card" title=" Bumbu The Original Rum Glass Pack" onclick="_gaq.push(['_trackEvent', 'Products-GridView', 'click', '63818 : Bumbu The Original Rum / Glass Pack'])"><div class="product-card__image-container"><img src="https://img.thewhiskyexchange.com/480/rum_bum4.jpg" alt="Bumbu The Original Rum Glass Pack" class="product-card__image" loading="lazy" width="3" height="4"></div><div class="product-card__content"><p class="product-card__name"> Bumbu The Original Rum<span class="product-card__name-secondary">Glass Pack</span></p><p class="product-card__meta"> 70cl / 40% </p></div><div class="product-card__data"><p class="product-card__price"> £39.95 </p><p class="product-card__unit-price"> (£57.07 per litre) </p></div></a></li>

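To get the image URL, you have to call .get('src') instead of .get_src(). BeautifulSoup treats an unknown attribute such as .get_src as a search for a child tag named get_src, finds none, and returns None; calling that None is what raises the error.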

results[0].find('img', {'class':'product-card__image'}).get('src')
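As a minimal sketch of where the error message comes from, the snippet below parses just the img tag from the question and compares attribute-style access with .get(). The variable names img, missing, and src are only for illustration.

```python
from bs4 import BeautifulSoup

html = '<img src="https://img.thewhiskyexchange.com/480/rum_bum4.jpg" class="product-card__image">'
img = BeautifulSoup(html, "html.parser").find("img", {"class": "product-card__image"})

# Accessing an unknown attribute on a tag is treated as a search for a
# child tag with that name. There is no <get_src> child, so this is None,
# and writing img.get_src() would try to call None.
missing = img.get_src
print(missing)

# .get() reads an HTML attribute and returns None if the attribute is absent.
src = img.get("src")
print(src)

# Subscript access also works, but raises KeyError if the attribute is missing.
print(img["src"])
```

Using .get() is the safer choice when some product cards might lack a src attribute, since it returns None rather than raising.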

Example:

html='''
<li class="product-grid__item">
 <a class="product-card" href="/p/63818/bumbu-the-original-rum-glass-pack" onclick="_gaq.push(['_trackEvent', 'Products-GridView', 'click', '63818 : Bumbu The Original Rum / Glass Pack'])" title=" Bumbu The Original Rum Glass Pack">
  <div class="product-card__image-container">
   <img alt="Bumbu The Original Rum Glass Pack" class="product-card__image" height="4" loading="lazy" src="https://img.thewhiskyexchange.com/480/rum_bum4.jpg" width="3"/>
  </div>
  <div class="product-card__content">
   <p class="product-card__name">
    Bumbu The Original Rum
    <span class="product-card__name-secondary">
     Glass Pack
    </span>
   </p>
   <p class="product-card__meta">
    70cl / 40%
   </p>
  </div>
  <div class="product-card__data">
   <p class="product-card__price">
    £39.95
   </p>
   <p class="product-card__unit-price">
    (£57.07 per litre)
   </p>
  </div>
 </a>
</li>
'''

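from bs4 import BeautifulSoup

# Parse the snippet with Python's built-in HTML parser.
soup = BeautifulSoup(html, "html.parser")
# print(soup.prettify())
# Find the product image and read its src attribute.
print(soup.find('img', {'class': 'product-card__image'}).get('src'))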

Output:

https://img.thewhiskyexchange.com/480/rum_bum4.jpg
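Since the goal in the question is a spreadsheet of names and image URLs, here is a rough sketch of how both values might be collected per product and written out as CSV. It assumes that results is a list of the li.product-grid__item elements (the question does not show how results was built), and the column names and the in-memory buffer are illustrative choices, not part of the original code.

```python
import csv
import io

from bs4 import BeautifulSoup

# Shortened copy of the product card from the question.
html = '''
<li class="product-grid__item"><a href="/p/63818/bumbu-the-original-rum-glass-pack" class="product-card"><div class="product-card__image-container"><img src="https://img.thewhiskyexchange.com/480/rum_bum4.jpg" alt="Bumbu The Original Rum Glass Pack" class="product-card__image"></div><div class="product-card__content"><p class="product-card__name"> Bumbu The Original Rum<span class="product-card__name-secondary">Glass Pack</span></p></div></a></li>
'''

soup = BeautifulSoup(html, "html.parser")
# Assumption: results holds one element per product card.
results = soup.find_all("li", {"class": "product-grid__item"})

rows = []
for item in results:
    name_tag = item.find("p", {"class": "product-card__name"})
    img_tag = item.find("img", {"class": "product-card__image"})
    # Guard against cards that lack a name or an image.
    name = name_tag.get_text(" ", strip=True) if name_tag else ""
    src = img_tag.get("src", "") if img_tag else ""
    rows.append((name, src))

# Write to an in-memory buffer here; swapping in
# open("products.csv", "w", newline="") would write a real file instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name", "image_url"])  # hypothetical column names
writer.writerows(rows)
print(buffer.getvalue())
```

Passing a separator and strip=True to get_text() keeps the main name and the secondary name ("Glass Pack") from running together.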