Scraping - Cannot identify product class

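Good afternoon all,

I've been trying to build a scraper for this particular page.

I'm trying to extract the product names and prices.

The code is as follows: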

```python
from bs4 import BeautifulSoup
import requests
import pandas as pd
import urllib.parse

website = 'https://www.thewhiskyexchange.com/c/339/rum'
response = requests.get(website)
response.status_code  # 200 means the request succeeded
soup = BeautifulSoup(response.content, 'html.parser')
# Each product tile on the listing page is an <li class="product-grid__item">
results = soup.find_all('li',{'product-grid__item'})
```

If I run `len(results)`, I get 24.

But when I actually look at the results (`results[0]`), I only get 1 item back:

```html
<li class="product-grid__item"><a class="product-card" href="/p/63818/bumbu-the-original-rum-glass-pack" onclick="_gaq.push(['_trackEvent', 'Products-GridView', 'click', '63818 : Bumbu The Original Rum / Glass Pack'])" title=" Bumbu The Original Rum Glass Pack"><div class="product-card__image-container"><img alt="Bumbu The Original Rum Glass Pack" class="product-card__image" height="4" loading="lazy" src="https://img.thewhiskyexchange.com/480/rum_bum4.jpg" width="3"/></div><div class="product-card__content"><p class="product-card__name"> Bumbu The Original Rum<span class="product-card__name-secondary">Glass Pack</span></p><p class="product-card__meta"> 70cl / 40% </p></div><div class="product-card__data"><p class="product-card__price"> £39.95 </p><p class="product-card__unit-price"> (£57.07 per litre) </p></div></a></li>
```

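My question is: am I looking at the right class? I've tried other classes, but they don't seem to work either. Or is there something wrong with the code?

(I should say I'm trying to teach myself to code, so I wouldn't be surprised if I'm missing something.)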

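Everything is working. `results` is actually a list (strictly, a Beautiful Soup `ResultSet`, which subclasses `list`), meaning the search `soup.find_all('li',{'product-grid__item'})` returned many matches. So with `results[0]` you are accessing only the first element of that list. To see all of the elements, you can `print(results)` or use a `for` loop: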

```python
for result in results:
    print(result)
```
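To see this without requesting the live page, here is a minimal offline sketch that parses a trimmed copy of the single `<li>` from the question (the `onclick` handler and image removed). It shows that `find_all` returns a `ResultSet` (a `list` subclass), so `len()` counts the matches and `results[0]` is only the first one. The `class_=` keyword and the CSS selector `li.product-grid__item` are shown as equivalent ways of writing the question's search; the names `SAMPLE_HTML`, `by_keyword` and `by_css` are illustrative only.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the <li> shown in the question (onclick and <img> removed).
SAMPLE_HTML = """
<ul>
  <li class="product-grid__item">
    <a class="product-card" href="/p/63818/bumbu-the-original-rum-glass-pack">
      <div class="product-card__content">
        <p class="product-card__name"> Bumbu The Original Rum<span class="product-card__name-secondary">Glass Pack</span></p>
        <p class="product-card__meta"> 70cl / 40% </p>
      </div>
      <div class="product-card__data">
        <p class="product-card__price"> £39.95 </p>
        <p class="product-card__unit-price"> (£57.07 per litre) </p>
      </div>
    </a>
  </li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Three equivalent ways to find every product tile.
results = soup.find_all("li", {"product-grid__item"})         # as in the question
by_keyword = soup.find_all("li", class_="product-grid__item")  # keyword form
by_css = soup.select("li.product-grid__item")                  # CSS selector form

# find_all returns a ResultSet, which is a list subclass.
print(type(results).__name__, isinstance(results, list))
print(len(results), len(by_keyword), len(by_css))

# Indexing gives a single <li>; iterate to visit every match.
first = results[0]
print(first.name)
for result in results:
    # get_text joins all nested text, including the secondary name in the <span>
    print(result.select_one(".product-card__name").get_text(" ", strip=True))
```

On the live page the same three searches should each return all of the tiles, and the loop visits every one of them rather than just `results[0]`.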

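The product name is the first text node inside the `[class="product-card__name"]` element; the secondary name (e.g. "Glass Pack") sits in a nested `<span>`. So to get just that text node's value, you can call `.find(text=True)` (spelled `.find(string=True)` in newer Beautiful Soup versions). The same approach works for the price: the code below reads the per-litre `.product-card__unit-price`, and you can use `.product-card__price` instead for the bottle price. Now it works: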

```python
from bs4 import BeautifulSoup
import requests

website = 'https://www.thewhiskyexchange.com/c/339/rum'
response = requests.get(website)
response.raise_for_status()  # stop early if the request did not succeed
soup = BeautifulSoup(response.content, 'html.parser')
results = soup.find_all('li', class_='product-grid__item')

for result in results:
    # The name is the first text node of .product-card__name; the secondary
    # name is in a nested <span>. (string= is the newer name for text=.)
    title = result.select_one('.product-card__name').find(string=True)
    print(title)
    # Per-litre price; some tiles have none, so select_one returns None.
    unit_price = result.select_one('.product-card__unit-price')
    if unit_price is not None:
        print(unit_price.find(string=True).replace('(', '').replace(')', ''))
```

Output:

```
Bumbu The Original Rum
 £57.07 per litre 
 Kraken Black Spiced
 £54.64 per litre
 Kraken Black Roast Coffee Rum
 £38.21 per litre
 Doorly's 14 Year Old Rum
 £87.79 per litre
 Admiral Vernon's Old J Spiced Tiki Fire Rum
 £59.93 per litre
 Ron Zacapa Centenario Sistema Solera 23 Rum
 £78.50 per litre
 Old Monk 7 Year Old Rum
 £35.64 per litre 
 Diplomatico Reserva Exclusiva Rum
 £64.21 per litre
 Pusser's Select Aged 151 Navy Rum
 £69.93 per litre
 Diplomatico Reserva Exclusiva Rum
 £58.50 per litre
 El Dorado Rum 15 Year Old
 £78.50 per litre
 Plantation Extra Old Barbados Rum
 £77.50 per litre
 Captain Morgan Black Spiced
 Doorly's XO Rum
 £53.50 per litre 
 Mount Gay XO Triple Cask Blend
 £76.79 per litre
 Diplomatico Reserva Exclusiva Rum
 £58.50 per litre
 Plantation Barbados 5 Year Old Signature Blend Rum
 £44.64 per litre
 Worthy Park Single Estate Reserve
 £69.93 per litre
 Pusser's Blue Label British Navy Rum
 £39.93 per litre
 Ron Zacapa Centenario XO Rum Solera Gran Reserva Especial
 £150 per litre
 Havana Club 3 Year Old Rum
 £30.64 per litre 
 Santa Teresa 1796 Rum
 £74.93 per litre
 Eminente Reserva 7 Year Old
 £64.93 per litre
 Bumbu The Original Rum
 £48.21 per litre
```
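If you want the name and both prices together, here is a minimal sketch of the same extraction wrapped in a function, run offline against a trimmed copy of the `<li>` from the question. It illustrates the difference between `.find(string=True)` (only the first text node, so the name without the "Glass Pack" suffix) and `.get_text()` (all nested text), and it returns `None` for a missing price instead of relying on a bare `except`. The helper names `first_text` and `parse_product` and the dictionary keys are hypothetical, chosen only for this example; the live page's markup may differ or change.

```python
from bs4 import BeautifulSoup

SAMPLE_HTML = """
<li class="product-grid__item"><a class="product-card" href="/p/63818/bumbu-the-original-rum-glass-pack">
<div class="product-card__content"><p class="product-card__name"> Bumbu The Original Rum<span class="product-card__name-secondary">Glass Pack</span></p>
<p class="product-card__meta"> 70cl / 40% </p></div>
<div class="product-card__data"><p class="product-card__price"> £39.95 </p>
<p class="product-card__unit-price"> (£57.07 per litre) </p></div></a></li>
"""


def first_text(tile, selector):
    """Hypothetical helper: stripped first text node under `selector`, or None if absent."""
    element = tile.select_one(selector)
    if element is None:
        return None
    text = element.find(string=True)
    return text.strip() if text else None


def parse_product(tile):
    """Hypothetical helper: name, bottle price and per-litre price of one product tile."""
    unit_price = first_text(tile, ".product-card__unit-price")
    return {
        "name": first_text(tile, ".product-card__name"),
        "price": first_text(tile, ".product-card__price"),
        # Drop the surrounding parentheses, as the answer's code does.
        "unit_price": unit_price.strip("()") if unit_price else None,
    }


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
tiles = soup.find_all("li", class_="product-grid__item")

name_tag = tiles[0].select_one(".product-card__name")
print(f"[{name_tag.find(string=True)}]")         # [ Bumbu The Original Rum]
print(name_tag.get_text(" ", strip=True))        # Bumbu The Original Rum Glass Pack

products = [parse_product(tile) for tile in tiles]
print(products)
```

Since the asker already imports pandas, a list of dictionaries like `products` can be passed straight to `pd.DataFrame(products)` to get a table with one row per product.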