How to find the element with rating count on Rakuten

Question

I can't find the rating count (the number next to the stars) on the Rakuten website shown below.

I tried to locate the element with BeautifulSoup, but it doesn't work.

import time
import requests
!pip install beautifulsoup4
import bs4
!pip install lxml
from bs4 import BeautifulSoup
import pandas as pd

headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36'}

products =[]
for i in range(1,2): # Iterate from page 1 to the last page
    url = "https://www.rakuten.com.tw/shop/pandq/product/?l-id=tw_shop_inshop_cat&p={}".format(i)
    r = requests.get(url, headers = headers)
    soup = bs4.BeautifulSoup(r.text,"lxml")

    Soup = soup.find_all("div",class_='b-mod-item-vertical products-grid-section')

    for product in Soup:
        productcount = product.find_all("div",class_='b-content')
        print(productcount)

Answer 1

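What happens?

Your selection does not target the right elements, so it cannot return the expected result.

How to fix?

Your screenshot shows several different things (price, rating), so I will focus on the rating.

First, select all items: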

soup.select('.b-item')

Then iterate over the result set and select the <a> that holds the rating:

item.select_one('.product-review')
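The two selection steps above can be tried offline on a small fragment. This is a minimal sketch: the markup in `html` is hypothetical and only the class names `.b-item` and `.product-review` come from this answer, so the real page may be structured differently. It uses the built-in `html.parser` so that `lxml` is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the class names are taken from the answer.
html = """
<div class="b-item"><a class="product-review" href="#">(12)</a></div>
<div class="b-item"><span class="price">100</span></div>
"""

soup = BeautifulSoup(html, "html.parser")

# select() returns a list with every element matching the CSS selector
items = soup.select(".b-item")

# select_one() returns the first match inside each item, or None if there is none
links = [item.select_one(".product-review") for item in items]
texts = [link.get_text(strip=True) if link else None for link in links]
print(texts)
```

The second item has no review link, so `select_one` returns `None` for it. That is why the full example checks the result before calling `get_text`.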

Finally, strip the surrounding special characters (the parentheses) from its text:

item.select_one('.product-review').get_text(strip=True).strip('(|)')
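It may help to see what `.strip('(|)')` does on its own. The argument is treated as a set of characters, not as a pattern, and those characters are removed only from the two ends of the string. The sample strings and the helper `clean_count` below are made up for illustration.

```python
# Hypothetical link texts, assumed to resemble what appears next to the stars
samples = ["(12)", " (3) ", "(1|2)", "5"]


def clean_count(text):
    """Trim whitespace, then remove any leading/trailing '(', '|' or ')'."""
    return text.strip().strip("(|)")


cleaned = [clean_count(s) for s in samples]
print(cleaned)  # ['12', '3', '1|2', '5']
```

Note that the `|` inside `"(1|2)"` is kept, because `strip` stops at the first character on each end that is not in the set.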

Example

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36'}
r = requests.get('https://www.rakuten.com.tw/shop/pandq/product/?l-id=tw_shop_inshop_cat&p=1',headers=headers)
soup = BeautifulSoup(r.content, 'lxml')

for item in soup.select('.b-item'):
    # Not every item has a review link, so look it up once and guard against None
    review = item.select_one('.product-review')
    rating = review.get_text(strip=True).strip('(|)') if review else None
    print(rating)
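The question loops over several pages, so the example above could be extended roughly as follows. This is a sketch under assumptions, not a tested scraper: the function names `extract_ratings` and `scrape_pages`, the `delay` parameter and the conversion to `int` are all introduced here, and the last page number has to be supplied by the caller because the original does not state it.

```python
import time

import requests
from bs4 import BeautifulSoup

BASE_URL = "https://www.rakuten.com.tw/shop/pandq/product/?l-id=tw_shop_inshop_cat&p={}"
HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36"
}


def extract_ratings(html):
    """Return one entry per '.b-item': the rating count as int, or None."""
    soup = BeautifulSoup(html, "html.parser")
    ratings = []
    for item in soup.select(".b-item"):
        review = item.select_one(".product-review")
        text = review.get_text(strip=True).strip("(|)") if review else ""
        # Assumption: the count is plain digits; anything else becomes None
        ratings.append(int(text) if text.isdigit() else None)
    return ratings


def scrape_pages(first_page, last_page, delay=1.0):
    """Fetch pages first_page..last_page (inclusive) and collect the counts."""
    all_ratings = []
    for page in range(first_page, last_page + 1):
        response = requests.get(BASE_URL.format(page), headers=HEADERS, timeout=10)
        response.raise_for_status()  # stop on HTTP errors instead of parsing an error page
        all_ratings.extend(extract_ratings(response.text))
        time.sleep(delay)  # short pause between requests
    return all_ratings
```

Calling `scrape_pages(1, 1)` would fetch only the first page, like the example above. Keeping the parsing in `extract_ratings` means it can be checked against a saved HTML string without any network access.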

Output
