With BeautifulSoup I can't find some elements

Question

I am trying to run this script:

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'
}
r = requests.get('https://www.sneaksup.com/search?q=dunk&pagenumber=1', headers=headers)
soup = BeautifulSoup(r.content, 'lxml')
hrefs = soup.find('div', class_='product-list-inner-container bg-white')
print(hrefs)

Unfortunately, the output does not contain the product data. How can I get all the information under "col-12 col-md-3 product-grid-item-container rendered-enhanced"? I tried to find it with:

hrefs = soup.find('div', class_='col-12 col-md-3 product-grid-item-container rendered-enhanced')

but only got [ ].

Answer 1

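The data you are looking for may not be reachable through that class, but if you search the page source manually for a product title, you will find it inside a script tag: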

text=soup.find_all("script")[5].contents[0]

After that, we can use the re module to extract the text:

import re
main_data=re.findall(r'\{.*?\}', text)

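Here main_data is a list of strings (re.findall returns text, not dictionaries), each of which looks like a JSON object. Parse them, for example with json.loads, to get dictionaries from which you can extract whatever data you want.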
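As a minimal sketch of that last step, the matched strings can be turned into dictionaries with json.loads. The text value below is a hypothetical stand-in for the script contents, not the real page data, and the non-greedy pattern will break on nested objects, so treat this as an illustration only.

```python
import json
import re

# Hypothetical stand-in for the script tag contents; the real text will differ.
text = 'var items = [{"name": "Dunk Low", "price": 100}, {"name": "Dunk High", "price": 120}];'

# Non-greedy match from each '{' to the nearest '}' (does not handle nesting).
candidates = re.findall(r'\{.*?\}', text)

products = []
for candidate in candidates:
    try:
        products.append(json.loads(candidate))
    except json.JSONDecodeError:
        # Skip fragments that are not valid JSON, such as JavaScript code blocks.
        continue

print([p["name"] for p in products])
```

If the script contains nested objects, it is probably more robust to locate the full JSON literal and parse it in one call rather than matching brace pairs.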

Answer 2

The page loads its data from an API endpoint.

You can get everything related to the products from that endpoint.

Here is the endpoint:

https://www.sneaksup.com/search?q=dunk&pagenumber=1&paginationType=20&orderby=0
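The endpoint URL can also be built from its query parameters, which makes it easier to change the search term or the page. This is only a sketch: build_search_url is a hypothetical helper, and the values of paginationType and orderby are simply copied from the URL above without knowing what else the site accepts.

```python
from urllib.parse import urlencode

BASE_URL = 'https://www.sneaksup.com/search'

def build_search_url(query, page=1):
    """Hypothetical helper: assemble the search endpoint URL for one page."""
    params = {
        'q': query,
        'pagenumber': page,
        # Values copied from the endpoint above; their meaning is an assumption.
        'paginationType': 20,
        'orderby': 0,
    }
    return f'{BASE_URL}?{urlencode(params)}'

print(build_search_url('dunk', 1))
```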

Answer 3

As mentioned, you can get that data from the API endpoint. Here is how:

import requests
import pandas as pd

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'

}

url = 'https://www.sneaksup.com/search?q=dunk&pagenumber=1&paginationType=20&orderby=0'
jsonData = requests.get(url, headers=headers).json()

df = pd.DataFrame(jsonData['Products'])
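To gather more than one page, the same 'Products' key can be read in a loop. The sketch below keeps the network call out of the function so it can be tried offline: collect_products and fake_fetch are hypothetical names, the 'Name' field is made up for illustration, and stopping on an empty 'Products' list is an assumption about how the endpoint signals the last page.

```python
def collect_products(fetch_page, max_pages=5):
    """Collect the 'Products' lists from consecutive pages.

    fetch_page is any callable that takes a page number and returns the
    decoded JSON payload (a dict) for that page.
    """
    products = []
    for page in range(1, max_pages + 1):
        payload = fetch_page(page)
        batch = payload.get('Products') or []
        if not batch:
            # Assumption: an empty list means there are no more pages.
            break
        products.extend(batch)
    return products

# Fake fetcher standing in for requests.get(...).json(); the data is invented.
def fake_fetch(page):
    data = {1: [{'Name': 'A'}], 2: [{'Name': 'B'}]}
    return {'Products': data.get(page, [])}

all_products = collect_products(fake_fetch)
print(len(all_products))
```

With real data, fetch_page could wrap the requests.get(url, headers=headers).json() call shown above, and the resulting list can be passed to pd.DataFrame in the same way.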
