Scraping cdkeys[dot]com

Question

I am trying to track price changes for a specific product, but I get inconsistent results with Selenium and BeautifulSoup.

from selenium import webdriver
from bs4 import BeautifulSoup
import re
driver = webdriver.Chrome(executable_path='chromedriver.exe')
driver.get('https://www[.]cdkeys.com[/]playstation-network-psn[/]playstation-plus[/]1-year-playstation-plus-membership-ps3-ps4-ps-vita-digital-code')
search = driver.find_element_by_xpath('.//span[@class="price"]')

soup = BeautifulSoup(driver.page_source,'html.parser')
price = soup.find_all('span',{'class':['price']})

search returns an object of some kind, but its text attribute is empty.

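price returns dozens of results, including the one I am interested in. I believe there is some kind of API behind the page, but I could not find it in the developer tools with the network requests filtered by XHR.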

Answer 1

You should add a wait before getting the element with Selenium, so that the element has time to load completely.

import time

from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Newer Selenium releases dropped executable_path; there the driver path is passed through a Service object.
driver = webdriver.Chrome(executable_path='chromedriver.exe')
wait = WebDriverWait(driver, 20)
driver.get('https://www[.]cdkeys.com[/]playstation-network-psn[/]playstation-plus[/]1-year-playstation-plus-membership-ps3-ps4-ps-vita-digital-code')
# Wait (up to 20 seconds) until at least one price element is present in the DOM.
search = wait.until(EC.presence_of_element_located((By.XPATH, './/span[@class="price"]')))
# Give the page a moment to fill in the price text.
time.sleep(0.5)
# Keep only the price elements whose text contains a dollar sign.
prices_in_usd = driver.find_elements(By.XPATH, "//span[@class='price' and contains(text(),'$')]")

soup = BeautifulSoup(driver.page_source, 'html.parser')
price = soup.find_all('span', {'class': ['price']})

If you want the element's text, don't forget to extract it:

search_text = search.text
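
The fixed sleep in the snippet above is presumably there because an element can be present in the DOM before its text has been filled in, which would also explain the empty text in the question. One alternative is to wait on the text itself. The following is only a sketch: non_empty_price_text and PRICE_XPATH are hypothetical names, and the function relies on WebDriverWait.until() accepting any callable that takes the driver and returns something truthy once the condition is met.

```python
# Hypothetical locator, modelled on the one used in the answer above.
PRICE_XPATH = "//span[@class='price']"


def non_empty_price_text(driver, xpath=PRICE_XPATH):
    """Wait condition: return the first non-blank price text, or False.

    WebDriverWait.until() keeps calling the condition until it returns
    something truthy, so returning False means "not ready yet".
    """
    # "xpath" is the string value of Selenium's By.XPATH constant.
    for element in driver.find_elements("xpath", xpath):
        text = element.text.strip()
        if text:
            return text
    return False


# Usage with a real driver (not executed here):
#   price_text = WebDriverWait(driver, 20).until(non_empty_price_text)
```

With this approach the wait ends as soon as a price with visible text shows up, instead of always pausing for a fixed half second.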

Answer 2

First of all, you do not need to combine BeautifulSoup and Selenium in this case; either one is enough to do the whole job.

I would go with BeautifulSoup (together with requests).
The reason: this scrape does not need JavaScript, and BeautifulSoup is much lighter than Selenium in terms of performance.

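About the scraping approach:
You got dozens of results because you searched for the element by its class name only, and many elements on the page share that class.
One solution is to combine several attributes to find the right element, as I did in the code below.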

from bs4 import BeautifulSoup as BS
import requests
url = "https://www.cdkeys.com/playstation-network-psn/playstation-plus/1-year-playstation-plus-membership-ps3-ps4-ps-vita-digital-code?mw_aref=xcalibur"
r = requests.get(url)
soup = BS(r.text, features='html.parser')
product_main = soup.find('div', {'class': 'product-info-main'})
product_price = product_main.find('span', {'data-price-type': 'finalPrice', 'class': 'price-wrapper'})
print(product_price.text)
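
To see why the class-only search returns so many hits while the combined lookup returns one, here is a small offline sketch. The HTML is invented for the example and only assumed to resemble the real page; SAMPLE_HTML, all_prices and final_text are hypothetical names.

```python
from bs4 import BeautifulSoup

# Made-up markup: several span.price elements, of which only one is the
# final price of the main product.
SAMPLE_HTML = """
<div class="related-products">
  <span class="price-wrapper" data-price-type="finalPrice"><span class="price">$9.99</span></span>
  <span class="price-wrapper" data-price-type="oldPrice"><span class="price">$14.99</span></span>
</div>
<div class="product-info-main">
  <span class="price-wrapper" data-price-type="oldPrice"><span class="price">$59.99</span></span>
  <span class="price-wrapper" data-price-type="finalPrice"><span class="price">$39.99</span></span>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Class-only search: matches every price on the page.
all_prices = soup.find_all("span", {"class": "price"})
print(len(all_prices))

# Narrow the search to the main product block first, then combine the
# class with the data-price-type attribute.
product_main = soup.find("div", {"class": "product-info-main"})
final_price = product_main.find(
    "span", {"data-price-type": "finalPrice", "class": "price-wrapper"}
)
final_text = final_price.get_text(strip=True)
print(final_text)
```

On a real page either find() call can return None if the markup changes, so it is worth checking the result before reading its text.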

beautifulsoup

web-scraping

python-3.x

selenium-chromedriver