BeautifulSoup 'find()' returns NoneType Value

Question

I've just started trying to write a price tracker in Python, and I've already run into an error I don't understand. Here is the code:

import requests
from bs4 import BeautifulSoup

URL = 'https://www.amazon.com/Corsair-Platinum-Mechanical-Keyboard-Backlit/dp/B082GR814B/'
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0."
                         "4103.116 Safari/537.36"}
targetPrice = 150


def getPrice():
    page = requests.get(URL, headers=HEADERS)
    soup = BeautifulSoup(page.content, 'html.parser')
    price = soup.find(id="priceblock_ourprice").get_text()    # Error happens here
    print(price)


if True:
    getPrice()

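I found that soup.find(id="priceblock_ourprice") returns None, which is what causes the AttributeError. I don't understand why it returns None. The code actually worked only once and printed the product price, and that never happened again. I ran the script again after that one successful attempt, without changing anything, and got the AttributeError again.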

I have also tried the following:

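Using html5lib and lxml instead of html.parser. Different IDs, to see whether I could access a different part of the site. Other user agents. I also downloaded a similar program from GitHub that uses exactly the same code to see whether it would run, but it did not work either.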

What is happening here? Any help would be appreciated.

Answer 1

Try printing soup right after soup = BeautifulSoup(page.content, 'html.parser').

Amazon knows you are trying to scrape them, so the page they return is not the one you think it is.

Getting blocked when scraping Amazon (even with headers, proxies, delay)
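
Printing the whole soup can be noisy, so a smaller check may be easier to read. The following is a minimal sketch, not a definitive diagnostic: describe_page is a hypothetical helper, and the two HTML strings are made-up stand-ins for a product page and a block page, not real Amazon markup.

```python
from bs4 import BeautifulSoup


def describe_page(html, element_id="priceblock_ourprice"):
    """Return the page title and whether an element with `element_id` exists."""
    soup = BeautifulSoup(html, "html.parser")
    # soup.title is None when the document has no <title> tag.
    title = soup.title.get_text(strip=True) if soup.title else None
    return {"title": title, "has_element": soup.find(id=element_id) is not None}


# Hypothetical stand-ins for the two kinds of response (assumed markup).
product_html = (
    "<html><head><title>Keyboard</title></head>"
    '<body><span id="priceblock_ourprice">$1.00</span></body></html>'
)
blocked_html = (
    "<html><head><title>Robot Check</title></head>"
    "<body>Enter the characters you see below</body></html>"
)

print(describe_page(product_html))
print(describe_page(blocked_html))
```

In the question's script, the same helper could be called as describe_page(page.content). If has_element is False and the title is not the product title, the response is probably not the product page.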

Answer 2

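You are getting a captcha page. Try setting more HTTP headers, as a browser would, to get the correct page. When I set the Accept-Language HTTP header, I could no longer reproduce the error: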

import requests
from bs4 import BeautifulSoup


URL = 'https://www.amazon.com/Corsair-Platinum-Mechanical-Keyboard-Backlit/dp/B082GR814B/'
HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0",
    'Accept-Language': 'en-US,en;q=0.5',
}

def getPrice():
    page = requests.get(URL, headers=HEADERS)
    soup = BeautifulSoup(page.content, 'html.parser')
    price = soup.find(id="priceblock_ourprice").get_text()
    print(price)


getPrice()

Prints:

5.99
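
Since the captcha page can still come back occasionally, it may help to avoid calling get_text() on None at all. This is a sketch under assumptions: extract_price is a hypothetical helper, and the id priceblock_ourprice is taken from the question and may not match the live page.

```python
from bs4 import BeautifulSoup


def extract_price(html, element_id="priceblock_ourprice"):
    """Return the text of the price element, or None if it is not in the page."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find(id=element_id)
    if tag is None:
        # find() found no match, for example because a captcha page was served.
        return None
    return tag.get_text(strip=True)
```

With the code above, getPrice() could call extract_price(page.content) and print a message such as "price element not found" when the result is None, instead of raising AttributeError.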
