BeautifulSoup doesn't work properly with all URLs
The error says:
AttributeError: 'NoneType' object has no attribute 'get_text'
I was following a web scraping tutorial and everything worked fine with this url, but when I changed it to this url, the error mentioned above appeared.
The crawler function:
def product_crawler():
    # url and headers are defined elsewhere in the script
    page = requests.get(url, headers=headers)
    soup = BeautifulSoup(page.content, 'html.parser')
    title = soup.find(id="productTitle").get_text()   # fails when find() returns None
    print(title)
I checked all the answers on Stack Overflow, such as changing html.parser to lxml, but none of them worked.
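For what it's worth, soup.find(id="productTitle") returns None whenever the downloaded HTML contains no element with that id, and calling .get_text() on None is exactly what raises this AttributeError. A quick check along these lines (a sketch reusing the same url and headers as above; the printed message is just for illustration) shows whether the element is actually in the response:

page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.content, "html.parser")
title_tag = soup.find(id="productTitle")
if title_tag is None:
    # The element is missing from the HTML that came back; inspect the response
    print("productTitle not found, HTTP status:", page.status_code)
else:
    print(title_tag.get_text(strip=True))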
Try adding an Accept-Language HTTP header:
import requests
from bs4 import BeautifulSoup
url = "https://www.amazon.com/dp/B08DK5ZH44"
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0",
    "Accept-Language": "en-US,en;q=0.5",
}
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.content, "html.parser")
title = soup.find(id="productTitle").get_text(strip=True)
print(title)
Prints:
GoPro HERO9 Black - Waterproof Action Camera with Front LCD and Touch Rear Screens, 5K Ultra HD Video, 20MP Photos, 1080p Live Streaming, Webcam, Stabilization
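The same fix can be folded back into the product_crawler() function from the question; the sketch below is my own rearrangement (the None check and the early return are additions, not part of the original answer) so that a missing element prints a message instead of crashing with an AttributeError:

import requests
from bs4 import BeautifulSoup

url = "https://www.amazon.com/dp/B08DK5ZH44"
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0",
    "Accept-Language": "en-US,en;q=0.5",
}

def product_crawler():
    page = requests.get(url, headers=headers)
    soup = BeautifulSoup(page.content, "html.parser")
    title_tag = soup.find(id="productTitle")
    if title_tag is None:
        # Amazon returned a page without the product title element
        print("productTitle not found, HTTP status:", page.status_code)
        return
    print(title_tag.get_text(strip=True))

product_crawler()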