User-agent error with web scraping python3

Question

This is my first time doing web scraping. When I use page = requests.get(URL) it works fine, but when I add

headers = {"User-Agent": 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.4 Safari/605.1.15'}

page = requests.get(URL, headers=headers)

I get an error:

    title = soup.find(id="productTitle").get_text()
AttributeError: 'NoneType' object has no attribute 'get_text'

What is wrong here? Should I give up on using headers?

Answer 1

I think the page contains invalid HTML, so BeautifulSoup cannot find your element.
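For context, the AttributeError in the question only says that soup.find(id="productTitle") returned None, meaning no element with that id was found in the parsed tree. A minimal sketch that reproduces this, assuming a made-up HTML string rather than the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML with no element whose id is "productTitle"
html = "<html><body><p>No title here</p></body></html>"

soup = BeautifulSoup(html, "html.parser")
element = soup.find(id="productTitle")  # find() returns None when nothing matches
print(element)

try:
    element.get_text()  # calling a method on None raises AttributeError
except AttributeError as exc:
    message = str(exc)
    print(message)
```

So the traceback points at a missing element, not at the headers argument itself.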

Try prettifying the HTML first:

import requests
from bs4 import BeautifulSoup

URL = 'https://www.amazon.com/dp/B07JP9QJ15/ref=dp_cerb_1'
headers = {
    "User-Agent": 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.4 Safari/605.1.15'}
page = requests.get(URL, headers=headers)

# Parse once and prettify, then parse the prettified markup again
pretty = BeautifulSoup(page.text, 'html.parser').prettify()
soup = BeautifulSoup(pretty, 'html.parser')
# Look the title up in the re-parsed tree
print(soup.find(id='productTitle').get_text())

Which returns:

Dell UltraSharp U2719D - LED monitor - 27"
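The approach above can be wrapped in a small helper that returns None instead of raising when the element is still missing. This is only a sketch: find_title is a hypothetical name, the sample HTML strings in the test are made up, and whether the prettify fallback helps on a given page is the assumption of this answer, not a guarantee.

```python
from bs4 import BeautifulSoup


def find_title(html, element_id="productTitle"):
    """Return the stripped text of the element with element_id, or None."""
    # First attempt: parse the HTML as received
    soup = BeautifulSoup(html, "html.parser")
    element = soup.find(id=element_id)
    if element is None:
        # Fallback suggested above: prettify, then parse the result again
        pretty = soup.prettify()
        element = BeautifulSoup(pretty, "html.parser").find(id=element_id)
    if element is None:
        return None  # caller decides what to do when nothing was found
    return element.get_text(strip=True)
```

With the names used above, it could be called as find_title(page.text), followed by a check for None before printing.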
