User-Agent error when web scraping with Python 3
This is my first time doing web scraping. When I use
page = requests.get(URL)
it works fine, but when I add
headers = {"User-Agent": 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.4 Safari/605.1.15'}
page = requests.get(URL, headers=headers)
I get an error:
title = soup.find(id="productTitle").get_text()
AttributeError: 'NoneType' object has no attribute 'get_text'
What is wrong here? Should I give up on using headers?
I think the page contains invalid HTML, so BeautifulSoup cannot find your element.
Try prettifying the HTML first:
import requests
from bs4 import BeautifulSoup

URL = 'https://www.amazon.com/dp/B07JP9QJ15/ref=dp_cerb_1'
headers = {
    "User-Agent": 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.4 Safari/605.1.15'
}
page = requests.get(URL, headers=headers)
# Parse once, prettify to normalize the markup, then parse the normalized HTML again.
pretty = BeautifulSoup(page.text, 'html.parser').prettify()
soup = BeautifulSoup(pretty, 'html.parser')
print(soup.find(id='productTitle').get_text())
Which returns:
Dell UltraSharp U2719D - LED monitor - 27"
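The AttributeError in the question happens because soup.find() returns None when no element matches, and None has no get_text() method. Here is a minimal sketch of guarding against that, so a missing element is reported instead of crashing. The helper name extract_title and the two sample HTML strings are made up for illustration; the second sample only assumes that a server may return a different page that lacks the element, which this thread does not confirm for this site.

```python
from bs4 import BeautifulSoup


def extract_title(html, element_id="productTitle"):
    """Return the stripped text of the element with this id, or None if absent.

    Hypothetical helper for illustration; not part of requests or bs4.
    """
    soup = BeautifulSoup(html, "html.parser")
    element = soup.find(id=element_id)
    if element is None:
        # find() found nothing, so calling get_text() here would raise AttributeError.
        return None
    return element.get_text(strip=True)


# Made-up sample pages: one with the element, one without it.
sample_ok = '<html><body><span id="productTitle">  Example Product  </span></body></html>'
sample_missing = "<html><body><p>Some other page.</p></body></html>"

print(extract_title(sample_ok))
print(extract_title(sample_missing))
```

With a real response you could pass page.text to the helper and, when it returns None, print page.status_code and the start of page.text to see what the server actually sent.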