BeautifulSoup doesn't work properly with all URLs
The error says:
AttributeError: 'NoneType' object has no attribute 'get_text'
I was following a web scraping tutorial and everything worked fine with this url, but when I changed it to this url, the error mentioned above appeared.
The crawler function:
def product_crawler():
    # url and headers are defined elsewhere in the script
    page = requests.get(url, headers=headers)
    soup = BeautifulSoup(page.content, 'html.parser')
    title = soup.find(id="productTitle").get_text()   # fails when find() returns None
    print(title)
I checked all the answers on Stack Overflow, such as changing html.parser to lxml, but none of them worked.
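For what it's worth, soup.find(id="productTitle") returns None whenever the downloaded HTML contains no element with that id, and calling .get_text() on None is exactly what raises this AttributeError. A quick check along these lines (a sketch reusing the same url and headers as above; the printed message is just for illustration) shows whether the element is actually in the response:

page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.content, "html.parser")
title_tag = soup.find(id="productTitle")
if title_tag is None:
    # The element is missing from the HTML that came back; inspect the response
    print("productTitle not found, HTTP status:", page.status_code)
else:
    print(title_tag.get_text(strip=True))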
Try adding an Accept-Language HTTP header:
import requests
from bs4 import BeautifulSoup
url = "https://www.amazon.com/dp/B08DK5ZH44"
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0",
    "Accept-Language": "en-US,en;q=0.5",
}
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.content, "html.parser")
title = soup.find(id="productTitle").get_text(strip=True)
print(title)
Prints:
GoPro HERO9 Black - Waterproof Action Camera with Front LCD and Touch Rear Screens, 5K Ultra HD Video, 20MP Photos, 1080p Live Streaming, Webcam, Stabilization
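The same fix can be folded back into the product_crawler() function from the question; the sketch below is my own rearrangement (the None check and the early return are additions, not part of the original answer) so that a missing element prints a message instead of crashing with an AttributeError:

import requests
from bs4 import BeautifulSoup

url = "https://www.amazon.com/dp/B08DK5ZH44"
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0",
    "Accept-Language": "en-US,en;q=0.5",
}

def product_crawler():
    page = requests.get(url, headers=headers)
    soup = BeautifulSoup(page.content, "html.parser")
    title_tag = soup.find(id="productTitle")
    if title_tag is None:
        # Amazon returned a page without the product title element
        print("productTitle not found, HTTP status:", page.status_code)
        return
    print(title_tag.get_text(strip=True))

product_crawler()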