Error 404 with BeautifulSoup only on some URLs within a site

I have been learning to scrape with Python and BeautifulSoup, but I recently ran into a problem when requesting the second page of results within a site.

Requesting the first page with this code works fine:

url = "https://PAGE_1_URL_HERE"
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36'
headers = {'User-Agent': user_agent}
response = requests.get(url, headers=headers)
html = response.content
soup = BeautifulSoup(html, features="html.parser")

print(response)

But trying the same code on the second page returns a 404:

url = "https://PAGE_2_URL_HERE"
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36'
headers = {'User-Agent': user_agent}
response = requests.get(url, headers=headers)
html = response.content
soup = BeautifulSoup(html, features="html.parser")

print(response)

I have tried different headers, but I haven't been able to fix this. I'd appreciate it if anyone knows a solution.
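For debugging, it can help to check whether the 404 comes from the URL itself or from a redirect along the way; the snippet below is a minimal diagnostic sketch using the standard attributes of the requests response object (it assumes the same url and headers as above):

response = requests.get(url, headers=headers)
print(response.status_code)   # e.g. 404
print(response.history)       # any redirect responses that were followed
print(response.url)           # the final URL that actually produced the response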

Use https instead:

OLD: http://PAGE_2_URL_HERE

NEW: https://PAGE_2_URL_HERE&noscript=false

Here is an example; you just need to add your cookie from the browser...

from bs4 import BeautifulSoup
import requests

url = "https://PAGE_2_URL_HERE"
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36'
headers = {'User-Agent': user_agent}
# Copy the Cookie header value from your browser's developer tools
cookies = {"cookie": "COPY_HERE_YOUR_COOKIE_FROM_BROWSER"}

response = requests.get(url, headers=headers, cookies=cookies)
print(response)  # expect <Response [200]> if the cookie is accepted

# print(response.text)  # uncomment to inspect the raw HTML
html = response.content
soup = BeautifulSoup(html, features="html.parser")
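If copying the cookie by hand is inconvenient, another option (not part of the original answer, just a sketch) is a requests.Session: any cookies set while loading the first page are then reused automatically for the second one. This only helps if the 404 is caused by missing session cookies, which is what the cookie fix above suggests.

from bs4 import BeautifulSoup
import requests

user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.87 Safari/537.36'

# A Session keeps cookies between requests, so cookies set by page 1
# (e.g. a session id) are sent along with the request for page 2.
with requests.Session() as session:
    session.headers.update({'User-Agent': user_agent})
    session.get("https://PAGE_1_URL_HERE")             # first page sets the cookies
    response = session.get("https://PAGE_2_URL_HERE")  # reuses those cookies
    soup = BeautifulSoup(response.content, features="html.parser")
    print(response)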