BeautifulSoup returning 403 error for some sites
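I don't understand why I'm getting a 403 error for some of these sites.
If I visit the URLs manually, the pages load fine. There is no error message other than the 403 response, so I don't know how to diagnose the problem.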
from bs4 import BeautifulSoup
import requests

test_sites = [
    'http://fashiontoast.com/',
    'http://becauseimaddicted.net/',
    'http://www.lefashion.com/',
    'http://www.seaofshoes.com/',
]

for site in test_sites:
    print(site)
    # get page source
    response = requests.get(site)
    print(response)
    # print(response.text)
Running the code above gives:
http://fashiontoast.com/
<Response [403]>
http://becauseimaddicted.net/
<Response [403]>
http://www.lefashion.com/
<Response [200]>
http://www.seaofshoes.com/
<Response [200]>
Can anyone help me understand the cause of the problem and how to fix it?
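Some sites reject GET requests that don't carry a User-Agent they recognize.
Visit the page in a browser (Chrome), right-click and choose 'Inspect', then copy the User-Agent header of the GET request (you can find it in the Network tab).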
from bs4 import BeautifulSoup
import requests

with requests.Session() as se:
    # Headers copied from a real browser request
    se.headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36",
        "Accept-Encoding": "gzip, deflate",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Language": "en"
    }

    test_sites = [
        'http://fashiontoast.com/',
        'http://becauseimaddicted.net/',
        'http://www.lefashion.com/',
        'http://www.seaofshoes.com/',
    ]

    for site in test_sites:
        print(site)
        # get page source
        response = se.get(site)
        print(response)
        # print(response.text)
Output:
http://fashiontoast.com/
<Response [200]>
http://becauseimaddicted.net/
<Response [200]>
http://www.lefashion.com/
<Response [200]>
http://www.seaofshoes.com/
<Response [200]>
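If you want to see why the plain requests.get() call looks different to the server, a rough sketch like the one below may help. It prepares a request without sending it and prints the headers that would go out. The helper name headers_to_be_sent is my own and not part of requests, and I am only assuming that the User-Agent is what these particular sites check.

```python
import requests


def headers_to_be_sent(session, url):
    """Hypothetical helper: return the headers `session` would send for a
    GET to `url`, without actually contacting the server."""
    request = requests.Request("GET", url)
    prepared = session.prepare_request(request)  # merges session headers
    return dict(prepared.headers)


url = "http://fashiontoast.com/"

# A fresh session uses the library defaults, including a User-Agent
# of the form "python-requests/<version>".
default_headers = headers_to_be_sent(requests.Session(), url)
print(default_headers["User-Agent"])

# The same session with the browser User-Agent from the answer above.
browser_session = requests.Session()
browser_session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36",
})
browser_headers = headers_to_be_sent(browser_session, url)
print(browser_headers["User-Agent"])
```

Using headers.update() here keeps the other default headers and only overrides the User-Agent, which is usually enough to test whether the User-Agent alone is what triggers the 403.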