Python 3.5 urllib.request 403 禁止错误
Python 3.5 urllib.request 403 Forbidden Error
import urllib.request
import urllib
from bs4 import BeautifulSoup
url = "https://www.brightscope.com/ratings"
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, "html.parser")
print(soup.title)
我试图访问上述站点,但代码一直显示 403 禁止错误。
有什么想法吗?
C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\python.exe "C:/Users/jerem/PycharmProjects/webscraper/url scraper.py"
Traceback (most recent call last):
File "C:/Users/jerem/PycharmProjects/webscraper/url scraper.py", line 7, in
page = urllib.request.urlopen(url)
File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 163, in urlopen
return opener.open(url, data, timeout)
File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 472, in open
response = meth(req, response)
File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 582, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 510, in error
return self._call_chain(*args)
File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 444, in _call_chain
result = func(*args)
File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 590, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
import requests
from bs4 import BeautifulSoup
url = "https://www.brightscope.com/ratings"
headers = {'User-Agent':'Mozilla/5.0'}
page = requests.get(url)
soup = BeautifulSoup(page.text, "html.parser")
print(soup.title)
输出:
<title>BrightScope Ratings</title>
首先,使用 requests
而不是 urllib
。
然后,将headers
添加到requests
,如果不添加,站点将禁止您,因为默认User-Agent
是站点不喜欢的爬虫。
import urllib.request
import urllib
from bs4 import BeautifulSoup
url = "https://www.brightscope.com/ratings"
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, "html.parser")
print(soup.title)
我试图访问上述站点,但代码一直显示 403 禁止错误。
有什么想法吗?
C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\python.exe "C:/Users/jerem/PycharmProjects/webscraper/url scraper.py" Traceback (most recent call last): File "C:/Users/jerem/PycharmProjects/webscraper/url scraper.py", line 7, in page = urllib.request.urlopen(url) File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 163, in urlopen return opener.open(url, data, timeout) File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 472, in open response = meth(req, response) File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 582, in http_response 'http', request, response, code, msg, hdrs) File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 510, in error return self._call_chain(*args) File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 444, in _call_chain result = func(*args) File "C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 590, in http_error_default raise HTTPError(req.full_url, code, msg, hdrs, fp) urllib.error.HTTPError: HTTP Error 403: Forbidden
import requests
from bs4 import BeautifulSoup
url = "https://www.brightscope.com/ratings"
headers = {'User-Agent':'Mozilla/5.0'}
page = requests.get(url)
soup = BeautifulSoup(page.text, "html.parser")
print(soup.title)
输出:
<title>BrightScope Ratings</title>
首先,使用 requests
而不是 urllib
。
然后,将headers
添加到requests
,如果不添加,站点将禁止您,因为默认User-Agent
是站点不喜欢的爬虫。