Python 3 urllib HTTP Error 412: Precondition Failed
I'm trying to parse the HTML of a website. I wrote this code:
import urllib.request
def parse(url):
    response = urllib.request.urlopen(url)
    html = response.read()
    strHTML = html.decode()
    return strHTML
website = "http://www.manarat.ac.bd/"
string = parse(website)
But it raises this error:
Traceback (most recent call last):
  File "C:\Users\pupewekate\Videos\RAW.py", line 11, in <module>
    string = parse(website)
  File "C:\Users\pupewekate\Videos\RAW.py", line 5, in parse
    response = urllib.request.urlopen(url)
  File "C:\Users\pupewekate\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\pupewekate\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 532, in open
    response = meth(req, response)
  File "C:\Users\pupewekate\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Users\pupewekate\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 570, in error
    return self._call_chain(*args)
  File "C:\Users\pupewekate\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 504, in _call_chain
    result = func(*args)
  File "C:\Users\pupewekate\AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 412: Precondition Failed
Is there any way to fix this?
This website checks the User-Agent header. If it doesn't recognize the value, it returns status code 412:
import requests
print(requests.get('http://www.manarat.ac.bd/'))
# <Response [412]>
print(requests.get('http://www.manarat.ac.bd/', headers={'User-Agent': 'Chrome'}))
# <Response [200]>
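The same behaviour can be reproduced offline with only the standard library, which makes it easy to see where the 412 comes from. This is a sketch under an assumed rule: the hypothetical UAHandler server rejects any User-Agent starting with "Python-urllib" (the value urllib sends by default); the real site's check is unknown. UAHandler, fetch_status and demo are illustrative names, not part of any library.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class UAHandler(BaseHTTPRequestHandler):
    """Hypothetical server: answers 412 to urllib's default User-Agent."""

    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        # Assumed rule: urllib announces itself as "Python-urllib/3.x"
        status = 412 if agent.startswith("Python-urllib") else 200
        body = b"<html>ok</html>" if status == 200 else b""
        self.send_response(status)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the per-request log lines


# Empty ProxyHandler so a system-wide proxy cannot intercept localhost requests.
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch_status(url, headers=None):
    """Return the HTTP status code urllib ends up with for url."""
    request = urllib.request.Request(url, headers=headers or {})
    try:
        with OPENER.open(request, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses are raised as HTTPError; err.code is the status.
        err.close()
        return err.code


def demo():
    """Start the fake server, then fetch once without and once with a User-Agent."""
    server = HTTPServer(("127.0.0.1", 0), UAHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/" % server.server_address[1]
    try:
        return (fetch_status(url),                            # default urllib UA
                fetch_status(url, {"User-Agent": "Chrome"}))  # browser-like UA
    finally:
        server.shutdown()
        server.server_close()


print(demo())  # (412, 200)
```

The first request fails exactly like the traceback in the question; the only difference in the second is the header.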
For how to set the user agent in urllib, see this answer.
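In urllib the header can be set per request (on a Request object) or on an opener so that every request sent through it carries the header. A minimal sketch of both; BROWSER_UA is a placeholder value, not something the site is known to require.

```python
import urllib.request

# Placeholder browser-like value; the site's exact requirement is not known.
BROWSER_UA = "Mozilla/5.0 (Windows NT 6.3; Win64; x64)"
URL = "http://www.manarat.ac.bd/"

# Option 1: per request. Request normalises header names with
# str.capitalize(), so "User-Agent" is stored as "User-agent".
request = urllib.request.Request(URL, headers={"User-Agent": BROWSER_UA})
print(request.get_header("User-agent"))

# Option 2: on an opener. Replacing addheaders drops the default
# ("User-agent", "Python-urllib/3.x") entry.
opener = urllib.request.build_opener()
opener.addheaders = [("User-Agent", BROWSER_UA)]

# Optional: make plain urllib.request.urlopen() go through this opener too.
urllib.request.install_opener(opener)

# Either of these would now send BROWSER_UA (needs network access):
# html = urllib.request.urlopen(request).read().decode()
# html = opener.open(URL).read().decode()
```

Option 1 is the smaller change for a single call; option 2 avoids repeating the header when many URLs are fetched.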
You can use the requests module, since it's easier to work with. Otherwise, if you're set on using urllib, you can use this:
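import urllib.request
def parse(url):
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3;Win64;x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'}
    # urlopen() has no headers parameter; attach the headers to a Request instead
    request = urllib.request.Request(url, headers=headers)
    response = urllib.request.urlopen(request)
    return response.read().decode()
website = "http://www.manarat.ac.bd/"
string = parse(website)
print(string)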
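The question's parse() decodes the body with html.decode(), which assumes UTF-8 whatever the server declares. Below is a sketch that sends a User-Agent and also honours the charset from the Content-Type header, falling back to UTF-8. HEADERS is a placeholder, and the data: URL is used only so the example runs offline (urllib's built-in DataHandler serves it without a network).

```python
import urllib.request

# Placeholder browser-like value; any string the server accepts would do.
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 6.3; Win64; x64)"}


def parse(url):
    """Fetch url with a browser-like User-Agent and decode it using the
    charset declared in Content-Type (UTF-8 if none is given)."""
    request = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(request, timeout=10) as response:
        raw = response.read()
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")


# Offline check: the body is the Latin-1 byte string b"caf\xe9",
# which a plain .decode() (UTF-8) would reject with UnicodeDecodeError.
print(parse("data:text/html;charset=iso-8859-1,caf%E9"))  # café
```

Against the real site the call is simply parse("http://www.manarat.ac.bd/").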