Python 3.5 cannot open a URL: HTTP Error 403 (Forbidden)

I am trying to open and parse the following URL in Python 3.5 so that I can collect some reviews for my assignment. Here is my code:

from urllib.request import Request, urlopen
req = Request("http://www.webmd.com/drugs/drugreview-35-Zoloft+oral.aspx?drugid=35&drugname=Zoloft+oral&conditionFilter=-500")
home_page = urlopen(req).read()
print(home_page)

Here is the error:

 Traceback (most recent call last):
      File "/Users/maryamzolnoori/Dropbox/Dissertation/Programming/Web-Crawl/Askapatient_collect_comments.py", line 12, in <module>
        home_page = urlopen(req).read()
      File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 163, in urlopen
        return opener.open(url, data, timeout)
      File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 472, in open
        response = meth(req, response)
      File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 582, in http_response
        'http', request, response, code, msg, hdrs)
      File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 510, in error
        return self._call_chain(*args)
      File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 444, in _call_chain
        result = func(*args)
      File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 590, in http_error_default
        raise HTTPError(req.full_url, code, msg, hdrs, fp)
    urllib.error.HTTPError: HTTP Error 403: Forbidden

I also tested it in Python 2.7, and it failed there too. The error was:

urllib2.HTTPError: HTTP Error 416: Requested Range Not Satisfiable

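You are getting 403 Forbidden most likely because of the default Python user agent that urllib sends. Try setting the User-Agent header to that of a browser.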

For example:

from urllib.request import Request, urlopen
url = "http://www.webmd.com/drugs/drugreview-35-Zoloft+oral.aspx?drugid=35&drugname=Zoloft+oral&conditionFilter=-500"
req = Request(
    url, 
    data=None, 
    headers={
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
    }
)

home_page = urlopen(req)
print(home_page.read().decode('utf-8'))
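
To see what the server receives when no User-Agent is set, you can inspect the default header that urllib attaches to every request. This is a minimal sketch using only the standard library; `default_user_agent` is a hypothetical helper name, not part of urllib, and whether this particular site blocks that value is an assumption.

```python
import urllib.request


def default_user_agent():
    """Return the User-Agent that urllib.request sends when none is set."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) pairs applied to every request
    headers = {name.lower(): value for name, value in opener.addheaders}
    return headers.get("user-agent")


# Prints something like "Python-urllib/3.5", depending on the Python version
print(default_user_agent())
```

A value starting with `Python-urllib/` is easy for a server to recognize as a script, which fits the explanation above.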

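It is also a good idea to decode the response with the appropriate encoding, ideally the charset the server declares in its Content-Type header, rather than hard-coding one.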
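
One way to choose the encoding is to read the charset from the response's Content-Type header and fall back to UTF-8 when none is declared. The sketch below is only an illustration: `decode_body` and its `fallback` parameter are hypothetical names, and the demo builds a header object by hand so that it runs without a network connection.

```python
from email.message import Message


def decode_body(body, headers, fallback="utf-8"):
    """Decode raw bytes using the charset declared in the headers, if any."""
    # get_content_charset() returns None when no charset parameter is present
    charset = headers.get_content_charset() or fallback
    # errors="replace" avoids an exception on bytes that do not fit the charset
    return body.decode(charset, errors="replace")


# Offline demo: a hand-built header object standing in for a real response
demo_headers = Message()
demo_headers["Content-Type"] = "text/html; charset=ISO-8859-1"
print(decode_body(b"caf\xe9", demo_headers))
```

With a real response from `urlopen`, the same helper could be called as `decode_body(home_page.read(), home_page.headers)`, since `home_page.headers` provides the same `get_content_charset()` method.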