Download a file using Python without selenium like Chrome's "Save Link As"

Question

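There is a web page where I can download zip files using the "Save Link As" option in Chrome, but when I copy the link address and open it in the browser, it returns 403 Forbidden. I tried saving the file with the requests library, but that request is also rejected as forbidden.

I don't know how Chrome manages to download it, but I can't download it with the requests library.

How can I download the file without using the Selenium web driver, since that is overkill for such a simple task?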

Answer 1

I'd suggest using requests for this. Here is a simple example that downloads the first file:

import requests
url = 'https://www.nseindia.com/content/historical/EQUITIES/2003/DEC/cm01DEC2003bhav.csv.zip'
# Send a browser User-Agent and a Referer pointing at the site itself
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36', 'Referer': 'https://www.nseindia.com/'}
r = requests.get(url, allow_redirects=True, headers=headers)
r.raise_for_status()  # raise on 403 instead of silently saving an error page
with open('cm01DEC2003bhav.csv.zip', 'wb') as f:
    f.write(r.content)

The site checks the Referer header and rejects the request if the referer does not match the site itself.
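The same idea can be packaged as a small helper. This is a sketch using only the standard library, assuming the server merely checks that the Referer points at its own origin (scheme plus host), as described above; the name `browser_like_request` and the `DEFAULT_UA` string are illustrative, not part of any library.

```python
from urllib.parse import urlsplit
from urllib.request import Request

# Illustrative browser-like User-Agent (the same string used in the example above)
DEFAULT_UA = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36")


def browser_like_request(url: str, user_agent: str = DEFAULT_UA) -> Request:
    """Build a Request whose Referer is the target site's own origin.

    Hypothetical helper: assumes the server only checks that the Referer
    belongs to the same site, not a specific page on it.
    """
    parts = urlsplit(url)
    referer = f"{parts.scheme}://{parts.netloc}/"  # e.g. https://www.nseindia.com/
    return Request(url, headers={"User-Agent": user_agent, "Referer": referer})
```

The returned object can be passed straight to `urllib.request.urlopen(...)`, so the Referer never has to be hard-coded per site.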

Answer 2

Use urllib.request.urlretrieve with a custom Referer header, as specified by @Douglas:

>>> import urllib.request
>>> opener = urllib.request.build_opener()
>>> opener.addheaders = [('Referer', 'https://www.nseindia.com/')]
>>> urllib.request.install_opener(opener)
>>> source = 'https://www.nseindia.com/content/historical/EQUITIES/2001/JAN/cm01JAN2001bhav.csv.zip'
>>> destination = 'destination.csv.zip'  # Path to destination.
>>> urllib.request.urlretrieve(source, destination)
('destination.csv.zip', <http.client.HTTPMessage object at 0x10ce20208>)

This downloads the file to the specified destination path.
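Note that `install_opener` changes the opener used by every later `urlopen`/`urlretrieve` call in the process. If that global side effect is unwanted, a minimal alternative sketch is to attach the header to a single `Request` and stream the response to disk. The function name `download` and the chunked copy with `shutil.copyfileobj` are choices made here for illustration, not part of the answer above.

```python
import shutil
import urllib.request


def download(url: str, destination: str, referer: str = "https://www.nseindia.com/") -> str:
    """Save url to destination, sending a Referer header on this request only."""
    req = urllib.request.Request(url, headers={"Referer": referer})
    # urlopen raises urllib.error.HTTPError on a 403 rather than returning an error page
    with urllib.request.urlopen(req) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)  # copy in chunks instead of reading it all into memory
    return destination
```

Usage: `download('https://www.nseindia.com/content/historical/EQUITIES/2001/JAN/cm01JAN2001bhav.csv.zip', 'destination.csv.zip')`.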


Tags: python, selenium, google-chrome, urllib2, python-requests