How to download a file using Python that the server sends after a delay?

Question

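I have to download a large number of files from a local server. When I open the URL in a browser (Firefox), the page loads with the content "File being generated.. Wait..", and then a popup appears with the option to save the required .xlsx file.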

I tried to save the page object using urllib, but it saves an .html file whose content is "File being generated.. Wait..". I used the code described here (using urllib2):

I don't know how to download the file that the server sends later. It works fine in the browser. How can I emulate this using Python?

Answer 1

Is it as simple as this?

import urllib2
import time

response = urllib2.urlopen('http://www.example.com/')
time.sleep(10)  # Or however long you need.
html = response.read()
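
Note that sleeping before read() only delays reading the response that was already returned; it does not make the server send a different one. A variant of the same idea is to re-request the URL until the reply is no longer the HTML wait page. Below is a minimal sketch in Python 3 (where urllib.request replaces urllib2). It assumes the server returns the finished file at the same URL once it is ready, which the question does not confirm. The names is_placeholder and poll_for_file, the fetch hook, the retry count and the delay are all illustrative.

```python
import time
import urllib.request

PLACEHOLDER = b"File being generated"  # text quoted in the question


def is_placeholder(content_type, body):
    """Return True if the response still looks like the HTML wait page."""
    return "text/html" in (content_type or "").lower() or PLACEHOLDER in body[:1024]


def poll_for_file(url, attempts=10, delay=5.0, fetch=None):
    """Re-request `url` until it stops returning the wait page.

    `fetch` is a hypothetical hook (url -> (content_type, body)) so the loop
    can be exercised without a network; by default urllib.request is used.
    """
    if fetch is None:
        def fetch(u):
            with urllib.request.urlopen(u) as resp:
                return resp.headers.get("Content-Type", ""), resp.read()

    for attempt in range(attempts):
        content_type, body = fetch(url)
        if not is_placeholder(content_type, body):
            return body  # assumed to be the real file
        if attempt < attempts - 1:
            time.sleep(delay)  # give the server time to finish generating
    raise TimeoutError("file was not ready after %d attempts" % attempts)
```

If the server instead hands out the file from a different URL, polling the original URL will never succeed and the real URL has to be found first.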

Answer 2

First, you need to know the exact URL at which the document is generated. You can find it using Firefox and the Live HTTP Headers add-on.

Then use Python to "simulate" the same request.
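
As one possible illustration of simulating the request: wait pages like this often point the browser at the real file with a meta refresh tag. The sketch below (Python 3, standard library only) assumes that is what this server does, which the question does not state; if the page uses JavaScript or a form post instead, the captured headers will show a different request to reproduce. The names MetaRefreshParser, find_refresh_url and download_via_refresh are made up for this example.

```python
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin


class MetaRefreshParser(HTMLParser):
    """Collects the target of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attrs = dict(attrs)
        if (attrs.get("http-equiv") or "").lower() != "refresh":
            return
        # The content attribute looks like "5; url=/path/to/file.xlsx".
        _, _, rest = (attrs.get("content") or "").partition(";")
        key, _, value = rest.strip().partition("=")
        if key.strip().lower() == "url" and value:
            self.target = value.strip().strip("'\"")


def find_refresh_url(html, base_url):
    """Return the absolute refresh target found in `html`, or None."""
    parser = MetaRefreshParser()
    parser.feed(html)
    return urljoin(base_url, parser.target) if parser.target else None


def download_via_refresh(page_url, dest_path):
    """Fetch the wait page, then download whatever it refreshes to."""
    with urllib.request.urlopen(page_url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    file_url = find_refresh_url(html, page_url)
    if file_url is None:
        raise ValueError("no meta refresh target found in the wait page")
    with urllib.request.urlopen(file_url) as resp, open(dest_path, "wb") as out:
        out.write(resp.read())
    return file_url
```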

Hope that helps.

P.S.: Or share the site's URL, and I can help you better.

Answer 3

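import requests

url = 'https://readthedocs.org/projects/python-guide/downloads/pdf/latest/'
# Follow any HTTP redirects to the final file URL.
myfile = requests.get(url, allow_redirects=True)
# Write the body in binary mode; the with-block closes the file.
with open('c:/example.pdf', 'wb') as f:
    f.write(myfile.content)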

This question is a bit old, but I ran into the same problem. The key to the solution is allow_redirects=True.
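
For large files, a slightly more defensive variant of the same approach might stream the body to disk and refuse to save an HTML page by mistake. This is only a sketch: is_html and download are hypothetical helper names, and the timeout and chunk size are arbitrary. Two caveats: requests.get already follows redirects by default, so passing allow_redirects=True mainly documents the intent, and it only follows HTTP 3xx redirects, not a meta refresh or JavaScript in the page.

```python
def is_html(content_type):
    """True if a Content-Type header value denotes an HTML page."""
    return (content_type or "").split(";")[0].strip().lower() == "text/html"


def download(url, dest_path, timeout=30):
    """Follow redirects and stream the final response to dest_path."""
    import requests  # third-party; imported lazily so is_html works without it

    with requests.get(url, allow_redirects=True, stream=True, timeout=timeout) as resp:
        resp.raise_for_status()  # fail loudly on 4xx/5xx
        if is_html(resp.headers.get("Content-Type")):
            # Most likely still the "File being generated.. Wait.." page.
            raise RuntimeError("got an HTML page from %s, not the file" % resp.url)
        with open(dest_path, "wb") as out:
            for chunk in resp.iter_content(chunk_size=8192):
                out.write(chunk)
    return resp.url  # final URL after any redirects
```

Calling download(url, 'c:/example.pdf') with the URL from the snippet above would then either save the file or raise if the server answered with an HTML page.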
