How to download only the first x bytes of data in Python

Question

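Situation: The file to be downloaded is large (>100 MB). Downloading it takes quite a long time, especially on a slow connection.

Problem: However, I only need the file header (the first 512 bytes), which determines whether the whole file needs to be downloaded at all.

Question: Is there a way to download only the first 512 bytes of a file?

Additional information: The download is currently done with urllib.urlretrieve in Python 2.7.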

Answer 1

I think curl and head would work better than a Python solution here:

curl https://my.website.com/file.txt | head -c 512 > header.txt

Edit: Also, if you absolutely must have it in a Python script, you can use subprocess to run the curl command piped to head.
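A minimal sketch of that subprocess approach, assuming Python 3 and that curl is installed and on the PATH. Instead of spawning a second head process, it reads the first n bytes from curl's stdout directly and then stops curl. The helper name head_of_command and its parameters are hypothetical, chosen only for illustration.

```python
import subprocess

def head_of_command(cmd, n=512):
    """Run cmd and return at most the first n bytes it writes to stdout."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    try:
        # Blocks until n bytes have arrived or the command's output ends.
        data = proc.stdout.read(n)
    finally:
        proc.stdout.close()
        proc.kill()  # stop the transfer once we have enough bytes
        proc.wait()  # reap the child process
    return data
```

It could then be called as head_of_command(["curl", "-s", "https://my.website.com/file.txt"]), where -s only silences curl's progress output. Killing curl early plays the same role as head closing the pipe in the shell version.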

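Edit 2: For a pure Python solution: the urlopen function (urllib2.urlopen in Python 2, urllib.request.urlopen in Python 3) returns a file-like stream on which you can call read, which lets you specify the number of bytes. For example, urllib2.urlopen(my_url).read(512) will return the first 512 bytes of my_url.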
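For Python 3, the same idea might look like the sketch below. The Range request header is an addition that the answer does not mention: it asks the server to send only the requested bytes, but servers are free to ignore it, so the read is capped at n either way. The function name read_first_bytes and the timeout parameter are hypothetical.

```python
from urllib.request import Request, urlopen

def read_first_bytes(url, n=512, timeout=10):
    """Return at most the first n bytes of the resource at url."""
    # Ask for bytes 0..n-1 only. If the server ignores Range and sends
    # the full body, the loop below still stops after n bytes.
    req = Request(url, headers={"Range": "bytes=0-%d" % (n - 1)})
    chunks = []
    remaining = n
    with urlopen(req, timeout=timeout) as resp:
        while remaining > 0:
            chunk = resp.read(remaining)
            if not chunk:  # body ended before n bytes were available
                break
            chunks.append(chunk)
            remaining -= len(chunk)
    return b"".join(chunks)
```

Leaving the with block closes the connection, so the rest of the file is not transferred beyond whatever was already buffered.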

Answer 2

If the URL you are trying to read responds with a Content-Length header, you can get the file size with urllib2 in Python 2.

import urllib2

def get_file_size(url):
    request = urllib2.Request(url)
    # Send a HEAD request so only the headers are transferred, not the body.
    request.get_method = lambda: 'HEAD'
    response = urllib2.urlopen(request)
    length = response.headers.getheader("Content-Length")
    return int(length)

You can call the function to get the length and compare it against a threshold to decide whether to download.

if get_file_size("http://whosebug.com") < 1000000:
    pass  # Download

(Note that the Python 3 implementation is slightly different:)

from urllib import request

def get_file_size(url):
    r = request.Request(url)
    r.get_method = lambda : 'HEAD'
    response = request.urlopen(r)
    length = response.getheader("Content-Length")
    return int(length)
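Both versions above raise a TypeError when the server sends no Content-Length header, because int(None) fails. A hedged Python 3 variant that returns None in that case could look like this; the name get_file_size_or_none and the timeout parameter are hypothetical, and the method="HEAD" argument of Request needs Python 3.3 or later.

```python
from urllib.request import Request, urlopen

def get_file_size_or_none(url, timeout=10):
    """Return the reported Content-Length as an int, or None if absent."""
    req = Request(url, method="HEAD")  # headers only, no body
    with urlopen(req, timeout=timeout) as response:
        length = response.headers.get("Content-Length")
    # Some servers omit the header, so avoid calling int(None).
    return int(length) if length is not None else None
```

A caller can then treat None as "size unknown" and fall back to reading only the first bytes of the file before deciding.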
