Python

Question

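I have a large file (500 MB to 1 GB) stored at an HTTP(S) location
(say https://example.com/largefile.zip).

I have read/write access to an FTP server.

I have normal user permissions (no sudo).

Given these constraints, I want to read the file from the HTTP URL via requests and send it to the FTP server without writing it to disk first.

So normally, I would do this: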

import requests

response = requests.get('https://example.com/largefile.zip', stream=True)
with open("largefile_local.zip", "wb") as handle:
    # Copy the response body to a local file in 4 KiB chunks.
    for data in response.iter_content(chunk_size=4096):
        handle.write(data)

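and then upload the local file to FTP. But I want to avoid the disk I/O. I cannot mount the FTP server as a FUSE filesystem because I don't have superuser rights.

Ideally, I would do something like ftp_file.write() instead of handle.write(). Is that possible? The ftplib documentation seems to assume that only local files will be uploaded, not response.content. So ideally I would like to do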

response = requests.get('https://example.com/largefile.zip', stream=True)
for data in response.iter_content(chunk_size=4096):
    ftp_send_chunk(data)

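I don't know how to write ftp_send_chunk().

A similar question exists. My use case requires retrieving a chunk from the HTTP URL and writing it to FTP.

P.S.: The solution provided in the answer (a wrapper around urllib.urlopen) also works for Dropbox uploads. I had problems with my FTP provider, so I ended up using Dropbox, which works reliably.

Note that Dropbox has an "add web upload" feature in its API that does the same thing (remote upload). It only works with "direct" links. In my use case, the http_url came from a streaming service that was IP-restricted, so this workaround became necessary. Here is the code: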

import re
import dropbox

# filehandle, fname, and FileWithProgress come from the answer's code below.
d = dropbox.Dropbox("<ACTION-TOKEN>")  # placeholder for your token
f = FileWithProgress(filehandle)
filesize = filehandle.length  # read before any data is consumed
targetfile = '/' + fname
CHUNK_SIZE = 4 * 1024 * 1024

# Start the session with the first chunk; later chunks go in at cursor.offset.
upload_session_start_result = d.files_upload_session_start(f.read(CHUNK_SIZE))
num_chunks = 1
cursor = dropbox.files.UploadSessionCursor(session_id=upload_session_start_result.session_id,
                                           offset=CHUNK_SIZE * num_chunks)
commit = dropbox.files.CommitInfo(path=targetfile)
while CHUNK_SIZE * num_chunks < filesize:
    if (filesize - (CHUNK_SIZE * num_chunks)) <= CHUNK_SIZE:
        # Final chunk: finish the session and commit the file.
        print(d.files_upload_session_finish(f.read(CHUNK_SIZE), cursor, commit))
    else:
        d.files_upload_session_append(f.read(CHUNK_SIZE), cursor.session_id, cursor.offset)
    num_chunks += 1
    # Advance the offset on every pass so the next chunk lands after this one.
    cursor.offset = CHUNK_SIZE * num_chunks

link = d.sharing_create_shared_link(targetfile)
url = link.url
# Turn the preview link into a direct-download link.
dl_url = re.sub(r"\?dl\=0", "?dl=1", url)
dl_url = dl_url.strip()
print('dropbox_url:', dl_url)
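The loop above derives every offset from CHUNK_SIZE * num_chunks, which assumes each read returns a full chunk and that the total size is known up front. As a rough sketch of an alternative (the helper name chunks_with_offsets is made up for illustration and is not part of the Dropbox SDK), the offset can be tracked from the bytes actually read, with a one-block lookahead to spot the final block:

```python
import io


def chunks_with_offsets(fileobj, chunk_size):
    """Yield (offset, data, is_last) for each block read from fileobj.

    offset is the number of bytes that came before this block, which is
    the value an upload-session cursor would need at that point.
    """
    offset = 0
    current = fileobj.read(chunk_size)
    while True:
        following = fileobj.read(chunk_size)  # look ahead to detect the end
        is_last = not following
        yield offset, current, is_last
        if is_last:
            return
        offset += len(current)
        current = following


# Small in-memory demo; a real caller would pass the HTTP file handle.
demo = io.BytesIO(b"abcdefghij")
print(list(chunks_with_offsets(demo, 4)))
# prints [(0, b'abcd', False), (4, b'efgh', False), (8, b'ij', True)]
```

How each block maps onto the session start, append, and finish calls is left to the caller. Note that the lookahead keeps two blocks in memory at a time.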

I think it is even possible to do this with Google Drive through their Python API, but dealing with credentials and their Python wrapper was too hard for me. Check this (1) and this (2).

Answer 1

This should be easy with urllib.request.urlopen, as it returns a file-like object, which you can use directly with FTP.storbinary.

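from ftplib import FTP
import urllib.request

ftp = FTP(host, user, passwd)

# urlopen returns a file-like object; storbinary reads from it block by block.
filehandle = urllib.request.urlopen(http_url)

ftp.storbinary("STOR /ftp/path/file.dat", filehandle)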
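If you would rather stay with requests, as in the question, storbinary only needs an object with a read(size) method. Here is a minimal sketch of an adapter that turns any iterator of byte chunks (for example response.iter_content()) into such an object. The class name IterStream is made up for illustration, and the commented usage lines assume the ftp and http_url names from the snippet above.

```python
class IterStream:
    """Minimal read-only file-like wrapper around an iterator of byte chunks."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._buffer = b""

    def read(self, size=-1):
        # Pull chunks until the buffer can satisfy the request or input ends.
        while size < 0 or len(self._buffer) < size:
            try:
                self._buffer += next(self._chunks)
            except StopIteration:
                break
        if size < 0:
            data, self._buffer = self._buffer, b""
        else:
            data, self._buffer = self._buffer[:size], self._buffer[size:]
        return data


# Hypothetical usage (needs network access, so it is left commented out):
# response = requests.get(http_url, stream=True)
# stream = IterStream(response.iter_content(chunk_size=8192))
# ftp.storbinary("STOR /ftp/path/file.dat", stream)
```

Only one chunk plus the unread remainder is held in memory at a time. Passing response.raw to storbinary may also work, since it is file-like, but it does not decode compressed responses by default.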

If you want to monitor progress, implement a file-like wrapper object that delegates calls to the filehandle object but also displays the progress:

class FileWithProgress:

    def __init__(self, filehandle):
        self.filehandle = filehandle
        self.p = 0  # bytes read so far

    def read(self, blocksize):
        r = self.filehandle.read(blocksize)
        self.p += len(r)
        # filehandle.length is the number of bytes still unread, so
        # p + length is the total size (requires a Content-Length header).
        print(str(self.p) + " of " + str(self.p + self.filehandle.length))
        return r

filehandle = urllib.request.urlopen(http_url)

ftp.storbinary("STOR /ftp/path/file.dat", FileWithProgress(filehandle))
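Another option for progress reporting is the callback parameter of storbinary, which is called with each block after it has been sent. Below is a small sketch of such a callback. The class name ProgressCounter is made up for illustration, and the commented usage lines assume the ftp and filehandle names from above and a Content-Length header on the response.

```python
class ProgressCounter:
    """Callable that counts the bytes passed to it and prints the progress."""

    def __init__(self, total=None):
        self.total = total  # expected size in bytes, or None if unknown
        self.sent = 0

    def __call__(self, block):
        self.sent += len(block)
        if self.total:
            print("{} of {} bytes".format(self.sent, self.total))
        else:
            print("{} bytes".format(self.sent))


# Hypothetical usage (needs a live FTP session, so it is left commented out):
# total = int(filehandle.headers["Content-Length"])
# ftp.storbinary("STOR /ftp/path/file.dat", filehandle,
#                callback=ProgressCounter(total))
```

This keeps the file handle untouched, at the cost of counting bytes sent to the FTP server rather than bytes read from the HTTP response.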

For Python 2 use:

urllib.urlopen instead of urllib.request.urlopen.
filehandle.info().getheader('Content-Length') instead of str(self.p + self.filehandle.length)
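For the chunk-by-chunk form asked for in the question (ftp_send_chunk), ftplib also exposes the lower-level FTP.transfercmd, which returns the data-connection socket that storbinary writes to internally. The following is a rough sketch under that assumption. The function name send_chunks is made up, and it skips the TLS shutdown step that storbinary performs for FTP_TLS connections.

```python
def send_chunks(ftp, remote_path, chunks):
    """Upload an iterable of byte chunks to remote_path over an open FTP session.

    ftp is expected to behave like ftplib.FTP. Returns the bytes sent.
    """
    ftp.voidcmd("TYPE I")  # switch to binary mode
    conn = ftp.transfercmd("STOR " + remote_path)  # open the data connection
    sent = 0
    try:
        for chunk in chunks:
            conn.sendall(chunk)
            sent += len(chunk)
    finally:
        conn.close()  # closing the data connection marks the end of the file
    ftp.voidresp()  # wait for the server's completion reply
    return sent


# Hypothetical usage (needs network access, so it is left commented out):
# response = requests.get(http_url, stream=True)
# send_chunks(ftp, "/ftp/path/file.dat", response.iter_content(chunk_size=4096))
```

In most cases storbinary with a file-like object is simpler. This variant mainly helps when the data already arrives as a stream of chunks.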

Python - Transfer a file from HTTP(S) URL to FTP/Dropbox without disk writing (chunked upload)

ftp

dropbox

dropbox-api

python-requests