Paramiko Buffer issue
I'm running into a buffer issue with paramiko, and I found the same question here, where one of the solutions states:
Rather than using .get(), if you just call .open() to get an SFTPFile
instance, then call .read() on that object, or just hand it to the
Python standard library function shutil.copyfileobj() to download the
contents. That should avoid the Paramiko prefetch cache, and allow you
to download the file even if it's not quite as fast.
Now if I have:
ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect(host, username=user, password=pwd)
sftp = ssh.open_sftp()
sftp_file = sftp.open(remote_file_adress)
How can I save this file-like object to a CSV file on my local machine? (The original file is also a CSV.)
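Since an SFTPFile is an ordinary file-like object with a .read() method, shutil.copyfileobj can stream it straight into a local file opened in binary mode. A minimal sketch of that one step (here io.BytesIO stands in for sftp_file so the snippet runs without a server; with a real connection you would pass sftp_file instead):

```python
import io
import shutil

# Stand-in for sftp.open(remote_file_adress): any file-like object with .read()
sftp_file = io.BytesIO(b"col1,col2\n1,2\n3,4\n")

# Stream the remote contents into a local CSV in 1 MiB chunks,
# bypassing the prefetch cache that .get() uses
with open("local_copy.csv", "wb") as local_fp:
    shutil.copyfileobj(sftp_file, local_fp, 2**20)
```

Opening the destination in 'wb' matters: SFTPFile.read() returns bytes, so a text-mode destination would raise a TypeError.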
Here is a working example that fetches a copy of a test file onto your local machine. The file is much smaller than 1 GB, but it shows the general approach.
import paramiko
import os
import shutil
import time
import getpass

# get params
user = getpass.getuser()
pwd = getpass.getpass("Enter password: ")
bufsize = 2**20
host = 'localhost'
test_file_lines = 1000000

# create test file
now = time.asctime()
testfile_path = os.path.abspath('deleteme')
local_path = 'deleteme.copy'
print('writing test file...')
start = time.time()
with open(testfile_path, 'w') as fp:
    for _ in range(test_file_lines):
        fp.write(now + '\n')
delta = time.time() - start
file_size = os.stat(testfile_path).st_size
print("file size %d, %d KB/Sec" % (file_size, file_size/1024/delta))

# make connection
ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect(host, username=user, password=pwd)
sftp = ssh.open_sftp()
sftp_file = sftp.open(testfile_path, bufsize=bufsize)

# copy the remote file-like object to a local file
print('copying file...')
start = time.time()
with open(local_path, 'wb', bufsize) as out_fp:
    shutil.copyfileobj(sftp_file, out_fp, bufsize)
delta = time.time() - start
print('%.3f seconds, %d KB/Sec' % (delta, file_size/1024/delta))
#assert open(testfile_path).read() == open(local_path).read(), "files match"
Running it on my machine I get:
Enter password:
writing test file...
file size 25000000, 21017 KB/Sec
copying file...
10.225 seconds, 2387 KB/Sec
We expect some slowdown, since there is a read and a write plus network overhead (it's localhost, so nothing actually hits the wire), but this does seem a bit slow. I'm using a low-powered laptop with 2 cores, and between this app and sshd most of the CPU was in use, presumably doing the encryption. A more powerful machine would likely do better.
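One reason bufsize is the main tuning knob here: the copy is nothing more than a loop that reads and writes fixed-size chunks, so larger chunks mean fewer SFTP request round trips per byte. A sketch of what shutil.copyfileobj does internally (using io.BytesIO so it runs standalone; the real run pays extra per-chunk latency on the SFTP side):

```python
import io

def copy_in_chunks(src, dst, bufsize=2**15):
    """Roughly what shutil.copyfileobj does: read/write fixed-size chunks."""
    total = 0
    while True:
        chunk = src.read(bufsize)
        if not chunk:  # empty read signals EOF
            break
        dst.write(chunk)
        total += len(chunk)
    return total

src = io.BytesIO(b"x" * 100000)
dst = io.BytesIO()
n = copy_in_chunks(src, dst, bufsize=2**15)  # 32 KiB chunks
```

Each src.read() on a real SFTPFile turns into at least one request to the server, which is why tiny buffer sizes make transfers crawl even on localhost.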