PySFTP and get_r using Python - "No such file or directory"
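So I have a "simple" process that needs to go out, grab data from another server, and then copy a directory (and all of its subdirectories) to my server.
The code is as follows: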
import pysftp
dbfs_path = '/dbfs/mnt/aaa/bbb/output/{}/'.format(dbutils.widgets.get("run_name"))
remote_path = '/mst_bbb/{}/output/{}/'.format(bucket,dbutils.widgets.get("run_name"))
cnopts = pysftp.CnOpts()
cnopts.hostkeys = None
srv = pysftp.Connection(host=host_name, username="xxx",password="yyy",cnopts=cnopts)
srv.get_r(remote_path,dbfs_path)
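It worked fine until I realized that I sometimes have to fetch the same directory more than once, at which point it throws an error:
the directory already exists
No problem, I thought, and did the following: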
import shutil
shutil.rmtree(dbfs_path)
Then I reran the code,
but now I get a completely different error:
---------------------------------------------------------------------------
IOError Traceback (most recent call last)
<ipython-input-16-9f782d79e03f> in <module>()
12
13 srv = pysftp.Connection(host=host_name, username="xxx",password="yyy",cnopts=cnopts)
---> 14 srv.get_r(remote_path,dbfs_path)
/databricks/python/local/lib/python2.7/site-packages/pysftp/__init__.pyc in get_r(self, remotedir, localdir, preserve_mtime)
309 self.get(fname,
310 reparent(localdir, fname),
--> 311 preserve_mtime=preserve_mtime)
312
313 def getfo(self, remotepath, flo, callback=None):
/databricks/python/local/lib/python2.7/site-packages/pysftp/__init__.pyc in get(self, remotepath, localpath, callback, preserve_mtime)
247 sftpattrs = self._sftp.stat(remotepath)
248
--> 249 self._sftp.get(remotepath, localpath, callback=callback)
250 if preserve_mtime:
251 os.utime(localpath, (sftpattrs.st_atime, sftpattrs.st_mtime))
/databricks/python/local/lib/python2.7/site-packages/paramiko/sftp_client.pyc in get(self, remotepath, localpath, callback)
767 Added the ``callback`` param
768 """
--> 769 with open(localpath, 'wb') as fl:
770 size = self.getfo(remotepath, fl, callback)
771 s = os.stat(localpath)
IOError: [Errno 2] No such file or directory: u'/dbfs/aaa/bbb/output/run_job/./mst_bbb/pri1/output/run_job/date=2017-12-01/2017-12-01_output_0.csv.gz'
Any idea what is causing this? I can't figure it out.
Thanks
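I believe the target directory of get_r (the localdir parameter) must already exist; pysftp will not create it for you.
Your call to shutil.rmtree, however, removes not only the directory's contents but also the directory itself.
Recreate the directory afterwards:
import os
shutil.rmtree(dbfs_path)
os.mkdir(dbfs_path)
That said, I don't really understand your original problem. I don't see why you would get "the directory already exists" error in the first place. Maybe you should ask about that rather than implementing an inefficient workaround.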
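The remove-then-recreate step can be sketched as a small helper. This is only an illustrative sketch and not part of pysftp: reset_local_dir is a hypothetical name, and it uses os.makedirs instead of os.mkdir on the assumption that missing parent directories should be created too.

```python
import os
import shutil


def reset_local_dir(path):
    """Delete `path` if it exists, then recreate it as an empty directory.

    Hypothetical helper for illustration; it is not provided by pysftp.
    """
    if os.path.isdir(path):
        # rmtree removes the directory itself, not just its contents
        shutil.rmtree(path)
    # makedirs also creates any missing parent directories
    os.makedirs(path)
    return path
```

With the variables from the question, it would presumably be called right before the download, for example srv.get_r(remote_path, reset_local_dir(dbfs_path)).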