How to zip files (on Azure Blob Storage) with shutil in Databricks
My trained deep learning model consists of several files in a folder, so this is not about zipping a DataFrame.
I want to zip this folder (on Azure Blob Storage), but it does not seem to work when I use shutil:
import shutil
modelPath = "/dbfs/mnt/databricks/Models/predictBaseTerm/noNormalizationCode/2020-01-10-13-43/9_0.8147903598547376"
zipPath = "/mnt/databricks/Deploy/"  # no /dbfs here or it will error
shutil.make_archive(base_dir=modelPath, format='zip', base_name=zipPath)
Does anyone know how to do this and get the file onto Azure Blob Storage (where I read it from)?
I finally figured it out myself.
You cannot write directly to dbfs (Azure Blob Storage) with shutil.
You first need to put the file on the local driver node of Databricks, like this (I read somewhere in the docs that you cannot write directly to Blob storage):
import shutil
modelPath = "/dbfs/mnt/databricks/Models/predictBaseTerm/noNormalizationCode/2020-01-10-13-43/9_0.8147903598547376"
zipPath = "/tmp/model"
shutil.make_archive(base_dir=modelPath, format='zip', base_name=zipPath)
Then you can copy the file from the local driver node to blob storage. Note the "file:" prefix, which fetches the file from local storage!
blobStoragePath = "dbfs:/mnt/databricks/Models"
dbutils.fs.cp("file:" + zipPath + ".zip", blobStoragePath)
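As a quick sanity check (a minimal sketch of my own, reusing the blobStoragePath variable from the snippet above), you can list the target mount directory to confirm the zip actually landed on blob storage:

# Hypothetical verification step: list the target directory on the mount
# and print each file's name and size.
for f in dbutils.fs.ls(blobStoragePath):
    print(f.name, f.size)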
I wasted a couple of hours on this; please upvote if this answer helped you!
Actually, without using shutil, I can compress files in Databricks dbfs to a zip file as a blob of Azure Blob Storage that has been mounted to dbfs.
Here is my sample code using the Python standard libraries os and zipfile.
# Mount a container of Azure Blob Storage to dbfs
storage_account_name = '<your storage account name>'
storage_account_access_key = '<your storage account key>'
container_name = '<your container name>'
dbutils.fs.mount(
    source = "wasbs://"+container_name+"@"+storage_account_name+".blob.core.windows.net",
    mount_point = "/mnt/<a mount directory name under /mnt, such as `test`>",
    extra_configs = {"fs.azure.account.key."+storage_account_name+".blob.core.windows.net": storage_account_access_key})
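One caveat worth noting (my own addition, based on standard dbutils behavior, not part of the original answer): dbutils.fs.mount raises an error if the mount point is already in use, so a guard like the following sketch keeps the cell re-runnable. The mount name `test` is just an example; the other variables are the ones defined above.

# Sketch: skip mounting if the mount point already exists.
mount_point = "/mnt/test"
if not any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
    dbutils.fs.mount(
        source = "wasbs://"+container_name+"@"+storage_account_name+".blob.core.windows.net",
        mount_point = mount_point,
        extra_configs = {"fs.azure.account.key."+storage_account_name+".blob.core.windows.net": storage_account_access_key})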
# List all files which need to be compressed
import os
modelPath = '/dbfs/mnt/databricks/Models/predictBaseTerm/noNormalizationCode/2020-01-10-13-43/9_0.8147903598547376'
filenames = [os.path.join(root, name) for root, dirs, files in os.walk(top=modelPath, topdown=False) for name in files]
# print(filenames)
# Directly zip files to Azure Blob Storage as a blob
# zipPath is the absolute path of the compressed file on the mount point, such as `/dbfs/mnt/test/demo.zip`
zipPath = '/dbfs/mnt/<a mount directory name under /mnt, such as `test`>/demo.zip'
import zipfile
with zipfile.ZipFile(zipPath, 'w') as myzip:
    for filename in filenames:
        # print(filename)
        myzip.write(filename)
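One possible refinement (my suggestion, not part of the original answer): as written, myzip.write(filename) stores the full absolute /dbfs/... path inside the archive. Passing an arcname relative to modelPath keeps the entries clean, assuming the modelPath, filenames, and zipPath variables from above:

# Sketch: store entries relative to the model folder instead of the
# absolute /dbfs/... path.
import os
import zipfile
with zipfile.ZipFile(zipPath, 'w') as myzip:
    for filename in filenames:
        myzip.write(filename, arcname=os.path.relpath(filename, modelPath))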
I tried mounting my test container to dbfs and running my sample code, and I got the demo.zip file in my test container, as shown in the figure below.
[figure: the demo.zip blob listed in the test container]