Any way to send an xlsxwriter generated file to azure data lake without writing to local disk?

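For security reasons, I need to move a file to Azure Data Lake Storage without writing it to local disk. The file is an Excel workbook created with the xlsxwriter package. Here is what I tried; it raises ValueError: Seek only available in read mode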

import pandas as pd
from azure.datalake.store import core, lib, multithread
import xlsxwriter as xl

# Dataframes have undergone manipulation not listed in this code and come from a DB connection
matrix = pd.DataFrame(Database_Query1)
raw = pd.DataFrame(Database_Query2)

# Name datalake path for workbook
dlpath = '/datalake/file/path/file_name.xlsx'

# List store name
store_name = 'store_name_here'

# Create auth token
token = lib.auth(tenant_id= 'tenant_id_here',
                 client_id= 'client_id_here',
                 client_secret= 'client_secret_here')

# Create management file system client object
adl = core.AzureDLFileSystem(token, store_name= store_name)

# Create workbook structure
writer = pd.ExcelWriter(adl.open(dlpath, 'wb'), engine= 'xlsxwriter')
matrix.to_excel(writer, sheet_name= 'Compliance')
raw.to_excel(writer, sheet_name= 'Raw Data')

writer.save()

Any ideas? Thanks in advance.
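The `ValueError` is consistent with the zip writer behind the .xlsx format probing its output stream. Python's `zipfile` module, which XlsxWriter uses to assemble the workbook, calls `tell()` and then `seek()` on a file object opened for writing, and the Data Lake file refuses to seek in write mode. The stdlib sketch below reproduces that pattern. `WriteOnlyRemoteFile` is a hypothetical stand-in, not a class from `azure-datalake-store`. It shows why an intermediate `BytesIO`, which is both writable and seekable, sidesteps the problem.

```python
import zipfile
from io import BytesIO


class WriteOnlyRemoteFile:
    """Hypothetical stand-in for a remote file opened with mode 'wb'.

    It accepts writes and reports its position but, like the error in the
    question, refuses to seek while in write mode.
    """

    def __init__(self):
        self.data = bytearray()

    def write(self, chunk):
        self.data += chunk
        return len(chunk)

    def tell(self):
        return len(self.data)

    def flush(self):
        pass

    def seek(self, offset, whence=0):
        raise ValueError("Seek only available in read mode")


def write_tiny_zip(target):
    """Write a one-entry zip archive (.xlsx files are zip archives) to target."""
    with zipfile.ZipFile(target, "w") as archive:
        archive.writestr("hello.txt", "hello")


# Writing the archive straight to the seek-less target fails...
try:
    write_tiny_zip(WriteOnlyRemoteFile())
except ValueError as exc:
    print("direct write failed:", exc)

# ...while building it in a BytesIO first works, and the finished bytes
# can then be handed to the remote file with a single plain write().
buffer = BytesIO()
write_tiny_zip(buffer)
remote = WriteOnlyRemoteFile()
remote.write(buffer.getvalue())
print(bytes(remote.data[:2]))
```

The remote side only ever sees sequential `write()` calls, which is all a write-mode upload stream needs to support.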

If the data isn't very large, you could keep the bytes in memory and then dump the stream to your ADL store:

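from io import BytesIO

import pandas as pd

xlb = BytesIO()
# ... do what you need to do ... #

# Write the workbook into the in-memory buffer instead of a local file.
# Using ExcelWriter as a context manager closes it on exit, which finalizes
# the workbook (writer.save() was removed in pandas 2.0).
with pd.ExcelWriter(xlb, engine='xlsxwriter') as writer:
    matrix.to_excel(writer, sheet_name='Compliance')
    raw.to_excel(writer, sheet_name='Raw Data')

# Set the cursor of the stream back to the beginning so read() returns everything
xlb.seek(0)

with adl.open(dlpath, 'wb') as fl:
    # Not entirely sure about this part; check which write methods your adl file object offers
    fl.write(xlb.read())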
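The same idea can be packaged as a reusable function. This is a sketch, assuming pandas 1.3 or later (for `engine_kwargs`) and XlsxWriter are installed; `frames_to_xlsx_bytes` is a hypothetical helper name. It also passes XlsxWriter's `in_memory` Workbook option, which the XlsxWriter documentation describes as avoiding temporary files while the workbook is assembled. Given the no-local-disk requirement, setting it explicitly is a cheap safeguard. `getvalue()` returns the whole buffer regardless of the cursor position, so no `seek(0)` is needed.

```python
from io import BytesIO

import pandas as pd


def frames_to_xlsx_bytes(sheets):
    """Render a {sheet_name: DataFrame} mapping to .xlsx bytes in memory."""
    buffer = BytesIO()
    # engine_kwargs is forwarded to xlsxwriter.Workbook(buffer, options=...);
    # 'in_memory' keeps the workbook's parts in RAM instead of temp files.
    with pd.ExcelWriter(
        buffer,
        engine="xlsxwriter",
        engine_kwargs={"options": {"in_memory": True}},
    ) as writer:
        for name, frame in sheets.items():
            frame.to_excel(writer, sheet_name=name)
    # The writer is closed here, so the buffer now holds the finished file.
    return buffer.getvalue()
```

Uploading is then a single write with the same `adl` client and `dlpath` as above, for example `payload = frames_to_xlsx_bytes({'Compliance': matrix, 'Raw Data': raw})` followed by `with adl.open(dlpath, 'wb') as fl: fl.write(payload)`. Note that the whole workbook lives in memory, which is why this approach suits files that are not very large.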