Transforming a pandas df to a parquet-file-bytes-object

Question

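I have a pandas DataFrame and want to write it as a parquet file to Azure file storage.

So far I have not been able to convert the DataFrame directly into a bytes object that I can then upload to Azure. My current workaround is to save it as a parquet file on the local drive, then read it back as a bytes object, which I can upload to Azure.

Can anyone tell me how to transform a pandas DataFrame directly into a "parquet file" bytes object without writing it to disk? The I/O operation really slows things down, and it feels a lot like ugly code...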

# Transform the data_frame into a parquet file on the local drive
data_frame.to_parquet('temp_p.parquet', engine='auto', compression='snappy')

# Read the parquet file as bytes.
with open("temp_p.parquet", mode='rb') as f:
    fileContent = f.read()

# Upload the bytes object to Azure
service.create_file_from_bytes(share_name, file_path, file_name, fileContent, index=0, count=len(fileContent))

I am looking to implement something like this, where transform_functionality returns a bytes object:

my_bytes = data_frame.transform_functionality()
service.create_file_from_bytes(share_name, file_path, file_name, my_bytes, index=0, count=len(my_bytes))

Answer 1

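I found a solution and will post it here in case anyone needs to do the same task. After writing the DataFrame to a buffer with to_parquet, I get the bytes object out of the buffer with its .getvalue() method, as follows: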

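from io import BytesIO

# Write the parquet file into an in-memory buffer instead of to disk
buffer = BytesIO()
data_frame.to_parquet(buffer, engine='auto', compression='snappy')

# Upload the buffer's contents to Azure
service.create_file_from_bytes(share_name, file_path, file_name, buffer.getvalue(), index=0, count=buffer.getbuffer().nbytes)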

python

azure

pandas

pyarrow