Turn one dataframe into several dfs and add them as CSVs to zip archive (without saving files locally)

Question

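I have a dataframe that I read in from a locally saved CSV file. I then want to loop over that data and create several CSV files based on a string in one column.

Finally, I want to add all of those files to a zip file without saving them locally. I just want a single zip archive containing all the different CSV files.

All my attempts using the io or zipfile modules only produced a zip file containing a single CSV file (pretty much what I started with).

Any help would be much appreciated! Here is my code so far. It works, but it saves all the CSV files to my hard drive.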

import pandas as pd
from zipfile import ZipFile

df = pd.read_csv("myCSV.csv")
channelsList = df["Turn one column to list"].values.tolist()
channelsList = list(set(channelsList)) #delete duplicates from list

for channel in channelsList:
    newDf = df.loc[df['Something to match'] == channel]
    
    newDf.to_csv(f"{channel}.csv") # saves csv files to disk

Answer 1

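DataFrame.to_csv() can write to any file-like object, and ZipFile.writestr() accepts a string (or bytes), so you can avoid writing the CSV files to disk by using io.StringIO as an in-memory buffer. See the example code below.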

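Note: if the channel is simply stored in a single column of the input data, the more idiomatic (and more efficient) way to iterate over partitions of the data is to use groupby().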

from io import StringIO
from zipfile import ZipFile

import numpy as np
import pandas as pd

# Example data
df = pd.DataFrame(np.random.random((100,3)), columns=[*'xyz'])
df['channel'] = np.random.randint(5, size=len(df))

with ZipFile('/tmp/output.zip', 'w') as zf:
    for channel, channel_df in df.groupby('channel'):
        s = StringIO()
        channel_df.to_csv(s, index=False, header=True)
        zf.writestr(f"{channel}.csv", s.getvalue())
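
If the zip archive itself should not touch the disk either (for example, to upload it or return it from a web handler), the same idea can be taken one step further. The following is a minimal sketch, not part of the original answer: it assumes the grouping key lives in a single column, it relies on to_csv() returning the CSV text as a string when no target is passed, and the function name frames_to_zip_bytes and the sample data are made up for illustration.

```python
from io import BytesIO
from zipfile import ZIP_DEFLATED, ZipFile

import pandas as pd


def frames_to_zip_bytes(df: pd.DataFrame, column: str) -> bytes:
    """Return a zip archive (as bytes) with one CSV per distinct value in `column`.

    Hypothetical helper for illustration; nothing is written to disk.
    """
    buffer = BytesIO()
    with ZipFile(buffer, "w", compression=ZIP_DEFLATED) as zf:
        for key, group in df.groupby(column):
            # With no path or buffer argument, to_csv() returns the CSV as a str,
            # which writestr() accepts directly.
            zf.writestr(f"{key}.csv", group.to_csv(index=False))
    # The archive is complete once the ZipFile context has closed.
    return buffer.getvalue()


# Made-up sample data standing in for the real CSV
df = pd.DataFrame({"channel": ["a", "b", "a", "c"], "value": [1, 2, 3, 4]})
zip_bytes = frames_to_zip_bytes(df, "channel")

# Re-open the in-memory archive to check its contents
with ZipFile(BytesIO(zip_bytes)) as zf:
    names = sorted(zf.namelist())
    a_lines = zf.read("a.csv").decode().splitlines()
```

The resulting bytes can be written to a file later or passed to whatever consumes the archive. For very large dataframes, holding the whole archive in memory may not be desirable, in which case writing the zip to a path as in the answer above is the simpler choice.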
