How to split a dataframe and store it in multiple sheets of an Excel file

Question

I have a dataframe like the one shown below:

import numpy as np
import pandas as pd
from numpy.random import default_rng
rng = default_rng(100)
cdf = pd.DataFrame({'Id':[1,2,3,4,5],
                   'customer': rng.choice(list('ACD'),size=(5)),
                   'region': rng.choice(list('PQRS'),size=(5)),
                   'dumeel': rng.choice(list('QWER'),size=(5)),
                   'dumma': rng.choice((1234),size=(5)),
                   'target': rng.choice([0,1],size=(5))
})

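I would like to do the following:

a) Extract the data for each unique combination of region and customer, i.e. a groupby.

b) Store each group in its own sheet of a single Excel file (one sheet per group).

I am trying something like the code below, but there should be a neater, more Pythonic way to do this: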

df_list = []
grouped = cdf.groupby(['customer','region'])
for (cust, reg), v in grouped:
    # each comparison needs its own parentheses because & binds tighter than ==
    df = cdf[(cdf['customer'] == cust) & (cdf['region'] == reg)]
    df_list.append(df)

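I would like my output to look like the following (shown in multiple screenshots).

Since my real data has 200 columns and 1 million rows, any efficient and elegant approach would be very helpful.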

Answer 1

Use this solution in a loop:

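writer = pd.ExcelWriter('out.xlsx', engine='xlsxwriter')

# groupby yields a (customer, region) key tuple together with that group's rows
for (cust, reg), v in cdf.groupby(['customer','region']):
    v.to_excel(writer, sheet_name=f"DATA_{cust}_{reg}")

# Close the Pandas Excel writer and output the Excel file.
# (writer.save() was removed in pandas 2.0; close() also saves the file.)
writer.close()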
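
As a variation on the loop above, here is a minimal sketch that wraps it in a function and reads the file back to check the result. The helper name `write_groups`, the small hand-written frame, and the temporary output path are assumptions made for illustration, not part of the original answer. It also assumes an Excel engine such as openpyxl is installed, since pandas needs one to write and read `.xlsx` files.

```python
import os
import tempfile

import pandas as pd

# Small deterministic frame standing in for cdf (values are illustrative).
cdf = pd.DataFrame({
    'Id': [1, 2, 3, 4, 5],
    'customer': ['A', 'A', 'C', 'D', 'C'],
    'region': ['P', 'P', 'Q', 'R', 'Q'],
    'target': [0, 1, 1, 0, 1],
})


def write_groups(df, path, keys):
    """Write one sheet per unique combination of `keys`; return the sheet names."""
    names = []
    # The with block closes the writer, which saves the file.
    with pd.ExcelWriter(path) as writer:
        for group_key, part in df.groupby(keys):
            # Excel limits sheet names to 31 characters, so truncate.
            name = ("DATA_" + "_".join(map(str, group_key)))[:31]
            part.to_excel(writer, sheet_name=name, index=False)
            names.append(name)
    return names


out_path = os.path.join(tempfile.mkdtemp(), 'out.xlsx')
sheet_names = write_groups(cdf, out_path, ['customer', 'region'])

# sheet_name=None reads every sheet into a dict of DataFrames.
sheets = pd.read_excel(out_path, sheet_name=None)
```

Passing `index=False` keeps the original row index out of the sheets. Two things to watch with real data: truncating long names to 31 characters could make two groups collide on the same sheet name, and a single Excel sheet holds at most 1,048,576 rows, so a group larger than that would need further splitting.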
