Convert a single df to multiple dfs by swapping rows to columns and taking the sum of each column in Pandas

Question

I have the following pandas DataFrame:

Depts	Category	Monthly Booked	Monthly Delivered	Monthly Target	Yearly Booked	Yearly Delivered	Yearly Target
HR	Human	2345	2000	3000	1234556	234543	6432212
Software	Engg	654345	343213	765432	98765123	2345654	9999999
Security	Human	1234	1234	2000	23456	34568	234567
Software	Engg	12345	54334	324546	345645345	65345654	643563452
Software	Human	12345	54334	324546	345645345	65345654	643563452
Security	Engg	12345	54334	324546	34564534	65345654	643563452
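For reproducibility, the table above can be built as a DataFrame along these lines. This is only a sketch: the name `df` is chosen to match the code later in the post, and the helper names `columns` and `rows` are illustrative.

```python
import pandas as pd

# Column names and rows copied from the table above
columns = [
    "Depts", "Category",
    "Monthly Booked", "Monthly Delivered", "Monthly Target",
    "Yearly Booked", "Yearly Delivered", "Yearly Target",
]
rows = [
    ["HR", "Human", 2345, 2000, 3000, 1234556, 234543, 6432212],
    ["Software", "Engg", 654345, 343213, 765432, 98765123, 2345654, 9999999],
    ["Security", "Human", 1234, 1234, 2000, 23456, 34568, 234567],
    ["Software", "Engg", 12345, 54334, 324546, 345645345, 65345654, 643563452],
    ["Software", "Human", 12345, 54334, 324546, 345645345, 65345654, 643563452],
    ["Security", "Engg", 12345, 54334, 324546, 34564534, 65345654, 643563452],
]
df = pd.DataFrame(rows, columns=columns)
print(df.shape)
```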

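Now I want to turn the values of Depts into column headers, group by Category, and put the monthly and yearly sums for each metric, together with a total for each column, into two separate tables, as shown below: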

Monthly data

Category	Metric	Software	Security	HR
Engg	Target	1089978	324546
	Delivered	397547	54334
	Booked	666690	12345
Human	Target	324546	2000	3000
	Delivered	54334	1234	2000
	Booked	12345	1234	2345
Total	Target	1414524	326546	3000
	Delivered	451881	55568	2000
	Booked	679035	13579	2345

Yearly data

Category	Metric	Software	Security	HR
Engg	Target	653563451	643563452
	Delivered	67691308	65345654
	Booked	444410468	34564534
Human	Target	643563452	234567	6432212
	Delivered	65345654	34568	234543
	Booked	345645345	23456	1234556
Total	Target	1297126903	643798019	6432212
	Delivered	133036962	65380222	234543
	Booked	790055813	34587990	1234556

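Can I do this with pandas functions? If so, how? Note: I also want to keep the grouping but turn the index into columns. That is, I want the index names to become column names while the grouping stays in the first two columns.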

My current code, based on the answer given by @Code Different below:

tmp = df.set_index(["Category", "Depts"])
tmp.columns = pd.MultiIndex.from_tuples([tuple(col.split(" ")) for col in tmp.columns], names=[None, "Metric"])
tmp = tmp.stack(level=1)

monthly = tmp.pivot_table(index=["Category", "Metric"], columns="Depts", values="Monthly", aggfunc="sum")

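# DataFrame.append was removed in pandas 2.0, so build the Total rows and concatenate them instead
totals = monthly.groupby(level=1).sum()
totals.index = pd.MultiIndex.from_product([["Total"], totals.index], names=["Category", "Metric"])
monthly = pd.concat([monthly, totals])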
monthly.loc[:,'Total'] = monthly.sum(axis=1)

This keeps the MultiIndex, but if I call reset_index, the grouping is lost when I then use to_html or to_excel. I want to avoid that.
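The difference described here can be checked on a small example. The following is a sketch using a toy slice of the monthly figures; the names `table`, `html_grouped` and `html_flat` are illustrative. With the MultiIndex kept, to_html writes each Category label once and spans it over its rows; after reset_index the label is repeated in every row.

```python
import pandas as pd

# Toy slice of the monthly table (values taken from the data above)
index = pd.MultiIndex.from_product(
    [["Engg", "Human"], ["Booked", "Delivered"]], names=["Category", "Metric"]
)
table = pd.DataFrame(
    {"Security": [12345, 54334, 1234, 1234], "Software": [666690, 397547, 12345, 54334]},
    index=index,
)

html_grouped = table.to_html()             # MultiIndex kept: outer label spans its rows
html_flat = table.reset_index().to_html()  # index turned into plain columns

# Count how many cells carry the label "Engg" in each rendering
print(html_grouped.count(">Engg<"), html_flat.count(">Engg<"))
```

to_excel behaves similarly for a MultiIndex through its `merge_cells` parameter, which is not exercised here because it needs an Excel writer engine installed.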

Answer 1

Try this:

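import pandas as pd

# Move the grouping keys into the index so only the value columns remain
tmp = df.set_index(["Category", "Depts"])
# Split "Monthly Booked" into ("Monthly", "Booked") to get two column levels
tmp.columns = pd.MultiIndex.from_tuples([tuple(col.split(" ")) for col in tmp.columns], names=[None, "Metric"])
# Move the Metric level from the columns into the index
tmp = tmp.stack(level=1)

# One table per period: rows are (Category, Metric), columns are departments
monthly = tmp.pivot_table(index=["Category", "Metric"], columns="Depts", values="Monthly", aggfunc="sum")
yearly  = tmp.pivot_table(index=["Category", "Metric"], columns="Depts", values="Yearly", aggfunc="sum")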
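The snippet above yields the two pivoted tables but not the Total rows the question asks for. One possible way to add them is sketched below. Note the assumptions: it reaches the long format with `melt` rather than `stack` (a substitution, not what this answer uses), and the helper name `pivot_with_totals` and the intermediate column names `col`, `value` and `Period` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["HR", "Human", 2345, 2000, 3000, 1234556, 234543, 6432212],
        ["Software", "Engg", 654345, 343213, 765432, 98765123, 2345654, 9999999],
        ["Security", "Human", 1234, 1234, 2000, 23456, 34568, 234567],
        ["Software", "Engg", 12345, 54334, 324546, 345645345, 65345654, 643563452],
        ["Software", "Human", 12345, 54334, 324546, 345645345, 65345654, 643563452],
        ["Security", "Engg", 12345, 54334, 324546, 34564534, 65345654, 643563452],
    ],
    columns=[
        "Depts", "Category",
        "Monthly Booked", "Monthly Delivered", "Monthly Target",
        "Yearly Booked", "Yearly Delivered", "Yearly Target",
    ],
)

# Long format: one row per (Depts, Category, original column name, value)
long = df.melt(id_vars=["Depts", "Category"], var_name="col", value_name="value")
# Split e.g. "Monthly Booked" into Period="Monthly" and Metric="Booked"
parts = long["col"].str.split(" ", expand=True)
long["Period"] = parts[0]
long["Metric"] = parts[1]


def pivot_with_totals(period):
    """Pivot one period and append per-metric Total rows (hypothetical helper)."""
    sub = long[long["Period"] == period]
    table = sub.pivot_table(
        index=["Category", "Metric"], columns="Depts", values="value", aggfunc="sum"
    )
    # Sum over categories for each metric, then label those rows "Total"
    totals = table.groupby(level="Metric").sum()
    totals.index = pd.MultiIndex.from_product(
        [["Total"], totals.index], names=["Category", "Metric"]
    )
    return pd.concat([table, totals])


monthly = pivot_with_totals("Monthly")
yearly = pivot_with_totals("Yearly")
print(monthly)
print(yearly)
```

Both results keep the (Category, Metric) MultiIndex, so the grouping survives until the index is explicitly reset.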

Answer 2

Inspired by the other answer:

import pandas as pd

def pivot_and_stuff(dataframe, values):
    # values is the period to report on: 'Monthly' or 'Yearly'
    tmp = dataframe.set_index(["Category", "Depts"])
    tmp.columns = pd.MultiIndex.from_tuples([tuple(col.split(" ")) for col in tmp.columns], names=[None, "Metric"])
    tmp = tmp.stack(level=1)

    # One row per (Category, Metric), one column per department
    tmp = tmp.pivot_table(index=["Category", "Metric"], columns="Depts", values=values, aggfunc="sum")
    # Sum across categories for each metric and label those rows 'Total'
    tmp2 = tmp.groupby(level=[1]).sum()
    tmp2['Category'] = 'Total'
    tmp2 = tmp2.set_index('Category', append=True).reorder_levels([1,0])

    dataframe = pd.concat([tmp, tmp2]).rename_axis('', axis=1).rename_axis(['Category', 'Metric'])
    # Turn the index levels into ordinary columns
    dataframe = dataframe.reset_index().rename_axis('', axis=1)
    # Blank out repeated Category labels so each group name is shown only once
    dataframe.Category = [i if not j else '' for i, j in zip(dataframe.Category.values, dataframe.Category.duplicated())]

    return dataframe

pd.set_option('display.float_format', '{:.0f}'.format)

df_m = pivot_and_stuff(df, 'Monthly')
df_y = pivot_and_stuff(df, 'Yearly')
print(df_m)
print()
print(df_y)

Output:

  Category     Metric   HR  Security  Software
0     Engg     Booked  NaN     12345    666690
1           Delivered  NaN     54334    397547
2              Target  NaN    324546   1089978
3    Human     Booked 2345      1234     12345
4           Delivered 2000      1234     54334
5              Target 3000      2000    324546
6    Total     Booked 2345     13579    679035
7           Delivered 2000     55568    451881
8              Target 3000    326546   1414524

  Category     Metric      HR  Security   Software
0     Engg     Booked     NaN  34564534  444410468
1           Delivered     NaN  65345654   67691308
2              Target     NaN 643563452  653563451
3    Human     Booked 1234556     23456  345645345
4           Delivered  234543     34568   65345654
5              Target 6432212    234567  643563452
6    Total     Booked 1234556  34587990  790055813
7           Delivered  234543  65380222  133036962
8              Target 6432212 643798019 1297126903
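The step that blanks repeated Category labels is what keeps the grouped look after reset_index. As a sketch of the same idea on a toy Series (the names `category` and `blanked` are illustrative), `Series.mask` can replace the list comprehension. It relies on the rows already being ordered by category, because `duplicated()` flags every later repeat, not only consecutive ones.

```python
import pandas as pd

# Category column as it looks right after reset_index: each label repeated per metric
category = pd.Series(["Engg"] * 3 + ["Human"] * 3 + ["Total"] * 3, name="Category")

# Keep the first occurrence of each label and replace later repeats with ''
blanked = category.mask(category.duplicated(), "")
print(blanked.tolist())
```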

python

dataframe

pandas

pandas-groupby