Convert a single df into multiple dfs by turning rows into columns and taking the sum of each column in pandas

I have the following pandas DataFrame:

Depts     Category  Monthly Booked  Monthly Delivered  Monthly Target  Yearly Booked  Yearly Delivered  Yearly Target
HR        Human               2345               2000            3000        1234556            234543        6432212
Software  Engg              654345             343213          765432       98765123           2345654        9999999
Security  Human               1234               1234            2000          23456             34568         234567
Software  Engg               12345              54334          324546      345645345          65345654      643563452
Software  Human              12345              54334          324546      345645345          65345654      643563452
Security  Engg               12345              54334          324546       34564534          65345654      643563452
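
For reproducibility, the table above could be built roughly as follows. This is only a sketch: the values are copied from the table, but the construction itself (a dict of lists, with the variable name `df`) is an assumption about how the frame was created.

```python
import pandas as pd

# Values copied from the table above; how the original frame was
# actually created is not stated, so this construction is an assumption.
df = pd.DataFrame({
    "Depts": ["HR", "Software", "Security", "Software", "Software", "Security"],
    "Category": ["Human", "Engg", "Human", "Engg", "Human", "Engg"],
    "Monthly Booked": [2345, 654345, 1234, 12345, 12345, 12345],
    "Monthly Delivered": [2000, 343213, 1234, 54334, 54334, 54334],
    "Monthly Target": [3000, 765432, 2000, 324546, 324546, 324546],
    "Yearly Booked": [1234556, 98765123, 23456, 345645345, 345645345, 34564534],
    "Yearly Delivered": [234543, 2345654, 34568, 65345654, 65345654, 65345654],
    "Yearly Target": [6432212, 9999999, 234567, 643563452, 643563452, 643563452],
})

print(df.shape)
```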

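Now I want to turn the Depts values into column headers, group by Category, and then put the monthly and yearly sums, together with a total for each metric, into two separate tables, as shown below:

Monthly data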

Category  Metric     Software  Security    HR
Engg      Target      1089978    324546
          Delivered    397547     12345
          Booked       666690     54334
Human     Target       324546      2000  3000
          Delivered     54334      1234  2000
          Booked        12345      1234  2345
Total     Target      1414524    326546  3000
          Delivered    451881      1234  2000
          Booked       679035     55568  2345

Yearly data

Category  Metric       Software   Security       HR
Engg      Target      653563451  643563452
          Delivered    67691308   65345654
          Booked       44410468   34564534
Human     Target      643563452     234567  6432212
          Delivered    65345654      34568   234543
          Booked      345645345      23456  1234556
Total     Target     1297126903  643798019  6432212
          Delivered   133036962   65380222   234543
          Booked      390055813   34587990  1234556

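Can I do this with pandas functions? If so, how? Note: I also want to keep the grouping but turn the index into columns. That is, I want the index names to become column names while the grouping is preserved in the first two columns.

My current code, based on the answer given by @Code Different below: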

tmp = df.set_index(["Category", "Depts"])
tmp.columns = pd.MultiIndex.from_tuples([tuple(col.split(" ")) for col in tmp.columns], name=[None, "Metric"])
tmp = tmp.stack(level=1)

monthly = tmp.pivot_table(index=["Category", "Metric"], columns="Depts", values="Monthly", aggfunc="sum")

monthly = pd.concat([d.append(d.sum().rename(('Total', k))) for k, d in monthly.groupby(level=1)])
monthly = monthly.groupby(level=[0, 1], as_index=True).sum()
monthly.loc[:,'Total'] = monthly.sum(axis=1)

This keeps the MultiIndex, but if I call reset_index, the grouping is lost when I use the to_html / to_excel functions. I want to avoid that.

Try this:

tmp = df.set_index(["Category", "Depts"])
tmp.columns = pd.MultiIndex.from_tuples([tuple(col.split(" ")) for col in tmp.columns], name=[None, "Metric"])
tmp = tmp.stack(level=1)

monthly = tmp.pivot_table(index=["Category", "Metric"], columns="Depts", values="Monthly", aggfunc="sum")
yearly  = tmp.pivot_table(index=["Category", "Metric"], columns="Depts", values="Yearly", aggfunc="sum")
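
The pivot above does not yet include the "Total" rows the question asks for. One way to add them, shown here as a minimal sketch rather than the answerer's own method, is to sum per Metric across categories and prepend a "Total" level with `pd.concat`. The names `long`, `totals`, and `monthly_with_total` are hypothetical, and the `reset_index()` after `stack` is my own addition so that `pivot_table` works on plain columns.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "Depts": ["HR", "Software", "Security", "Software", "Software", "Security"],
    "Category": ["Human", "Engg", "Human", "Engg", "Human", "Engg"],
    "Monthly Booked": [2345, 654345, 1234, 12345, 12345, 12345],
    "Monthly Delivered": [2000, 343213, 1234, 54334, 54334, 54334],
    "Monthly Target": [3000, 765432, 2000, 324546, 324546, 324546],
    "Yearly Booked": [1234556, 98765123, 23456, 345645345, 345645345, 34564534],
    "Yearly Delivered": [234543, 2345654, 34568, 65345654, 65345654, 65345654],
    "Yearly Target": [6432212, 9999999, 234567, 643563452, 643563452, 643563452],
})

# Split "Monthly Booked" into ("Monthly", "Booked") and move Metric into the rows.
tmp = df.set_index(["Category", "Depts"])
tmp.columns = pd.MultiIndex.from_tuples(
    [tuple(col.split(" ")) for col in tmp.columns], names=[None, "Metric"]
)
long = tmp.stack(level=1).reset_index()  # columns: Category, Depts, Metric, Monthly, Yearly

monthly = long.pivot_table(
    index=["Category", "Metric"], columns="Depts", values="Monthly", aggfunc="sum"
)

# Sum each metric over all categories, then label those rows with Category "Total".
totals = monthly.groupby(level="Metric").sum()
totals = pd.concat({"Total": totals}, names=["Category"])

# The result still carries the (Category, Metric) MultiIndex.
monthly_with_total = pd.concat([monthly, totals])
print(monthly_with_total)
```

The same steps with `values="Yearly"` should give the yearly table. Missing combinations such as Engg/HR come out as NaN, and `sum()` skips them when computing the totals.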

This builds on the other answer:

def pivot_and_stuff(dataframe, values):
    # values: the column group to pivot, "Monthly" or "Yearly"

    # Split "Monthly Booked" etc. into two column levels, then stack Metric into the rows
    tmp = dataframe.set_index(["Category", "Depts"])
    tmp.columns = pd.MultiIndex.from_tuples([tuple(col.split(" ")) for col in tmp.columns], name=[None, "Metric"])
    tmp = tmp.stack(level=1)

    # One row per (Category, Metric), one column per department
    tmp = tmp.pivot_table(index=["Category", "Metric"], columns="Depts", values=values, aggfunc="sum")
    # Per-metric totals across all categories, labelled with Category "Total"
    tmp2 = tmp.groupby(level=[1]).sum()
    tmp2['Category'] = 'Total'
    tmp2 = tmp2.set_index('Category', append=True).reorder_levels([1,0])

    # Append the totals, turn the index into columns, and blank out repeated Category labels
    dataframe = pd.concat([tmp, tmp2]).rename_axis('', axis=1).rename_axis(['Category', 'Metric'])
    dataframe = dataframe.reset_index().rename_axis('', axis=1)
    dataframe.Category = [i if not j else '' for i, j in zip(dataframe.Category.values, dataframe.Category.duplicated())]

    return dataframe

pd.set_option('display.float_format', '{:.0f}'.format)

df_m = pivot_and_stuff(df, 'Monthly')
df_y = pivot_and_stuff(df, 'Yearly')
print(df_m)
print()
print(df_y)

Output:

  Category     Metric   HR  Security  Software
0     Engg     Booked  NaN     12345    666690
1           Delivered  NaN     54334    397547
2              Target  NaN    324546   1089978
3    Human     Booked 2345      1234     12345
4           Delivered 2000      1234     54334
5              Target 3000      2000    324546
6    Total     Booked 2345     13579    679035
7           Delivered 2000     55568    451881
8              Target 3000    326546   1414524

  Category     Metric      HR  Security   Software
0     Engg     Booked     NaN  34564534  444410468
1           Delivered     NaN  65345654   67691308
2              Target     NaN 643563452  653563451
3    Human     Booked 1234556     23456  345645345
4           Delivered  234543     34568   65345654
5              Target 6432212    234567  643563452
6    Total     Booked 1234556  34587990  790055813
7           Delivered  234543  65380222  133036962
8              Target 6432212 643798019 1297126903
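
On the to_html concern raised in the question: an alternative to blanking repeated labels by hand is to leave Category and Metric in the index when exporting. The sketch below uses a small hypothetical table (`table`, with a few values taken from the monthly output above) to compare the HTML produced with and without `reset_index`. As far as I know, `to_html` sparsifies a MultiIndex by default and emits `rowspan` cells for the outer level. `to_excel` has a `merge_cells` option that appears to play a similar role, but that is not exercised here.

```python
import pandas as pd

# Hypothetical small table shaped like the monthly output above.
index = pd.MultiIndex.from_product(
    [["Engg", "Human"], ["Booked", "Target"]], names=["Category", "Metric"]
)
table = pd.DataFrame(
    {
        "Security": [12345, 324546, 1234, 2000],
        "Software": [666690, 1089978, 12345, 324546],
    },
    index=index,
)

# With the MultiIndex kept, each Category label is written once and spans its rows.
html_grouped = table.to_html()

# After reset_index, Category is an ordinary column and is repeated on every row.
html_flat = table.reset_index().to_html()

print("rowspan" in html_grouped, "rowspan" in html_flat)
```

The trade-off is that the index names then appear on a separate header row in the HTML, so whether this is acceptable depends on the layout you need.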