Pandas adding sum of columns in pivot table (multiindexed)

I have df and df_pivot, created with the following code:

import pandas as pd
import numpy as np

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar"],
                  "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two"],
                  "Year": [2019, 2019, 2019, 2019,
                         2019, 2019, 2020, 2020,
                          2020],
                  "Month": ["01", "02", "03", "04", "05", "06", "01", "02", "03"],
                  "Values": [2, 4, 5, 5, 6, 6, 8, 9, 9]})


df_pivot = pd.pivot_table(df, values='Values', index=['A', 'B'],
                    columns=['Year','Month'], aggfunc=np.sum, fill_value=0)

df_pivot looks like this:

Year    2019                2020      
Month     01 02 03 04 05 06   01 02 03
A   B                                 
bar one    0  0  0  0  0  6    8  0  0
    two    0  0  0  0  0  0    0  9  9
foo one    2  4  5  0  0  0    0  0  0
    two    0  0  0  5  6  0    0  0  0

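What I want to do is add three columns to this df: 2019FY, 2019YTD and 2020YTD.

The 2019FY column should be the sum of all values under "2019".

The 2019YTD column should be the sum of the values under "2019" up to a defined period, i.e. if the period is defined as 04, 2019YTD should sum the 2019 columns 01/02/03/04.

The 2020YTD column should be the sum of all values under "2020".

The output table should look like this: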

Year    2019                2019FY 2019YTD 2020       2020YTD
Month     01 02 03 04 05 06                  01 02 03
A   B
bar one    0  0  0  0  0  6      6       0    8  0  0       8
    two    0  0  0  0  0  0      0       0    0  9  9      18
foo one    2  4  5  0  0  0     11      11    0  0  0       0
    two    0  0  0  5  6  0     11       5    0  0  0       0

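Basically, I would like to know how to sum the columns up to a given "Month", because from there I can create 2019FY/2019YTD/2020YTD myself. It is also important that the new columns end up in specific positions in the pivot table (after the 2019 data and after the 2020 data).

Is this doable?

I have looked everywhere but could not find an example of how to do it.

Any help is appreciated.

Thanks, Pawel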

You can create the new columns for each year in a custom function applied with GroupBy.apply; note that the output also contains a 2020FY column:

def f(x):
    # get all months of this year and convert them to integers
    c = x.columns.get_level_values(1).astype(int)
    # full-year total: sum all months
    s1 = x.sum(axis=1)
    # year-to-date total: sum months 1 to 4
    s2 = x.loc[:, c <= 4].sum(axis=1)
    # x.name is the group key, i.e. the year
    x[(f'{x.name}FY','')] = s1
    x[(f'{x.name}YTD','')] = s2

    return x

# note: groupby(..., axis=1) is deprecated since pandas 2.1
df = df_pivot.groupby(level=0, axis=1, group_keys=False).apply(f)
print (df)
Year    2019                2019FY 2019YTD 2020       2020FY 2020YTD
Month     01 02 03 04 05 06                  01 02 03               
A   B                                                               
bar one    0  0  0  0  0  6      6       0    8  0  0      8       8
    two    0  0  0  0  0  0      0       0    0  9  9     18      18
foo one    2  4  5  0  0  0     11      11    0  0  0      0       0
    two    0  0  0  5  6  0     11       5    0  0  0      0       0

If you need to remove a column, use a tuple, because the columns are a MultiIndex:

df = df.drop([('2020FY','')], axis=1)
print (df)
Year    2019                2019FY 2019YTD 2020       2020YTD
Month     01 02 03 04 05 06                  01 02 03        
A   B                                                        
bar one    0  0  0  0  0  6      6       0    8  0  0       8
    two    0  0  0  0  0  0      0       0    0  9  9      18
foo one    2  4  5  0  0  0     11      11    0  0  0       0
    two    0  0  0  5  6  0     11       5    0  0  0       0
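If your pandas version complains about groupby(..., axis=1), the same idea can be written as a plain loop over the years. This is only a sketch, not the approach above: the helper name add_year_totals and its ytd_month parameter are made up for illustration. It keeps an FY column for every year, so drop the ones you do not need afterwards.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "Year": [2019, 2019, 2019, 2019, 2019, 2019, 2020, 2020, 2020],
    "Month": ["01", "02", "03", "04", "05", "06", "01", "02", "03"],
    "Values": [2, 4, 5, 5, 6, 6, 8, 9, 9],
})
df_pivot = pd.pivot_table(df, values="Values", index=["A", "B"],
                          columns=["Year", "Month"], aggfunc="sum", fill_value=0)


def add_year_totals(pivot, ytd_month=4):
    """Hypothetical helper: append <year>FY and <year>YTD after each year's months."""
    years = pivot.columns.get_level_values(0)
    parts = []
    for year in years.unique():
        # all month columns that belong to this year
        block = pivot.loc[:, years == year]
        months = block.columns.get_level_values(1).astype(int)
        # first column: full-year sum, second column: sum up to ytd_month
        totals = pd.concat(
            [block.sum(axis=1), block.loc[:, months <= ytd_month].sum(axis=1)],
            axis=1,
        )
        totals.columns = pd.MultiIndex.from_tuples(
            [(f"{year}FY", ""), (f"{year}YTD", "")], names=pivot.columns.names
        )
        parts += [block, totals]
    # concatenating in this order places the totals right after each year
    return pd.concat(parts, axis=1)


result = add_year_totals(df_pivot, ytd_month=4)
print(result)
```

Because the pieces are concatenated year by year, the totals already sit in the requested positions and no separate reordering step is needed.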
    
    

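You can use:

df.columns.get_level_values(level)
df.index.get_level_values(level)

to get the labels needed for slicing MultiIndex rows and columns. I suggest changing the month column of df from strings such as "01" to integer values, which makes slicing with the < and > operators easier. However, if you need to stick with string month column names, then: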

month_num = 4
df_pivot["2019YTD"] = df_pivot.loc[:, (df_pivot.columns.get_level_values(0) == 2019) &
                                   (df_pivot.columns.get_level_values(1).astype(int) <= month_num)].sum(axis=1)
df_pivot["2019FY"] = df_pivot.loc[:, df_pivot.columns.get_level_values(0) == 2019].sum(axis=1)
df_pivot["2020YTD"] = df_pivot.loc[:, df_pivot.columns.get_level_values(0) == 2020].sum(axis=1)

You end up with something like this:

Year    2019                2020       2019YTD 2019FY 2020YTD
Month     01 02 03 04 05 06   01 02 03
A   B
bar one    0  0  0  0  0  6    8  0  0       0      6       8
    two    0  0  0  0  0  0    0  9  9       0      0      18
foo one    2  4  5  0  0  0    0  0  0      11     11       0
    two    0  0  0  5  6  0    0  0  0       5     11       0
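For comparison, here is a rough sketch of the integer-month variant, assuming you are free to convert the Month column before pivoting. month_num is the cut-off period; pivot_int, years and months are just illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "Year": [2019, 2019, 2019, 2019, 2019, 2019, 2020, 2020, 2020],
    "Month": ["01", "02", "03", "04", "05", "06", "01", "02", "03"],
    "Values": [2, 4, 5, 5, 6, 6, 8, 9, 9],
})

# convert the month strings to integers before pivoting
df["Month"] = df["Month"].astype(int)
pivot_int = pd.pivot_table(df, values="Values", index=["A", "B"],
                           columns=["Year", "Month"], aggfunc="sum", fill_value=0)

month_num = 4
years = pivot_int.columns.get_level_values("Year")
months = pivot_int.columns.get_level_values("Month")

# with integer months the comparison needs no astype(int) on the column level
ytd_2019 = pivot_int.loc[:, (years == 2019) & (months <= month_num)].sum(axis=1)
fy_2019 = pivot_int.loc[:, years == 2019].sum(axis=1)
print(ytd_2019)
print(fy_2019)
```

The trade-off is that the month labels are then displayed as 1, 2, 3 rather than 01, 02, 03.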

Once that is done, you can adjust the column positions with something like:

df_pivot = df_pivot.loc[:, [2019, "2019FY", "2019YTD", 2020, "2020YTD"]]

to get something like:

Year    2019                2019FY 2019YTD 2020       2020YTD
Month     01 02 03 04 05 06                  01 02 03
A   B
bar one    0  0  0  0  0  6      6       0    8  0  0       8
    two    0  0  0  0  0  0      0       0    0  9  9      18
foo one    2  4  5  0  0  0     11      11    0  0  0       0
    two    0  0  0  5  6  0     11       5    0  0  0       0
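If label-based reordering gives you trouble on columns that mix integer and string labels, one alternative is to reorder by position instead. A minimal sketch, assuming the total columns are added with tuple keys such as ("2019FY", ""); the names totals, wanted, top and positions are placeholders of mine:

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "Year": [2019, 2019, 2019, 2019, 2019, 2019, 2020, 2020, 2020],
    "Month": ["01", "02", "03", "04", "05", "06", "01", "02", "03"],
    "Values": [2, 4, 5, 5, 6, 6, 8, 9, 9],
})
pivot = pd.pivot_table(df, values="Values", index=["A", "B"],
                       columns=["Year", "Month"], aggfunc="sum", fill_value=0)

month_num = 4
years = pivot.columns.get_level_values(0)
months = pivot.columns.get_level_values(1).astype(int)

# compute every total before adding columns, so the masks above stay valid
totals = {
    ("2019YTD", ""): pivot.loc[:, (years == 2019) & (months <= month_num)].sum(axis=1),
    ("2019FY", ""): pivot.loc[:, years == 2019].sum(axis=1),
    ("2020YTD", ""): pivot.loc[:, years == 2020].sum(axis=1),
}
for key, values in totals.items():
    pivot[key] = values

# desired order of the top-level labels
wanted = [2019, "2019FY", "2019YTD", 2020, "2020YTD"]
top = list(pivot.columns.get_level_values(0))
# collect the integer positions of the columns in the desired order
positions = [i for label in wanted for i, t in enumerate(top) if t == label]
pivot = pivot.iloc[:, positions]
print(pivot)
```

Selecting with iloc does not depend on how the MultiIndex is sorted, which is why I would reach for it here.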

import pandas as pd
import numpy as np

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two"],
                   "Year": [2019, 2019, 2019, 2019,
                            2019, 2019, 2020, 2020,
                            2020],
                   "Month": ["01", "02", "03", "04", "05", "06", "01", "02", "03"],
                   "Values": [2, 4, 5, 5, 6, 6, 8, 9, 9]})

print(df)

df_pivot = pd.pivot_table(df, values='Values', index=['A', 'B'],
                          columns=['Year', 'Month'], aggfunc=np.sum, fill_value=0)
print(df_pivot)

# create the same pivot, but just using the year total
df_year_pivot = pd.pivot_table(df, values='Values', index=['A', 'B'],
                               columns=['Year'], aggfunc=np.sum, fill_value=0)
print(df_year_pivot)
# since the dataframe you wish to add will have 2 index levels
# you need to add another level when you join the resulting data
# and since your new level will be a YTD, I just appended it to the year
multi_index_tuples = [(x, f'{x}YTD') for x in df_year_pivot.columns]

# now, you are going to add the new index level to the df with the level names the same as your first pivot
df_year_pivot.columns = pd.MultiIndex.from_tuples(multi_index_tuples, names=['Year', 'Month'])

# happily join on the same index
total_df = pd.merge(df_pivot, df_year_pivot, how='left', left_index=True, right_index=True)
print(total_df)

# sort the column index
total_df = total_df.sort_index(axis=1, level=[0,1])
print(total_df)
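Note that the year pivot above sums every month of the year, so the column labelled 2019YTD is really the full-year total. If you need a true year-to-date figure, one possible variation is to filter the source rows before building the yearly pivot. This is a sketch only: period, ytd_src and ytd_pivot are names I made up, and it uses pd.concat rather than pd.merge since both frames share the same row index.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "Year": [2019, 2019, 2019, 2019, 2019, 2019, 2020, 2020, 2020],
    "Month": ["01", "02", "03", "04", "05", "06", "01", "02", "03"],
    "Values": [2, 4, 5, 5, 6, 6, 8, 9, 9],
})
df_pivot = pd.pivot_table(df, values="Values", index=["A", "B"],
                          columns=["Year", "Month"], aggfunc="sum", fill_value=0)

period = 4  # assumed cut-off month for the year-to-date sum

# keep only the rows up to the cut-off month, then pivot by year
ytd_src = df[df["Month"].astype(int) <= period]
ytd_pivot = pd.pivot_table(ytd_src, values="Values", index=["A", "B"],
                           columns=["Year"], aggfunc="sum", fill_value=0)

# rows with no data up to the cut-off would be missing; bring them back as zeros
ytd_pivot = ytd_pivot.reindex(df_pivot.index, fill_value=0)

# add a second column level so the frame lines up with df_pivot
ytd_pivot.columns = pd.MultiIndex.from_tuples(
    [(year, f"{year}YTD") for year in ytd_pivot.columns], names=["Year", "Month"]
)

# place the frames side by side and sort so each YTD column follows its year
total_df = pd.concat([df_pivot, ytd_pivot], axis=1).sort_index(axis=1)
print(total_df)
```

Sorting works here because the label "2019YTD" sorts after the month strings "01" to "12" within each year.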