Pandas adding sum of columns in pivot table (multiindexed)
I have df and df_pivot, created with the code below:
import pandas as pd
import numpy as np
df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two"],
                   "Year": [2019, 2019, 2019, 2019,
                            2019, 2019, 2020, 2020,
                            2020],
                   "Month": ["01", "02", "03", "04", "05", "06", "01", "02", "03"],
                   "Values": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
df_pivot = pd.pivot_table(df, values='Values', index=['A', 'B'],
                          columns=['Year', 'Month'], aggfunc=np.sum, fill_value=0)
df_pivot looks like this:
Year    2019                2020
Month     01 02 03 04 05 06   01 02 03
A   B
bar one    0  0  0  0  0  6    8  0  0
    two    0  0  0  0  0  0    0  9  9
foo one    2  4  5  0  0  0    0  0  0
    two    0  0  0  5  6  0    0  0  0
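What I want to do now is add three columns to this df:
2019FY, 2019YTD and 2020YTD.
The 2019FY column should be the sum of all values under "2019".
The 2019YTD column should be the sum of all values under "2019" up to a defined period, i.e. if the period is defined as 04, 2019YTD should sum the 2019 columns 01/02/03/04.
The 2020YTD column should be the sum of all values under "2020".
The output table should look like this: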
Year    2019                2019FY 2019YTD 2020       2020YTD
Month     01 02 03 04 05 06                  01 02 03
A   B
bar one    0  0  0  0  0  6      6       0    8  0  0       8
    two    0  0  0  0  0  0      0       0    0  9  9      18
foo one    2  4  5  0  0  0     11      11    0  0  0       0
    two    0  0  0  5  6  0     11       5    0  0  0       0
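Basically, I would like to know how to sum the columns up to a given month, because from there I can create 2019FY/2019YTD/2020YTD myself. It is also important that they are added at specific positions in the pivot table (at the end of the 2019 data and at the end of the 2020 data).
Is this feasible?
I have looked everywhere but could not find an example of how to do it.
Thanks for your help,
Pavel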
You can create the new columns for each year in a custom function applied with GroupBy.apply. Note that the output also contains a 2020FY column:
def f(x):
    # get all months of this year's block and convert them to integers
    c = x.columns.get_level_values(1).astype(int)
    # sum all values of the year
    s1 = x.sum(axis=1)
    # sum months 1, 2, 3 and 4
    s2 = x.loc[:, c <= 4].sum(axis=1)
    x[(f'{x.name}FY', '')] = s1
    x[(f'{x.name}YTD', '')] = s2
    return x
# note: groupby(..., axis=1) is deprecated in recent pandas releases (2.1+)
df = df_pivot.groupby(level=0, axis=1, group_keys=False).apply(f)
print(df)
Year    2019                2019FY 2019YTD 2020       2020FY 2020YTD
Month     01 02 03 04 05 06                  01 02 03
A   B
bar one    0  0  0  0  0  6      6       0    8  0  0      8       8
    two    0  0  0  0  0  0      0       0    0  9  9     18      18
foo one    2  4  5  0  0  0     11      11    0  0  0      0       0
    two    0  0  0  5  6  0     11       5    0  0  0      0       0
If you need to remove a column, use tuples, because the columns are a MultiIndex:
df = df.drop([('2020FY','')], axis=1)
print (df)
Year    2019                2019FY 2019YTD 2020       2020YTD
Month     01 02 03 04 05 06                  01 02 03
A   B
bar one    0  0  0  0  0  6      6       0    8  0  0       8
    two    0  0  0  0  0  0      0       0    0  9  9      18
foo one    2  4  5  0  0  0     11      11    0  0  0       0
    two    0  0  0  5  6  0     11       5    0  0  0       0
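On newer pandas versions, where grouping along axis=1 is deprecated, the same per-year idea could be sketched with an explicit loop over the years. This is only a sketch under assumptions: the helper name add_year_totals and its cutoff_month parameter are made up for illustration and do not come from the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo"] * 5 + ["bar"] * 4,
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "Year": [2019] * 6 + [2020] * 3,
    "Month": ["01", "02", "03", "04", "05", "06", "01", "02", "03"],
    "Values": [2, 4, 5, 5, 6, 6, 8, 9, 9],
})
df_pivot = pd.pivot_table(df, values="Values", index=["A", "B"],
                          columns=["Year", "Month"], aggfunc="sum", fill_value=0)


def add_year_totals(pivot, cutoff_month=4):
    """Hypothetical helper: append <year>FY and <year>YTD after each year's block."""
    pieces = []
    for year in pivot.columns.get_level_values(0).unique():
        # select this year's columns, keeping both column levels
        in_year = pivot.columns.get_level_values(0) == year
        block = pivot.loc[:, in_year].copy()
        # month labels are strings such as "01", so convert before comparing
        months = block.columns.get_level_values(1).astype(int)
        fy = block.sum(axis=1)
        ytd = block.loc[:, months <= cutoff_month].sum(axis=1)
        block[(f"{year}FY", "")] = fy
        block[(f"{year}YTD", "")] = ytd
        pieces.append(block)
    # glue the per-year blocks back together, side by side
    return pd.concat(pieces, axis=1)


result = add_year_totals(df_pivot, cutoff_month=4)
print(result)
```

As in the answer, an unwanted column such as 2020FY can then be dropped with its tuple key.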
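You can use:
df.columns.get_level_values()
df.index.get_level_values()
to slice multi-indexed rows and columns. I would suggest changing the month column of df from strings such as "01" to integer values, which makes slicing with the < and > operators easier.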
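A minimal sketch of the integer-month suggestion, assuming the conversion is done on the original df before pivoting. The variable names years, months and ytd_2019 are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo"] * 5 + ["bar"] * 4,
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "Year": [2019] * 6 + [2020] * 3,
    "Month": ["01", "02", "03", "04", "05", "06", "01", "02", "03"],
    "Values": [2, 4, 5, 5, 6, 6, 8, 9, 9],
})

# convert "01", "02", ... to 1, 2, ... before building the pivot
df["Month"] = df["Month"].astype(int)
df_pivot = pd.pivot_table(df, values="Values", index=["A", "B"],
                          columns=["Year", "Month"], aggfunc="sum", fill_value=0)

month_num = 4
years = df_pivot.columns.get_level_values("Year")
months = df_pivot.columns.get_level_values("Month")

# with integer months the comparison needs no astype() call
ytd_2019 = df_pivot.loc[:, (years == 2019) & (months <= month_num)].sum(axis=1)
print(ytd_2019)
```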
However, if you need to stick with string-valued month column names, then:
month_num = 4
df_pivot["2019YTD"] = df_pivot.loc[:, (df_pivot.columns.get_level_values(0) == 2019) &
                                      (df_pivot.columns.get_level_values(1).astype(int) <= month_num)].sum(axis=1)
df_pivot["2019FY"] = df_pivot.loc[:, df_pivot.columns.get_level_values(0) == 2019].sum(axis=1)
df_pivot["2020YTD"] = df_pivot.loc[:, df_pivot.columns.get_level_values(0) == 2020].sum(axis=1)
You end up with something like this:
Year    2019                2020       2019YTD 2019FY 2020YTD
Month     01 02 03 04 05 06   01 02 03
A   B
bar one    0  0  0  0  0  6    8  0  0       0      6       8
    two    0  0  0  0  0  0    0  9  9       0      0      18
foo one    2  4  5  0  0  0    0  0  0      11     11       0
    two    0  0  0  5  6  0    0  0  0       5     11       0
Once that is done, you can rearrange the column positions with something like:
df_pivot = df_pivot.loc[:, [2019, "2019FY", "2019YTD", 2020, "2020YTD"]]
to get something like:
Year    2019                2019FY 2019YTD 2020       2020YTD
Month     01 02 03 04 05 06                  01 02 03
A   B
bar one    0  0  0  0  0  6      6       0    8  0  0       8
    two    0  0  0  0  0  0      0       0    0  9  9      18
foo one    2  4  5  0  0  0     11      11    0  0  0       0
    two    0  0  0  5  6  0     11       5    0  0  0       0
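If writing the column order out by hand becomes tedious, one possible way to build it is sketched below. It assumes the total columns are named <year>FY and <year>YTD as in this answer. The names out, totals and order are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo"] * 5 + ["bar"] * 4,
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "Year": [2019] * 6 + [2020] * 3,
    "Month": ["01", "02", "03", "04", "05", "06", "01", "02", "03"],
    "Values": [2, 4, 5, 5, 6, 6, 8, 9, 9],
})
out = pd.pivot_table(df, values="Values", index=["A", "B"],
                     columns=["Year", "Month"], aggfunc="sum", fill_value=0)

# compute the totals first, while the columns still hold only year/month pairs
year_lvl = out.columns.get_level_values(0)
month_lvl = out.columns.get_level_values(1).astype(int)
totals = {
    ("2019YTD", ""): out.loc[:, (year_lvl == 2019) & (month_lvl <= 4)].sum(axis=1),
    ("2019FY", ""): out.loc[:, year_lvl == 2019].sum(axis=1),
    ("2020YTD", ""): out.loc[:, year_lvl == 2020].sum(axis=1),
}
for key, series in totals.items():
    out[key] = series

# build the order: each year's month columns, then that year's totals if present
order = []
for year in (2019, 2020):
    order += [col for col in out.columns if col[0] == year]
    for suffix in ("FY", "YTD"):
        key = (f"{year}{suffix}", "")
        if key in out.columns:
            order.append(key)

out = out[order]
print(out)
```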
import pandas as pd
import numpy as np
df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two"],
                   "Year": [2019, 2019, 2019, 2019,
                            2019, 2019, 2020, 2020,
                            2020],
                   "Month": ["01", "02", "03", "04", "05", "06", "01", "02", "03"],
                   "Values": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
print(df)
df_pivot = pd.pivot_table(df, values='Values', index=['A', 'B'],
                          columns=['Year', 'Month'], aggfunc=np.sum, fill_value=0)
print(df_pivot)
# create the same pivot, but just using the year total
df_year_pivot = pd.pivot_table(df, values='Values', index=['A', 'B'],
                               columns=['Year'], aggfunc=np.sum, fill_value=0)
print(df_year_pivot)
# since the dataframe you wish to add to has 2 column levels,
# you need to add another level before you join the resulting data,
# and since your new level will be a YTD, I just appended it to the year
multi_index_tuples = [(x, f'{x}YTD') for x in df_year_pivot.columns]
# now add the new column level to the df, with the same level names as your first pivot
df_year_pivot.columns = pd.MultiIndex.from_tuples(multi_index_tuples, names=['Year', 'Month'])
# happily join on the same index
total_df = pd.merge(df_pivot, df_year_pivot, how='left', left_index=True, right_index=True)
print(total_df)
# sort the column index
total_df = total_df.sort_index(axis=1, level=[0, 1])
print(total_df)
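Note that the year pivot above totals the whole year. If a YTD up to a cutoff month is wanted, one possible variation is to filter the rows before pivoting. The following is a sketch under assumptions: the cutoff variable is hypothetical, and pd.concat with a reindex is used here in place of pd.merge.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo"] * 5 + ["bar"] * 4,
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "Year": [2019] * 6 + [2020] * 3,
    "Month": ["01", "02", "03", "04", "05", "06", "01", "02", "03"],
    "Values": [2, 4, 5, 5, 6, 6, 8, 9, 9],
})
df_pivot = pd.pivot_table(df, values="Values", index=["A", "B"],
                          columns=["Year", "Month"], aggfunc="sum", fill_value=0)

cutoff = 4  # hypothetical: last month to include in the YTD figure

# keep only the rows up to the cutoff month, then total them per year
ytd_rows = df[df["Month"].astype(int) <= cutoff]
df_ytd = pd.pivot_table(ytd_rows, values="Values", index=["A", "B"],
                        columns=["Year"], aggfunc="sum", fill_value=0)

# give the totals a second column level so they line up with df_pivot
df_ytd.columns = pd.MultiIndex.from_tuples(
    [(year, f"{year}YTD") for year in df_ytd.columns], names=["Year", "Month"])

# align on df_pivot's rows (groups with no data before the cutoff get 0)
df_ytd = df_ytd.reindex(df_pivot.index, fill_value=0)

# place each YTD column after its year's months by sorting the column index
total_df = pd.concat([df_pivot, df_ytd], axis=1).sort_index(axis=1)
print(total_df)
```

Sorting works here because "2019YTD" and "2020YTD" sort after the month strings "01" to "06"; with other labels the order would need to be set explicitly.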