Getting Rolling Sum per Group
I have a dataframe like this:
Product_ID Quantity Year Quarter
1 100 2021 1
1 100 2021 2
1 50 2021 3
1 100 2021 4
1 100 2022 1
2 100 2021 1
2 100 2021 2
3 100 2021 1
3 100 2021 2
I want to get the sum of the previous three quarters (excluding the current one) for each Product_ID.
So I tried this:
df['Qty_Sum_3qrts'] = (df.groupby('Product_ID')['Quantity'].shift(1, fill_value=0)
                         .rolling(3).sum().reset_index(0, drop=True)
                       )
# Shifting 1, because I want to exclude the current row.
# Rolling 3, because I want to have the 3 'rows' before
# Grouping by, because I want to have the calculation PER product
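My code fails because it doesn't calculate per product: it also pulls in numbers from other products (e.g. for product 2, quarter 1, it gives me a sum over rows of product 1).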
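A minimal sketch that reproduces the leak, assuming the sample frame is rebuilt by hand from the table above. The grouped shift behaves as intended, but it returns a plain Series, so the following rolling(3) window slides straight across product boundaries:

```python
import pandas as pd

# Sample data from the question, rebuilt by hand
df = pd.DataFrame({
    'Product_ID': [1, 1, 1, 1, 1, 2, 2, 3, 3],
    'Quantity':   [100, 100, 50, 100, 100, 100, 100, 100, 100],
    'Year':       [2021, 2021, 2021, 2021, 2022, 2021, 2021, 2021, 2021],
    'Quarter':    [1, 2, 3, 4, 1, 1, 2, 1, 2],
})

# The shift is grouped, so each product's first row becomes 0 ...
shifted = df.groupby('Product_ID')['Quantity'].shift(1, fill_value=0)

# ... but rolling() is applied to the ungrouped result, so the window
# mixes rows of different products.
naive = shifted.rolling(3).sum()

# Row 5 (product 2, 2021 Q1) gets 50 + 100 + 0 = 150.0 instead of 0,
# and rows 0 and 1 are NaN because no min_periods was given.
print(naive)
```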
My expected result:
Product_ID Quantity Year Quarter Qty_Sum_3qrts
1 100 2021 1 0 # because we don't have historical data for this ID
1 100 2021 2 100 # sum of the last quarter of this product
1 50 2021 3 200 # sum of the last 2 quarters of this product
1 100 2021 4 250 # sum of the last 3 quarters of this product
1 100 2022 1 250 # sum of the last 3 quarters of this product
2 100 2021 1 0 # because we don't have historical data for this ID
2 100 2021 2 100 # sum of the last quarter of this product
3 100 2021 1 0 # etc
3 100 2021 2 100 # etc
You need to apply the rolling sum within each group, and you can use apply for that:
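# group_keys=False keeps the original row index (newer pandas versions would
# otherwise prepend Product_ID), so the result can be assigned back to df
df['Qty_Sum_3qrts'] = (df.groupby('Product_ID', group_keys=False)['Quantity']
                         .apply(lambda s: s.shift(1, fill_value=0)
                                           .rolling(3, min_periods=1).sum())
                       )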
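An equivalent sketch swaps apply for transform, which always returns a result aligned to the original row index, so no group_keys setting is needed. The sample frame is again rebuilt by hand; the lambda is the same as in the answer:

```python
import pandas as pd

# Sample data from the question, rebuilt by hand
df = pd.DataFrame({
    'Product_ID': [1, 1, 1, 1, 1, 2, 2, 3, 3],
    'Quantity':   [100, 100, 50, 100, 100, 100, 100, 100, 100],
    'Year':       [2021, 2021, 2021, 2021, 2022, 2021, 2021, 2021, 2021],
    'Quarter':    [1, 2, 3, 4, 1, 1, 2, 1, 2],
})

# Inside the lambda, s holds one product's quantities only, so both the
# shift and the 3-row window stay within that product.
df['Qty_Sum_3qrts'] = (
    df.groupby('Product_ID')['Quantity']
      .transform(lambda s: s.shift(1, fill_value=0)
                            .rolling(3, min_periods=1).sum())
)
print(df)
```

min_periods=1 is what lets the second and third rows of a product report partial sums (100, 200) instead of NaN.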
Output:
Product_ID Quantity Year Quarter Qty_Sum_3qrts
0 1 100 2021 1 0.0
1 1 100 2021 2 100.0
2 1 50 2021 3 200.0
3 1 100 2021 4 250.0
4 1 100 2022 1 250.0
5 2 100 2021 1 0.0
6 2 100 2021 2 100.0
7 3 100 2021 1 0.0
8 3 100 2021 2 100.0
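The question's original chain can also be repaired without apply, by grouping again before rolling so each window stays inside one product. A sketch under two stated assumptions: rows are first sorted by Year and Quarter within each product (rolling counts rows, not calendar quarters), and no quarters are missing, since a gap would make a 3-row window cover more than three quarters:

```python
import pandas as pd

# Sample data from the question, rebuilt by hand
df = pd.DataFrame({
    'Product_ID': [1, 1, 1, 1, 1, 2, 2, 3, 3],
    'Quantity':   [100, 100, 50, 100, 100, 100, 100, 100, 100],
    'Year':       [2021, 2021, 2021, 2021, 2022, 2021, 2021, 2021, 2021],
    'Quarter':    [1, 2, 3, 4, 1, 1, 2, 1, 2],
})

# Assumption: "previous rows" should mean "previous quarters", so put the
# rows in time order within each product (the original index is kept).
df = df.sort_values(['Product_ID', 'Year', 'Quarter'])

# Step 1: per-product shift, exactly as in the question
shifted = df.groupby('Product_ID')['Quantity'].shift(1, fill_value=0)

# Step 2: group again before rolling. groupby().rolling() returns a
# (Product_ID, original index) MultiIndex; dropping level 0 restores the
# original index so the assignment aligns row by row.
df['Qty_Sum_3qrts'] = (
    shifted.groupby(df['Product_ID'])
           .rolling(3, min_periods=1).sum()
           .reset_index(level=0, drop=True)
)
print(df)
```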