Pandas: Getting a cumulative sum for each month up to the last Friday
I have a DataFrame that looks like this:
date_order date_despatch date_validation qty_ordered
2019-01-01 00:00:00 2019-11-01 00:00:00 2019-13-01 00:00:00 4.15
2019-01-01 00:00:00 2019-12-01 00:00:00 2019-14-01 00:00:00 5.9
2019-02-01 00:00:00 2019-16-01 00:00:00 2019-19-01 00:00:00 7.8
2019-03-01 00:00:00 2019-18-01 00:00:00 2019-20-01 00:00:00 9.6
2019-04-01 00:00:00 2019-22-01 00:00:00 2019-24-01 00:00:00 1.3
...
2019-03-02 00:00:00 2019-22-02 00:00:00 2019-25-02 00:00:00 1.2
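My goal is to get, for each month, the cumulative ordered quantity from the start of the month up to the last Friday of that month (for example, 2019-01-01 to 2019-25-01 for January 2019).
Expected result: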
date_order cumulative_ordered
2019-01-01 00:00:00 10.05
2019-02-01 00:00:00 17.85
... ...
2019-24-01 00:00:00 150
2019-25-01 00:00:00 157
Can anyone help me with this?
Take as an example a df where qty_ordered is always 1 (so the results are easy to track):
import pandas as pd
df = pd.DataFrame({'date_order': pd.date_range('2019-01-01', '2019-03-01')})
df['qty_ordered'] = 1
print(df)
date_order qty_ordered
0 2019-01-01 1
1 2019-01-02 1
2 2019-01-03 1
3 2019-01-04 1
4 2019-01-05 1
5 2019-01-06 1
6 2019-01-07 1
7 2019-01-08 1
...
59 2019-03-01 1
The last Friday of January 2019 is 2019-01-25, and for February it is 2019-02-22. Keep these in mind to verify the cumulative sums.
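Those two dates can be double-checked with the standard library. This is only a minimal sketch; the helper name `last_friday` is hypothetical and is not used by the solution that follows.

```python
import calendar
from datetime import date

def last_friday(year: int, month: int) -> date:
    """Return the last Friday of the given month (hypothetical helper)."""
    # monthcalendar() yields one list per week, Monday first,
    # with 0 for days that fall outside the month.
    weeks = calendar.monthcalendar(year, month)
    # Walk the weeks backwards and take the first real Friday.
    day = next(w[calendar.FRIDAY] for w in reversed(weeks) if w[calendar.FRIDAY])
    return date(year, month, day)

print(last_friday(2019, 1))
print(last_friday(2019, 2))
```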
You can do it like this:
# Make sure dates are sorted.
df = df.sort_values('date_order')
# Flag the Fridays.
df['n_friday'] = df['date_order'].dt.dayofweek.eq(4)
# Column to groupby.
df['year_month'] = df['date_order'].dt.to_period("M")
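# Remove days past the last Friday in each year/month group.
# Compare against the latest Friday present in the group, so months with
# five Fridays and several rows on the same date are handled as well.
last_friday = df['date_order'].where(df['n_friday']).groupby(df['year_month']).transform('max')
mask = df['date_order'].le(last_friday)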
res_df = df[mask].drop(columns=['n_friday'])
# Calculate cumsum for each month.
res_df['cumulative_ordered'] = res_df.groupby('year_month')['qty_ordered'].cumsum()
print(res_df.drop(columns=['year_month']))
date_order qty_ordered cumulative_ordered
0 2019-01-01 1 1
1 2019-01-02 1 2
2 2019-01-03 1 3
3 2019-01-04 1 4
4 2019-01-05 1 5
5 2019-01-06 1 6
6 2019-01-07 1 7
7 2019-01-08 1 8
...
52 2019-02-22 1 22
59 2019-03-01 1 1
Check that the cumsum and the date selection work:
print(res_df.groupby('year_month').last())
date_order qty_ordered cumulative_ordered
year_month
2019-01 2019-01-25 1 25
2019-02 2019-02-22 1 22
2019-03 2019-03-01 1 1
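The data in the question differs from the toy df in two ways: dates are written as year-day-month, and several orders can share one date. Below is a hedged sketch of the same idea on the rows visible in the question. It assumes the format string `%Y-%d-%m %H:%M:%S`, and the helper name `cumulative_to_last_friday` is hypothetical. Here the last Friday is taken from the calendar (by stepping back from the month end) rather than from the Fridays present in the data.

```python
import pandas as pd

# The rows visible in the question; dates are written as year-day-month.
raw = pd.DataFrame({
    'date_order': ['2019-01-01 00:00:00', '2019-01-01 00:00:00',
                   '2019-02-01 00:00:00', '2019-03-01 00:00:00',
                   '2019-04-01 00:00:00', '2019-03-02 00:00:00'],
    'qty_ordered': [4.15, 5.9, 7.8, 9.6, 1.3, 1.2],
})

def cumulative_to_last_friday(frame, fmt='%Y-%d-%m %H:%M:%S'):
    """Hypothetical helper: daily cumulative quantity up to each month's last Friday."""
    dates = pd.to_datetime(frame['date_order'], format=fmt)
    # Last calendar Friday of each row's month: step back from the month end.
    month_end = dates + pd.offsets.MonthEnd(0)
    days_back = (month_end.dt.dayofweek - 4) % 7  # Friday is dayofweek 4
    last_friday = month_end - pd.to_timedelta(days_back, unit='D')
    # Drop orders placed after that Friday.
    kept = frame.assign(date_order=dates)[dates.le(last_friday)]
    # Sum orders placed on the same day, then accumulate within each month.
    daily = kept.groupby('date_order')['qty_ordered'].sum().sort_index()
    cumulative = daily.groupby(daily.index.to_period('M')).cumsum()
    return cumulative.rename('cumulative_ordered').reset_index()

print(cumulative_to_last_friday(raw))
```

On these rows the first two cumulative values should come out as roughly 10.05 and 17.85, in line with the expected result in the question, and the count restarts for the February order.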