Pandas: Getting a cumulative sum for each month up to the last Friday

I have a DataFrame like the following:

date_order               date_despatch          date_validation        qty_ordered
2019-01-01 00:00:00     2019-11-01 00:00:00     2019-13-01 00:00:00    4.15
2019-01-01 00:00:00     2019-12-01 00:00:00     2019-14-01 00:00:00    5.9
2019-02-01 00:00:00     2019-16-01 00:00:00     2019-19-01 00:00:00    7.8
2019-03-01 00:00:00     2019-18-01 00:00:00     2019-20-01 00:00:00    9.6
2019-04-01 00:00:00     2019-22-01 00:00:00     2019-24-01 00:00:00    1.3
...
2019-03-02 00:00:00     2019-22-02 00:00:00     2019-25-02 00:00:00    1.2

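My goal is to get, for each month, the cumulative ordered quantity from the start of the month through the last Friday of that month (for example, 2019-01-01 to 2019-25-01 for January 2019).

Expected result: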

date_order             cumulative_ordered
2019-01-01 00:00:00    10.05
2019-02-01 00:00:00    17.85
...                    ...
2019-24-01 00:00:00    150
2019-25-01 00:00:00    157

Can anyone help me with this?

Take as an example a df where qty_ordered is always 1 (so the results are easy to track):

import pandas as pd

df = pd.DataFrame({'date_order': pd.date_range('2019-01-01', '2019-03-01')})
df['qty_ordered'] = 1

print(df)
   date_order  qty_ordered
0  2019-01-01            1
1  2019-01-02            1
2  2019-01-03            1
3  2019-01-04            1
4  2019-01-05            1
5  2019-01-06            1
6  2019-01-07            1
7  2019-01-08            1
...
59 2019-03-01            1

The last Friday of January 2019 is 2019-01-25, and the last Friday of February 2019 is 2019-02-22. Keep these in mind to verify the cumulative sums.
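As a quick sanity check, the two dates above can be reproduced from the calendar alone. This is a minimal sketch that assumes only pandas; the names `fridays` and `last_fridays` are made up for illustration.

```python
import pandas as pd

# Every Friday in January and February 2019.
fridays = pd.date_range('2019-01-01', '2019-02-28', freq='W-FRI').to_series()

# The latest Friday within each calendar month.
last_fridays = fridays.groupby(fridays.dt.to_period('M')).max()
print(last_fridays)
```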

You can do it like this:

# Make sure dates are sorted.
df = df.sort_values('date_order')

# Flag the Fridays.
df['n_friday'] = df['date_order'].dt.dayofweek.eq(4)

# Column to groupby.
df['year_month'] = df['date_order'].dt.to_period("M")

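# Remove days past the last Friday in each year/month group.
# Comparing against the latest flagged Friday also handles months with five Fridays.
last_friday = df['date_order'].where(df['n_friday']).groupby(df['year_month']).transform('max')
mask = df['date_order'].le(last_friday)
res_df = df[mask].drop(columns=['n_friday'])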

# Calculate cumsum for each month.
res_df['cumulative_ordered'] = res_df.groupby('year_month')['qty_ordered'].cumsum()

print(res_df.drop(columns=['year_month']))
   date_order  qty_ordered  cumulative_ordered
0  2019-01-01            1                   1
1  2019-01-02            1                   2
2  2019-01-03            1                   3
3  2019-01-04            1                   4
4  2019-01-05            1                   5
5  2019-01-06            1                   6
6  2019-01-07            1                   7
7  2019-01-08            1                   8
...
52 2019-02-22            1                  22
59 2019-03-01            1                   1

Check that the cumsum and the date selection work:

print(res_df.groupby('year_month').last())
           date_order  qty_ordered  cumulative_ordered
year_month                                            
2019-01    2019-01-25            1                  25
2019-02    2019-02-22            1                  22
2019-03    2019-03-01            1                   1
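The same idea can be applied to data shaped like the question's. The sketch below is only an illustration under stated assumptions: the dates are taken to be in year-day-month order, the rows for the 25th and 28th are hypothetical, and the names `raw`, `daily`, `month_end`, `last_friday` and `kept` are made up. Unlike the code above, it derives the last Friday from the calendar rather than from the Fridays present in the data, so a month whose data stops early is not cut off at an earlier Friday.

```python
import pandas as pd

# Hypothetical rows in the question's year-day-month layout; the last two
# rows are invented for illustration.
raw = pd.DataFrame({
    'date_order': ['2019-01-01 00:00:00', '2019-01-01 00:00:00',
                   '2019-02-01 00:00:00', '2019-25-01 00:00:00',
                   '2019-28-01 00:00:00'],
    'qty_ordered': [4.15, 5.9, 7.8, 1.0, 2.0],
})

# Parse with an explicit format so '2019-02-01' is read as 2 January.
raw['date_order'] = pd.to_datetime(raw['date_order'], format='%Y-%d-%m %H:%M:%S')

# One row per day, sorted by date.
daily = raw.groupby('date_order', as_index=False)['qty_ordered'].sum()

# Last Friday of each row's month: step back from the month end
# to the nearest Friday (dayofweek == 4).
month_end = daily['date_order'].dt.to_period('M').dt.end_time.dt.normalize()
last_friday = month_end - pd.to_timedelta((month_end.dt.dayofweek - 4) % 7, unit='D')

# Keep days up to and including that Friday, then accumulate per month.
kept = daily[daily['date_order'] <= last_friday].copy()
kept['cumulative_ordered'] = (
    kept.groupby(kept['date_order'].dt.to_period('M'))['qty_ordered'].cumsum()
)
print(kept)
```

With these rows, the first two cumulative values come out at about 10.05 and 17.85, in line with the expected result in the question, and the row for the 28th is dropped because it falls after the last Friday of January.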