Rolling count over the last n days
I have the following DataFrame:
entry_time_flat route_id time_slot
2019-09-02 00:00:00 1_2 0-6
2019-09-04 00:00:00 3_4 6-12
2019-09-06 00:00:00 1_2 0-6
2019-09-06 00:00:00 1_2 18-20
...
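I want to create a final_df that, for each route_id and time_slot, counts the number of occurrences over the previous n_days (n_days = 30).
To illustrate, I would like to obtain the following df: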
print(final_df)
entry_time_flat route_id time_slot n_occurrences
2019-09-02 00:00:00 1 0-6 0
2019-09-04 00:00:00 3 6-12 0
2019-09-06 00:00:00 1 0-6 1
2019-09-06 00:00:00 1 18-20 0
...
How can I get that result efficiently?
You can use pd.DataFrame.rolling with a time offset as the window, applied per group:
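import pandas as pd

# make sure the date column is a datetime, set it as the index, and sort it
df['entry_time_flat'] = pd.to_datetime(df['entry_time_flat'])
df = df.set_index('entry_time_flat').sort_index()
# define the window as a time offset
n_days = 30
offset = f'{n_days}D'
# count earlier rows per (route_id, time_slot); closed='left' excludes the current row
final_df = df.groupby(['route_id', 'time_slot'])['route_id'].rolling(offset, closed='left').count()
# empty windows may come back as NaN, so treat them as 0
final_df = final_df.fillna(0)
# get the desired output format
final_df.name = 'n_occurrences'
final_df = final_df.reset_index()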
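For anyone who wants to try this end to end, here is a minimal self-contained sketch built on the four sample rows from the question. It makes two assumptions that are not in the answer above: it counts a constant helper column (called seen here, a made-up name) instead of route_id, and it sorts the result back into date order so it lines up with the expected output.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame(
    {
        "entry_time_flat": pd.to_datetime(
            ["2019-09-02", "2019-09-04", "2019-09-06", "2019-09-06"]
        ),
        "route_id": ["1_2", "3_4", "1_2", "1_2"],
        "time_slot": ["0-6", "6-12", "0-6", "18-20"],
    }
)

n_days = 30

# Time-based rolling needs a sorted DatetimeIndex
indexed = df.set_index("entry_time_flat").sort_index()
# Hypothetical helper column: one "hit" per row, so count() has a numeric input
indexed["seen"] = 1

counts = (
    indexed.groupby(["route_id", "time_slot"])["seen"]
    .rolling(f"{n_days}D", closed="left")  # window [t - n_days, t): current row excluded
    .count()
    .fillna(0)  # an empty window may be reported as NaN
    .astype(int)
)

# Flatten the (route_id, time_slot, entry_time_flat) index back into columns
final_df = counts.rename("n_occurrences").reset_index()
final_df = final_df.sort_values(
    ["entry_time_flat", "route_id", "time_slot"]
).reset_index(drop=True)
final_df = final_df[["entry_time_flat", "route_id", "time_slot", "n_occurrences"]]

print(final_df)
```

Only the second (1_2, 0-6) row has an earlier row of the same group inside its window, so it is the only one with a non-zero count.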
Edit: it looks like you want the interval to be closed on the left. I changed the answer accordingly.
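To see what the closed argument changes, here is a small sketch on a made-up series of three dates (the dates and the constant value 1 are assumptions for illustration only). With an offset window, pandas defaults to closed='right', which includes the current row; closed='left' leaves it out.

```python
import pandas as pd

# Three hypothetical events, each carrying the value 1
s = pd.Series(
    1,
    index=pd.to_datetime(["2019-09-02", "2019-09-06", "2019-09-20"]),
)

# Default for offset windows: (t - 30D, t], the current row is counted
right = s.rolling("30D").count()

# closed="left": [t - 30D, t), only strictly earlier rows are counted
left = s.rolling("30D", closed="left").count().fillna(0)

print(right.astype(int).tolist())  # [1, 2, 3]
print(left.astype(int).tolist())   # [0, 1, 2]
```

The left-closed counts are what the question asks for: the first occurrence gets 0, the next one gets 1, and so on.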