Pandas resampling data within two columns

Question

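I have a DataFrame with a datetime column. I want to resample it into 20-minute bins and count the number of unique values in the 'trip' column across all 'name' groups.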

    name            Date         trip
0     4 2019-08-22 00:44:51      1
1     4 2019-08-22 00:45:40      1
2     4 2019-08-22 01:45:52      2
3     4 2019-08-22 01:44:51      2
4     4 2019-08-22 01:45:40      2
5     5 2019-08-22 01:45:52      3
6     5 2019-08-22 01:45:59      3

The desired output looks like this:

Date                    Trip count
2019-08-22 00:40:00     1   
2019-08-22 01:00:00     0
2019-08-22 01:20:00     0
2019-08-22 01:40:00     2
2019-08-22 02:00:00     0

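So the trip count is 1 at 00:40:00 because there is only one trip (from name=4) between 00:40:00 and 01:00:00. It is 2 at 01:40:00 because there are two trips (from name=4 and name=5) between 01:40:00 and 02:00:00. The trip count is 0 everywhere else.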

Answer 1

Try this:

# 'Date' must be a datetime64 column; on='Date' resamples on it without changing the index
df.resample('20min', on='Date').trip.nunique()
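For anyone who wants to run this, here is a minimal sketch of resampling directly on the Date column using the sample rows from the question. The names `df` and `counts` are only illustrative, and it assumes a pandas version where `resample` accepts the `on=` keyword.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "name": [4, 4, 4, 4, 4, 5, 5],
    "Date": pd.to_datetime([
        "2019-08-22 00:44:51",
        "2019-08-22 00:45:40",
        "2019-08-22 01:45:52",
        "2019-08-22 01:44:51",
        "2019-08-22 01:45:40",
        "2019-08-22 01:45:52",
        "2019-08-22 01:45:59",
    ]),
    "trip": [1, 1, 2, 2, 2, 3, 3],
})

# Bin rows into 20-minute windows keyed on the 'Date' column,
# then count the distinct trip ids that fall into each window
counts = df.resample("20min", on="Date")["trip"].nunique()
print(counts)
```

Empty windows come back as 0 rather than being dropped, which is what the question asks for between 01:00 and 01:40.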

Answer 2

You want set_index together with DataFrame.resample, then nunique on trip:

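# Convert first if 'Date' is not already datetime64:
# df['Date'] = pd.to_datetime(df['Date'])
# '20min' is the current spelling of the 20-minute frequency ('20T' is deprecated in recent pandas)
dfn = df.set_index('Date').resample('20min')['trip'].nunique().reset_index(name='Trip count')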

                 Date  Trip count
0 2019-08-22 00:40:00           1
1 2019-08-22 01:00:00           0
2 2019-08-22 01:20:00           0
3 2019-08-22 01:40:00           2
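Note that this result stops at the last bin that contains data, so the 02:00:00 row from the desired output is missing. One possible way to pad it, sketched here under the assumption that the reporting window is known in advance (the `full_index` bounds below are hypothetical, chosen to match the question's desired output), is to reindex onto an explicit range:

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "name": [4, 4, 4, 4, 4, 5, 5],
    "Date": pd.to_datetime([
        "2019-08-22 00:44:51",
        "2019-08-22 00:45:40",
        "2019-08-22 01:45:52",
        "2019-08-22 01:44:51",
        "2019-08-22 01:45:40",
        "2019-08-22 01:45:52",
        "2019-08-22 01:45:59",
    ]),
    "trip": [1, 1, 2, 2, 2, 3, 3],
})

# Unique trips per 20-minute bin, as in the answer above
counts = df.set_index("Date").resample("20min")["trip"].nunique()

# Hypothetical reporting window: adjust start and end to your own data
full_index = pd.date_range("2019-08-22 00:40:00", "2019-08-22 02:00:00", freq="20min")

# Bins absent from the resample result are filled with a count of 0
padded = (
    counts.reindex(full_index, fill_value=0)
          .rename_axis("Date")
          .reset_index(name="Trip count")
)
print(padded)
```

This only adds rows for bins on the same 20-minute grid, so the window start should line up with a bin edge such as 00:40:00.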
