Python - Pandas, group by time intervals
I have the following DataFrame:
group_id timestamp
A 2020-09-29 06:00:00 UTC
A 2020-09-29 08:00:00 UTC
A 2020-09-30 09:00:00 UTC
B 2020-09-01 04:00:00 UTC
B 2020-09-01 06:00:00 UTC
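I want to count the differences between consecutive records within each group, pooled across all groups, rather than the differences between groups. The expected result for the example above (delta in hours):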
delta count
2 2
25 1
Explanation: in group A, the deltas are
06:00:00 -> 08:00:00 (2 hours)
08:00:00 -> 09:00:00 on the next day (25 hours)
In group B:
04:00:00 -> 06:00:00 (2 hours)
How can I achieve this with Python Pandas?
Code
df_out = df.groupby("group_id").diff().groupby("timestamp").size()
# convert to dataframe
df_out = df_out.to_frame().reset_index().rename(columns={"timestamp": "delta", 0: "count"})
Result
print(df_out)
delta count
0 0 days 02:00:00 2
1 1 days 01:00:00 1
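The NaT (missing) values produced by the groupby-diff step, one for the first row of each group, are dropped automatically by the second groupby.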
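To see where the NaT values come from, here is a minimal sketch that rebuilds the example data and inspects the intermediate result. The DataFrame construction with pd.to_datetime(..., utc=True) is an assumption about how df was created; the question only shows the printed table.

```python
import pandas as pd

# Assumed reconstruction of the question's data (tz-aware UTC timestamps).
df = pd.DataFrame({
    "group_id": ["A", "A", "A", "B", "B"],
    "timestamp": pd.to_datetime([
        "2020-09-29 06:00:00",
        "2020-09-29 08:00:00",
        "2020-09-30 09:00:00",
        "2020-09-01 04:00:00",
        "2020-09-01 06:00:00",
    ], utc=True),
})

# Difference to the previous row *within* each group.
# The first row of every group has no predecessor, so it becomes NaT.
deltas = df.groupby("group_id")["timestamp"].diff()
n_missing = int(deltas.isna().sum())

# Grouping by the delta values drops NaT keys by default (dropna=True).
counts = deltas.groupby(deltas).size()

print(deltas)
print(n_missing)
print(counts)
```

With two groups there are two NaT rows, and only the three real deltas are counted.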
To express the time deltas in hours, call the total_seconds() method and divide by 3600.
df_out["delta"] = df_out["delta"].dt.total_seconds() / 3600
print(df_out)
delta count
0 2.0 2
1 25.0 1
Use DataFrameGroupBy.diff for the differences within each group, convert to seconds with Series.dt.total_seconds, divide by 3600 to get hours, then count the values with Series.value_counts and convert the resulting Series to a two-column DataFrame:
df1 = (df.groupby("group_id")['timestamp']
.diff()
.dt.total_seconds()
.div(3600)
.value_counts()
.rename_axis('delta')
.reset_index(name='count'))
print(df1)
delta count
0 2.0 2
1 25.0 1
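Both answers rely on the rows already being in chronological order inside each group, because diff compares each row with the one physically before it. As a hedged sketch, the chain above could be wrapped in a small helper that sorts first. The function name delta_counts, its parameters, and the shuffled sample data are hypothetical and only serve as illustration.

```python
import pandas as pd

def delta_counts(df, group_col="group_id", time_col="timestamp"):
    """Count within-group time deltas (in hours) across all groups."""
    # Sort so that diff() compares chronologically adjacent rows.
    ordered = df.sort_values([group_col, time_col])
    hours = (ordered.groupby(group_col)[time_col]
                    .diff()               # NaT for the first row of each group
                    .dt.total_seconds()
                    .div(3600))
    return (hours.value_counts()          # NaN entries are dropped by default
                 .sort_index()            # deterministic order by delta
                 .rename_axis("delta")
                 .reset_index(name="count"))

# Same records as in the question, deliberately shuffled (assumed data).
df = pd.DataFrame({
    "group_id": ["B", "A", "A", "B", "A"],
    "timestamp": pd.to_datetime([
        "2020-09-01 06:00:00",
        "2020-09-30 09:00:00",
        "2020-09-29 06:00:00",
        "2020-09-01 04:00:00",
        "2020-09-29 08:00:00",
    ], utc=True),
})

result = delta_counts(df)
print(result)
```

Sorting by delta instead of by count is a design choice here: it keeps the output order stable when several deltas occur equally often.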