Calculate time difference within groups made from multiple columns in pandas
I have a dataframe like this:
user datetime mode
-------------------------------------------------
1 2015-09-10 11:50:27 vehicle
1 2015-11-22 10:15:03 vehicle
1 2015-11-23 10:35:03 stop
2 2015-11-22 10:11:13 walk
2 2015-11-22 10:13:08 walk
2 2015-09-10 10:21:52 stop
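I am trying to calculate the travel time per mode for each user, for each day of the month.
My idea is to extract the month, day, and hour, then group by user, month, day, mode, and hour, and compute the difference between the maximum and minimum timestamp in each group.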
df.assign(output=df.groupby(['user','month_n','day_n','mode','hour_n']).datetime
.apply(lambda x: x - x.iloc[0]))
However, when I try to sum the output
df.groupby(['user','month_n','day_n','mode','hour_n'])['output'].sum()
it does not seem to produce the correct result.
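One possible reason, sketched on made-up timestamps (the values below are illustrative and not taken from the real data): `x - x.iloc[0]` gives every row its offset from the first row of the group, so summing those offsets adds the intermediate rows as well and will generally exceed the elapsed time between the first and last row.

```python
import pandas as pd

# Made-up timestamps for a single group (hypothetical, for illustration only).
ts = pd.Series(pd.to_datetime([
    "2015-10-05 08:00:00",
    "2015-10-05 08:00:10",
    "2015-10-05 08:00:30",
]))

# Offset of every row from the first row of the group: 0 s, 10 s, 30 s.
offsets = ts - ts.iloc[0]

# Summing the offsets adds up all three values.
summed = offsets.sum().total_seconds()

# The elapsed time of the group is the last timestamp minus the first.
span = (ts.max() - ts.min()).total_seconds()

print(summed, span)
```

Here the sum of offsets is 40 seconds while the span is 30 seconds, which may be the kind of mismatch seen in the real data.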
My desired output is
user month day mode time_spent(sec)
-------------------------------------------------
1 10 5 vehicle 3600
1 10 5 walk 12345
1 10 5 stop 25879
1 10 6 walk 15
1 10 6 vehicle 98522
2 10 5 walk 1298522
2 10 11 vehicle 99622
3 10 6 vehicle 23247
Any help is much appreciated! Thanks.
UPDATED df
A better example
user datetime mode
-------------------------------------------------
1 10/09/2015 11:50:27 vehicle
1 10/09/2015 11:50:37 vehicle
1 10/09/2015 11:52:57 vehicle
1 10/09/2015 11:53:27 vehicle
1 10/09/2015 10:21:52 walk
1 10/09/2015 11:52:02 walk
1 10/09/2015 11:53:32 walk
1 10/09/2015 10:23:32 walk
1 10/09/2015 11:50:22 vehicle
1 10/09/2015 11:50:57 vehicle
2 22/11/2015 10:15:53 walk
2 22/11/2015 10:13:53 walk
2 22/11/2015 10:16:08 walk
2 22/11/2015 10:15:38 walk
2 22/11/2015 10:16:23 walk
2 22/11/2015 10:10:33 walk
2 22/11/2015 10:15:03 walk
2 22/11/2015 10:11:13 walk
2 22/11/2015 10:13:08 walk
2 22/11/2015 10:10:28 walk
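To add some context: the dataset contains many users, several weeks of datetimes, and 10 different modes. A mode can repeat multiple times within a day, and each occurrence of a mode has start/end timestamps.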
Here is how I would approach it:
from io import StringIO
import pandas as pd
s = """user,datetime,mode
1, 10/09/2015 11:50:27, vehicle
1, 10/09/2015 11:50:37, vehicle
1, 10/09/2015 11:52:57, vehicle
1, 10/09/2015 11:53:27, vehicle
1, 10/09/2015 10:21:52, walk
1, 10/09/2015 11:52:02, walk
1, 10/09/2015 11:53:32, walk
1, 10/09/2015 10:23:32, walk
1, 10/09/2015 11:50:22, vehicle
1, 10/09/2015 11:50:57, vehicle
2, 22/11/2015 10:15:53 , walk
2, 22/11/2015 10:13:53 , walk
2, 22/11/2015 10:16:08 , walk
2, 22/11/2015 10:15:38 , walk
2, 22/11/2015 10:16:23 , walk
2, 22/11/2015 10:10:33 , walk
2, 22/11/2015 10:15:03 , walk
2, 22/11/2015 10:11:13 , walk
2, 22/11/2015 10:13:08 , walk
2, 22/11/2015 10:10:28 , walk"""
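# skipinitialspace drops the blank after each comma (otherwise mode is " walk").
df = pd.read_csv(StringIO(s), skipinitialspace=True)
# Dates are day-first (dd/mm/yyyy); strip trailing blanks before parsing.
df["datetime"] = pd.to_datetime(df["datetime"].str.strip(), format="%d/%m/%Y %H:%M:%S")
# Time span per (user, mode): latest timestamp minus earliest.
grouped = df.groupby(["user", "mode"])["datetime"]
grouped.max() - grouped.min()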
It produces the desired output, i.e. the time between the first and last timestamp for each user and mode.
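The grouping above uses only user and mode, while the question asks for a row per user, month, day, and mode with the time in seconds. A minimal sketch of that extension follows, assuming the timestamps are day-first and using a shortened copy of the sample rows; the column names `month`, `day`, and `time_spent_sec` are chosen here for illustration.

```python
from io import StringIO

import pandas as pd

# A few rows in the same day-first layout as the example data.
s = """user,datetime,mode
1,10/09/2015 11:50:27,vehicle
1,10/09/2015 11:53:27,vehicle
1,10/09/2015 10:21:52,walk
1,10/09/2015 11:53:32,walk
2,22/11/2015 10:10:28,walk
2,22/11/2015 10:16:23,walk"""

df = pd.read_csv(StringIO(s))
df["datetime"] = pd.to_datetime(df["datetime"], format="%d/%m/%Y %H:%M:%S")

# Derive the calendar fields used as group keys.
df["month"] = df["datetime"].dt.month
df["day"] = df["datetime"].dt.day

# First and last timestamp per (user, month, day, mode).
agg = df.groupby(["user", "month", "day", "mode"])["datetime"].agg(["min", "max"])

# Elapsed time between first and last timestamp, in seconds.
agg["time_spent_sec"] = (agg["max"] - agg["min"]).dt.total_seconds()

result = agg["time_spent_sec"].reset_index()
print(result)
```

Note that max minus min also counts any gap between two separate occurrences of the same mode on the same day. If each uninterrupted run should be timed on its own, the rows would first need a run identifier (for example, one that changes whenever the mode changes in time order) added to the group keys.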