Calculate time difference in groups made from multiple columns in pandas

I have a dataframe like this:

user   datetime              mode
----------------------------------------
1      2015-09-10 11:50:27   vehicle
1      2015-11-22 10:15:03   vehicle
1      2015-11-23 10:35:03   stop
2      2015-11-22 10:11:13   walk
2      2015-11-22 10:13:08   walk
2      2015-09-10 10:21:52   stop

I am trying to calculate, for each user, the time spent in each travel mode on every day of the month.

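My idea was to extract the month, day and hour, group by user, month, day, mode and hour, and then compute the difference between the maximum and minimum timestamp in each group.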
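A minimal sketch of that idea, assuming `datetime` is already parsed as datetime64. The three rows are made up for illustration; `month_n`, `day_n` and `hour_n` are the column names used in the question:

```python
import pandas as pd

# made-up sample rows, for illustration only
df = pd.DataFrame({
    "user": [1, 1, 1],
    "datetime": pd.to_datetime([
        "2015-09-10 11:50:27", "2015-09-10 11:53:27", "2015-09-10 11:51:00",
    ]),
    "mode": ["vehicle", "vehicle", "vehicle"],
})

# extract the calendar parts used as grouping keys
df["month_n"] = df["datetime"].dt.month
df["day_n"] = df["datetime"].dt.day
df["hour_n"] = df["datetime"].dt.hour

keys = ["user", "month_n", "day_n", "mode", "hour_n"]
g = df.groupby(keys)["datetime"]

# span of each group = latest minus earliest timestamp, in seconds
span = (g.max() - g.min()).dt.total_seconds()
print(span)
```

Using `max() - min()` instead of an offset from the first row makes the result independent of the row order inside each group.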

# offset of each row from the FIRST row of its (user, month, day, mode, hour) group
df = df.assign(output=df.groupby(['user', 'month_n', 'day_n', 'mode', 'hour_n'])['datetime']
                 .transform(lambda x: x - x.iloc[0]))

However, when I try to sum the output per group:

df.groupby(['user','month_n','day_n','mode','hour_n'])['output'].sum()

it does not seem to produce the correct result.
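One likely reason: `x - x.iloc[0]` measures each row's offset from the group's first row, not from its earliest timestamp, and summing those offsets is not the group's duration even when the rows are sorted (offsets of 0, 5 and 10 minutes sum to 15 minutes, while the span is 10). A small illustration with made-up, unsorted timestamps:

```python
import pandas as pd

# one made-up group whose rows are NOT in time order
times = pd.Series(pd.to_datetime([
    "2015-11-22 10:05:00", "2015-11-22 10:00:00", "2015-11-22 10:10:00",
]))

offsets = times - times.iloc[0]   # 0 min, -5 min, +5 min
summed = offsets.sum()            # the offsets cancel out to 0
span = times.max() - times.min()  # 10 min: the time actually covered

print(summed, span)
```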

My desired output is:

user   month   day   mode      time_spent(sec)
-----------------------------------------------
1      10      5     vehicle   3600
1      10      5     walk      12345
1      10      5     stop      25879
1      10      6     walk      15
1      10      6     vehicle   98522
2      10      5     walk      1298522
2      10      11    vehicle   99622
3      10      6     vehicle   23247
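A sketch that produces columns shaped like that table, assuming "time spent" means the latest minus the earliest timestamp per user, month, day and mode (the hour key is dropped because the desired output has no hour column). The rows are made-up sample data:

```python
import pandas as pd

# made-up rows; real data would come from the question's dataframe
df = pd.DataFrame({
    "user": [1, 1, 2, 2],
    "datetime": pd.to_datetime([
        "2015-09-10 11:50:27", "2015-09-10 11:53:27",
        "2015-11-22 10:10:28", "2015-11-22 10:16:23",
    ]),
    "mode": ["vehicle", "vehicle", "walk", "walk"],
})

# group by columns and by derived Series; the Series names become index levels
g = df.groupby([
    "user",
    df["datetime"].dt.month.rename("month"),
    df["datetime"].dt.day.rename("day"),
    "mode",
])["datetime"]

out = (
    (g.max() - g.min())          # Timedelta per group
    .dt.total_seconds()          # convert to seconds
    .rename("time_spent(sec)")
    .reset_index()               # back to a flat table
)
print(out)
```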

Any help is much appreciated. Thanks!

UPDATED df

A better example:
user   datetime              mode
----------------------------------------
1      10/09/2015 11:50:27   vehicle
1      10/09/2015 11:50:37   vehicle
1      10/09/2015 11:52:57   vehicle
1      10/09/2015 11:53:27   vehicle
1      10/09/2015 10:21:52   walk
1      10/09/2015 11:52:02   walk
1      10/09/2015 11:53:32   walk
1      10/09/2015 10:23:32   walk
1      10/09/2015 11:50:22   vehicle
1      10/09/2015 11:50:57   vehicle
2      22/11/2015 10:15:53   walk
2      22/11/2015 10:13:53   walk
2      22/11/2015 10:16:08   walk
2      22/11/2015 10:15:38   walk
2      22/11/2015 10:16:23   walk
2      22/11/2015 10:10:33   walk
2      22/11/2015 10:15:03   walk
2      22/11/2015 10:11:13   walk
2      22/11/2015 10:13:08   walk
2      22/11/2015 10:10:28   walk

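To add some context: the dataset contains many users, datetimes spanning several weeks, and 10 different modes. A mode can occur several times in one day, and each occurrence has its own start/end timestamps.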
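Because a mode can repeat within a day, a single max minus min per day would also count the gaps between separate occurrences. One possible refinement, assuming that a run of consecutive rows (in time order) with the same user, day and mode is one occurrence and that gaps between runs should not be counted. The `segment` column is a hypothetical helper and the rows are made up:

```python
import pandas as pd

# made-up rows: user 1 walks, drives, then walks again on the same day
df = pd.DataFrame({
    "user": [1, 1, 1, 1, 1, 1],
    "datetime": pd.to_datetime([
        "2015-09-10 10:00:00", "2015-09-10 10:05:00",
        "2015-09-10 10:06:00", "2015-09-10 10:30:00",
        "2015-09-10 11:00:00", "2015-09-10 11:02:00",
    ]),
    "mode": ["walk", "walk", "vehicle", "vehicle", "walk", "walk"],
})

df = df.sort_values(["user", "datetime"])
df["day"] = df["datetime"].dt.normalize()  # midnight of each timestamp

# a new segment starts when user, day or mode differs from the previous row
cols = ["user", "day", "mode"]
changed = (df[cols] != df[cols].shift()).any(axis=1)
df["segment"] = changed.cumsum()

# duration of each run, then the total per user, day and mode in seconds
seg = df.groupby(["user", "day", "mode", "segment"])["datetime"]
per_segment = seg.max() - seg.min()
result = per_segment.groupby(level=["user", "day", "mode"]).sum().dt.total_seconds()
print(result)
```

Here walking totals 5 + 2 minutes, whereas a plain max minus min over the day would report 62 minutes of walking.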

Here is the approach I would take:

from io import StringIO
import pandas as pd

s = """user,datetime,mode
1,  10/09/2015  11:50:27,    vehicle
1,  10/09/2015  11:50:37,    vehicle
1,  10/09/2015  11:52:57,    vehicle
1,  10/09/2015  11:53:27,    vehicle
1,  10/09/2015  10:21:52,    walk
1,  10/09/2015  11:52:02,    walk
1,  10/09/2015  11:53:32,    walk
1,  10/09/2015  10:23:32,    walk
1,  10/09/2015  11:50:22,    vehicle
1,  10/09/2015  11:50:57,    vehicle
2,  22/11/2015 10:15:53 ,    walk
2,  22/11/2015 10:13:53 ,    walk
2,  22/11/2015 10:16:08 ,    walk
2,  22/11/2015 10:15:38 ,    walk
2,  22/11/2015 10:16:23 ,    walk
2,  22/11/2015 10:10:33 ,    walk
2,  22/11/2015 10:15:03 ,    walk
2,  22/11/2015 10:11:13 ,    walk
2,  22/11/2015 10:13:08 ,    walk
2,  22/11/2015 10:10:28 ,    walk"""

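# skipinitialspace=True drops the spaces that follow each comma
df = pd.read_csv(StringIO(s), skipinitialspace=True)

# collapse the irregular whitespace inside the datetime strings, then parse
# them explicitly as day/month/year (10/09/2015 is 10 September 2015)
df["datetime"] = pd.to_datetime(
    df["datetime"].str.split().str.join(" "), format="%d/%m/%Y %H:%M:%S"
)

# time spent per (user, mode) = latest timestamp minus earliest timestamp
g = df.groupby(["user", "mode"])["datetime"]
g.max() - g.min()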

This produces the desired output: