Dataframe Resample with GroupBy on time data

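I have per-second traffic data recording the number of cars going in and out. I want to aggregate it into 2-minute bins by flow direction (In/Out) and show the totals. Sample data: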

import pandas as pd

data = {'time': ["13:34:16","13:34:19","13:34:52","13:34:55","13:34:58","13:35:01","13:35:04","13:35:37","13:35:40","13:35:43","13:36:37","13:36:39","13:36:43","13:36:46","13:36:49","13:36:52","13:36:58","13:37:04","13:37:07","13:37:13","13:37:46","13:37:49","13:37:58",], 
'cars' : [15,22,12,1,331,32,14,5,51,13,3,22,5,2,4,1,3,5,89,105,1,63,1,],
'flow': ["In","Out","In","Unknown","Out","In","Out","Unknown","Out","Out","In","In","Unknown","In","In","Out","In","In","In","In","In","In","In",]}

I tried:

df = pd.DataFrame(data)
df.time = '2020-01-23 ' + df.time     # data date

df.time = pd.to_datetime(df.time, unit='s')

print (df.groupby('flow').resample('2T')['cars'].sum())

But it raises an error:

ValueError: non convertible value 2020-01-23 13:34:16 with the unit 's'

What is the correct way to do this?

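I think you should resample on the index. Also drop unit='s' from pd.to_datetime: that argument is meant for numeric epoch values, while your column already holds full timestamp strings. Can you try: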

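df.time = pd.to_datetime(df.time)  # parse the timestamp strings; no unit needed
# '2min' is the current spelling of the 2-minute alias; '2T' is deprecated in recent pandas
df_new = df.set_index("time").groupby('flow').resample('2min')['cars'].sum()
print(df_new)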
flow     time               
In       2020-01-23 13:34:00     59
         2020-01-23 13:36:00    298
Out      2020-01-23 13:34:00    431
         2020-01-23 13:36:00      1
Unknown  2020-01-23 13:34:00      6
         2020-01-23 13:36:00      5
Name: cars, dtype: int64
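As a side note on the original ValueError: unit='s' tells pd.to_datetime to read its input as a number of seconds since the Unix epoch, so it cannot convert a string such as "2020-01-23 13:34:16". A minimal sketch of the two input styles follows; the epoch value 1579786456 is my own illustration (the first timestamp of the question expressed in seconds, assuming UTC) and is not part of the original data.

```python
import pandas as pd

# unit="s" is for numeric input: seconds elapsed since 1970-01-01 00:00:00.
# 1579786456 is an illustrative value, not taken from the question's data.
from_epoch = pd.to_datetime(1579786456, unit="s")

# Timestamp strings are parsed directly, without a unit argument.
from_string = pd.to_datetime("2020-01-23 13:34:16")

print(from_epoch)
print(from_string)
print(from_epoch == from_string)
```

Both calls should yield the same Timestamp, which is why the string column in the question only needs a plain pd.to_datetime(df.time).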

If you want to reproduce the layout of your Excel sheet:

df_new = df_new.unstack().T
df_new["Total"] = df_new.sum(axis=1)
print(df_new)
flow                  In  Out  Unknown  Total
time                                         
2020-01-23 13:34:00   59  431        6    496
2020-01-23 13:36:00  298    1        5    304
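For reference, the same table can probably be built without setting the index first, by grouping on pd.Grouper together with the flow column. This is only a sketch of an alternative, not the approach used above; the variable name table and the fill_value=0 choice are my own assumptions. Note one difference: pd.Grouper only returns bins that actually contain rows, so fill_value=0 is used to fill any bin/flow combination that does not occur.

```python
import pandas as pd

data = {
    "time": ["13:34:16", "13:34:19", "13:34:52", "13:34:55", "13:34:58",
             "13:35:01", "13:35:04", "13:35:37", "13:35:40", "13:35:43",
             "13:36:37", "13:36:39", "13:36:43", "13:36:46", "13:36:49",
             "13:36:52", "13:36:58", "13:37:04", "13:37:07", "13:37:13",
             "13:37:46", "13:37:49", "13:37:58"],
    "cars": [15, 22, 12, 1, 331, 32, 14, 5, 51, 13, 3, 22, 5, 2, 4, 1, 3,
             5, 89, 105, 1, 63, 1],
    "flow": ["In", "Out", "In", "Unknown", "Out", "In", "Out", "Unknown",
             "Out", "Out", "In", "In", "Unknown", "In", "In", "Out", "In",
             "In", "In", "In", "In", "In", "In"],
}

df = pd.DataFrame(data)
# Prepend the date of the data and parse the resulting strings as timestamps.
df["time"] = pd.to_datetime("2020-01-23 " + df["time"])

# Group by 2-minute bin and flow direction, then spread the flows into columns.
table = (
    df.groupby([pd.Grouper(key="time", freq="2min"), "flow"])["cars"]
    .sum()
    .unstack("flow", fill_value=0)
)
# Row-wise total across In, Out and Unknown.
table["Total"] = table.sum(axis=1)
print(table)
```

On the sample data this should give the same two rows as the output above, with totals of 496 and 304.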