Dataframe Resample with GroupBy on time data
I have per-second traffic data showing the number of cars going in and out. I want to aggregate it into 2-minute bins by flow (In/Out) and show the totals, for example:
import pandas as pd
data = {'time': ["13:34:16","13:34:19","13:34:52","13:34:55","13:34:58","13:35:01","13:35:04","13:35:37","13:35:40","13:35:43","13:36:37","13:36:39","13:36:43","13:36:46","13:36:49","13:36:52","13:36:58","13:37:04","13:37:07","13:37:13","13:37:46","13:37:49","13:37:58",],
'cars' : [15,22,12,1,331,32,14,5,51,13,3,22,5,2,4,1,3,5,89,105,1,63,1,],
'flow': ["In","Out","In","Unknown","Out","In","Out","Unknown","Out","Out","In","In","Unknown","In","In","Out","In","In","In","In","In","In","In",]}
I tried:
df = pd.DataFrame(data)
df.time = '2020-01-23 ' + df.time # data date
df.time = pd.to_datetime(df.time, unit='s')
print (df.groupby('flow').resample('2T')['cars'].sum())
But it raises an error:
ValueError: non convertible value 2020-01-23 13:34:16 with the unit 's'
What is the correct way to do this?
I think you should resample on the index. Can you try:
df.time = pd.to_datetime(df.time)
df.set_index("time").groupby('flow').resample('2T')['cars'].sum()
flow     time
In       2020-01-23 13:34:00     59
         2020-01-23 13:36:00    298
Out      2020-01-23 13:34:00    431
         2020-01-23 13:36:00      1
Unknown  2020-01-23 13:34:00      6
         2020-01-23 13:36:00      5
Name: cars, dtype: int64
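As for the error in the question: `unit='s'` tells `pd.to_datetime` to read numeric input as an offset in seconds from the epoch, so it does not apply to date strings. Below is a minimal end-to-end sketch of the same aggregation, assuming you are happy to swap `set_index` plus `resample` for `pd.Grouper` (a different but equivalent route for this data) and to use the `'2min'` alias, since recent pandas versions deprecate `'T'`. The explicit `format` string is optional and only added here for clarity.

```python
import pandas as pd

data = {
    "time": ["13:34:16", "13:34:19", "13:34:52", "13:34:55", "13:34:58", "13:35:01",
             "13:35:04", "13:35:37", "13:35:40", "13:35:43", "13:36:37", "13:36:39",
             "13:36:43", "13:36:46", "13:36:49", "13:36:52", "13:36:58", "13:37:04",
             "13:37:07", "13:37:13", "13:37:46", "13:37:49", "13:37:58"],
    "cars": [15, 22, 12, 1, 331, 32, 14, 5, 51, 13, 3, 22, 5, 2, 4, 1, 3, 5, 89,
             105, 1, 63, 1],
    "flow": ["In", "Out", "In", "Unknown", "Out", "In", "Out", "Unknown", "Out",
             "Out", "In", "In", "Unknown", "In", "In", "Out", "In", "In", "In",
             "In", "In", "In", "In"],
}
df = pd.DataFrame(data)

# Parse the date strings directly; unit='s' is only meant for numeric epoch values.
df["time"] = pd.to_datetime("2020-01-23 " + df["time"], format="%Y-%m-%d %H:%M:%S")

# Group by flow and by 2-minute bins of the time column in a single groupby.
result = df.groupby(["flow", pd.Grouper(key="time", freq="2min")])["cars"].sum()
print(result)
```

The `key="time"` argument lets `pd.Grouper` bin a regular column, so the DataFrame keeps its default index.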
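If you want to replicate your Excel layout:
# df_new is the grouped, resampled Series from the previous step
df_new = df.set_index("time").groupby('flow').resample('2T')['cars'].sum()
# Move the time level to columns, then transpose so each row is a time bin
df_new = df_new.unstack().T
df_new["Total"] = df_new.sum(axis=1)
print(df_new)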
flow                  In  Out  Unknown  Total
time
2020-01-23 13:34:00   59  431        6    496
2020-01-23 13:36:00  298    1        5    304
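If some 2-minute bin has no rows for one of the flows, the wide table would contain NaN there and the integer columns would become floats. A possible way around that, sketched here on a subset of the question's rows (chosen so the second bin only has "In" traffic), is to pass `fill_value=0` to `unstack`. The name `wide` is just an illustrative choice.

```python
import pandas as pd

# Subset of the question's rows: the 13:36 bin has no "Out" or "Unknown" entries.
df = pd.DataFrame({
    "time": ["13:34:16", "13:34:19", "13:34:55", "13:36:37", "13:37:07"],
    "cars": [15, 22, 1, 3, 89],
    "flow": ["In", "Out", "Unknown", "In", "In"],
})
df["time"] = pd.to_datetime("2020-01-23 " + df["time"])

wide = (
    df.groupby([pd.Grouper(key="time", freq="2min"), "flow"])["cars"]
    .sum()
    .unstack("flow", fill_value=0)  # missing flow/bin combinations become 0
)
wide["Total"] = wide.sum(axis=1)
print(wide)
```

Putting the time grouper first means `unstack("flow")` yields one row per time bin directly, so no transpose is needed.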