Generating regular time series from irregular time series in pandas
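I have a data-analysis task: I want to analyze real-time service logs. Could you help me with how to do this in pandas?
My initial DataFrame is shown in the sample data below.
I want to generate a time series for each service name and then run a correlation analysis on them.
How can I split the DataFrame into a separate DataFrame per service name (indexed by time slot), aggregating each service's data?
P.S. I have seen similar questions, but I believe mine is different because I want to generate many time series from one DataFrame. Apologies in advance if this is trivial; I am new to pandas :)
Here is my DataFrame: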
              ERRORCODE  ERRORTEXT  SERVICENAME  REQTDURATION  RESPTDURATION  HOSTDURATION
10:00:27:000        NaN        NaN     serviceA             0              1          4612
10:00:27:822        NaN        NaN     serviceB             0              1         14994
10:01:27:622         -1  'Timeout'     serviceA             1              0          7695
10:01:27:323        NaN        NaN     serviceD             0              1          2612
10:01:27:755        NaN        NaN     serviceA             0              1          1612
10:02:27:666         -5  'Timeout'     serviceA             0              1         11612
10:02:27:111        NaN        NaN     serviceB             0              1        111112
10:02:27:333        NaN        NaN     serviceC             0              1           412
Starting with:
              ERRORCODE  ERRORTEXT  SERVICENAME  REQTDURATION  RESPTDURATION  \
10:00:27:000        NaN        NaN     serviceA             0              1
10:00:27:822        NaN        NaN     serviceB             0              1
10:01:27:622         -1  'Timeout'     serviceA             1              0
10:01:27:323        NaN        NaN     serviceD             0              1
10:01:27:755        NaN        NaN     serviceA             0              1
10:02:27:666         -5  'Timeout'     serviceA             0              1
10:02:27:111        NaN        NaN     serviceB             0              1
10:02:27:333        NaN        NaN     serviceC             0              1

              HOSTDURATION
10:00:27:000          4612
10:00:27:822         14994
10:01:27:622          7695
10:01:27:323          2612
10:01:27:755          1612
10:02:27:666         11612
10:02:27:111        111112
10:02:27:333           412
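For anyone who wants to reproduce the frame above, here is a minimal sketch that rebuilds it from the sample rows in the question. The temporary column name TIME and the variable names are my own choices, not from the original post. The index is deliberately left as plain strings so that the conversion step still applies.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question; the first field is the raw time string.
rows = [
    ("10:00:27:000", np.nan, np.nan, "serviceA", 0, 1, 4612),
    ("10:00:27:822", np.nan, np.nan, "serviceB", 0, 1, 14994),
    ("10:01:27:622", -1, "Timeout", "serviceA", 1, 0, 7695),
    ("10:01:27:323", np.nan, np.nan, "serviceD", 0, 1, 2612),
    ("10:01:27:755", np.nan, np.nan, "serviceA", 0, 1, 1612),
    ("10:02:27:666", -5, "Timeout", "serviceA", 0, 1, 11612),
    ("10:02:27:111", np.nan, np.nan, "serviceB", 0, 1, 111112),
    ("10:02:27:333", np.nan, np.nan, "serviceC", 0, 1, 412),
]
columns = ["TIME", "ERRORCODE", "ERRORTEXT", "SERVICENAME",
           "REQTDURATION", "RESPTDURATION", "HOSTDURATION"]

# "TIME" is a hypothetical helper column that becomes the (unnamed) index.
df = pd.DataFrame(rows, columns=columns).set_index("TIME")
df.index.name = None

print(df)
```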
Convert the index to a DatetimeIndex:
df.index = pd.to_datetime(df.index, format='%H:%M:%S:%f')
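One detail worth knowing about this conversion: the format string has no date part, so pandas fills in a default date of 1900-01-01. That is why the index is reduced back to the time of day after grouping. A small sketch on two of the sample timestamps (the variable names are illustrative only):

```python
import pandas as pd

# Two raw index values from the sample data: hours:minutes:seconds:milliseconds
stamps = pd.Index(["10:00:27:000", "10:01:27:622"])

# %f reads the fractional-second digits after the last colon
parsed = pd.to_datetime(stamps, format="%H:%M:%S:%f")

# Flooring to the minute shows which one-minute slot each row falls into
slots = parsed.floor("min")

print(parsed)
print(slots)
```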
Then iterate over the SERVICENAME groups:
for service, data in df.groupby('SERVICENAME'):
    # pd.Grouper(freq='min') replaces the deprecated pd.TimeGrouper('Min'): one bucket per minute
    per_minute = data.groupby(pd.Grouper(freq='min'))
    service_result = pd.concat([per_minute.size(), per_minute[['REQTDURATION', 'RESPTDURATION', 'HOSTDURATION']].mean()], axis=1)
    service_result.columns = ['ERRORCOUNT', 'AVGREQTURATION', 'AVGRESPTDURATION', 'AVGHOSTDURATION']
    service_result.index = service_result.index.time  # keep only the time of day
    print(service)
    print(service_result)
This yields:
serviceA
          ERRORCOUNT  AVGREQTURATION  AVGRESPTDURATION  AVGHOSTDURATION
10:00:00           1             0.0               1.0           4612.0
10:01:00           2             0.5               0.5           4653.5
10:02:00           1             0.0               1.0          11612.0
serviceB
          ERRORCOUNT  AVGREQTURATION  AVGRESPTDURATION  AVGHOSTDURATION
10:00:00           1               0                 1            14994
10:01:00           0             NaN               NaN              NaN
10:02:00           1               0                 1           111112
serviceC
          ERRORCOUNT  AVGREQTURATION  AVGRESPTDURATION  AVGHOSTDURATION
10:02:00           1               0                 1              412
serviceD
          ERRORCOUNT  AVGREQTURATION  AVGRESPTDURATION  AVGHOSTDURATION
10:01:00           1               0                 1             2612
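Two things to be aware of in the result above: the loop keeps only the last service_result, and ERRORCOUNT comes from size(), so it counts every row in the slot rather than only the rows that carry an error code. If you want to keep one frame per service and count actual errors, a sketch along these lines may help. The names results, REQUESTS, ERRORS and wide are my own, and I am assuming that a non-null ERRORCODE marks an error.

```python
import numpy as np
import pandas as pd

# Sample data from the question (ERRORTEXT omitted for brevity).
index = pd.to_datetime(
    ["10:00:27:000", "10:00:27:822", "10:01:27:622", "10:01:27:323",
     "10:01:27:755", "10:02:27:666", "10:02:27:111", "10:02:27:333"],
    format="%H:%M:%S:%f",
)
df = pd.DataFrame(
    {
        "ERRORCODE": [np.nan, np.nan, -1, np.nan, np.nan, -5, np.nan, np.nan],
        "SERVICENAME": ["serviceA", "serviceB", "serviceA", "serviceD",
                        "serviceA", "serviceA", "serviceB", "serviceC"],
        "REQTDURATION": [0, 0, 1, 0, 0, 0, 0, 0],
        "RESPTDURATION": [1, 1, 0, 1, 1, 1, 1, 1],
        "HOSTDURATION": [4612, 14994, 7695, 2612, 1612, 11612, 111112, 412],
    },
    index=index,
).sort_index()  # the log rows are not in time order

results = {}  # hypothetical container: service name -> per-minute DataFrame
for service, data in df.groupby("SERVICENAME"):
    per_minute = data.groupby(pd.Grouper(freq="min"))
    results[service] = pd.DataFrame({
        "REQUESTS": per_minute.size(),              # every row in the slot
        "ERRORS": per_minute["ERRORCODE"].count(),  # count() skips NaN
        "AVGHOSTDURATION": per_minute["HOSTDURATION"].mean(),
    })

# One column per service, aligned on the minute slots, as input for corr()
wide = pd.concat(
    {service: frame["AVGHOSTDURATION"] for service, frame in results.items()},
    axis=1,
).sort_index()
correlations = wide.corr()

print(results["serviceA"])
print(wide)
```

With only three one-minute slots in the sample, the correlation matrix is not meaningful; the point is just the shape of the data that corr() expects.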