Generating regular time series from irregular time series in pandas

I have a data analysis task: I want to analyze real-time service logs. Can you help me do this in pandas?

My initial DataFrame looks like this:

I want to generate a time series for each service name and use those series for correlation analysis.

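How can I split the DataFrame into a separate DataFrame for each service name (indexed by time slot), aggregating each service's data, as shown below?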

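P.S. I have seen similar questions, but I believe mine is different because I want to generate many time series from a single DataFrame. Apologies in advance if this is trivial; I'm new to pandas :)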

Here is my DataFrame:

              ERRORCODE  ERRORTEXT SERVICENAME  REQTDURATION  RESPTDURATION  HOSTDURATION
10:00:27:000        NaN        NaN    serviceA             0              1          4612
10:00:27:822        NaN        NaN    serviceB             0              1         14994
10:01:27:622         -1  'Timeout'    serviceA             1              0          7695
10:01:27:323        NaN        NaN    serviceD             0              1          2612
10:01:27:755        NaN        NaN    serviceA             0              1          1612
10:02:27:666         -5  'Timeout'    serviceA             0              1         11612
10:02:27:111        NaN        NaN    serviceB             0              1        111112
10:02:27:333        NaN        NaN    serviceC             0              1           412
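For anyone who wants to reproduce this, here is a sketch that builds the same frame by hand. The values are copied from the table; the index is kept as the raw "HH:MM:SS:mmm" strings, exactly as they appear in the log, and ERRORTEXT is stored as Timeout without the surrounding quotes (an assumption about how the data was meant).

```python
import numpy as np
import pandas as pd

# Values copied from the table above; the index keeps the raw "HH:MM:SS:mmm" strings
df = pd.DataFrame(
    {
        "ERRORCODE": [np.nan, np.nan, -1, np.nan, np.nan, -5, np.nan, np.nan],
        "ERRORTEXT": [np.nan, np.nan, "Timeout", np.nan, np.nan, "Timeout", np.nan, np.nan],
        "SERVICENAME": ["serviceA", "serviceB", "serviceA", "serviceD",
                        "serviceA", "serviceA", "serviceB", "serviceC"],
        "REQTDURATION": [0, 0, 1, 0, 0, 0, 0, 0],
        "RESPTDURATION": [1, 1, 0, 1, 1, 1, 1, 1],
        "HOSTDURATION": [4612, 14994, 7695, 2612, 1612, 11612, 111112, 412],
    },
    index=["10:00:27:000", "10:00:27:822", "10:01:27:622", "10:01:27:323",
           "10:01:27:755", "10:02:27:666", "10:02:27:111", "10:02:27:333"],
)
print(df)
```

ERRORCODE becomes a float column because it contains NaN.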

Starting with:

                 ERRORCODE  ERRORTEXT SERVICENAME  REQTDURATION  RESPTDURATION  \
10:00:27:000        NaN        NaN    serviceA             0              1   
10:00:27:822        NaN        NaN    serviceB             0              1   
10:01:27:622         -1  'Timeout'    serviceA             1              0   
10:01:27:323        NaN        NaN    serviceD             0              1   
10:01:27:755        NaN        NaN    serviceA             0              1   
10:02:27:666         -5  'Timeout'    serviceA             0              1   
10:02:27:111        NaN        NaN    serviceB             0              1   
10:02:27:333        NaN        NaN    serviceC             0              1   

              HOSTDURATION  
10:00:27:000          4612  
10:00:27:822         14994  
10:01:27:622          7695  
10:01:27:323          2612  
10:01:27:755          1612  
10:02:27:666         11612  
10:02:27:111        111112  
10:02:27:333           412 

Convert the index to a DatetimeIndex:

df.index = pd.to_datetime(df.index, format='%H:%M:%S:%f')
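A quick check of what this conversion produces, sketched on three of the index values. The strings carry no date, so pandas fills in its default date of 1900-01-01, and the last field is read as fractional seconds (milliseconds here). The log is also not strictly chronological (10:01:27:622 comes before 10:01:27:323), so sorting the index may be worthwhile before other time-based work. Minute binning itself does not depend on the order.

```python
import pandas as pd

raw_index = pd.Index(["10:00:27:000", "10:01:27:622", "10:01:27:323"])

# Same format string as above: hours:minutes:seconds:fraction
parsed = pd.to_datetime(raw_index, format="%H:%M:%S:%f")
print(parsed)  # no date in the strings, so every value lands on 1900-01-01

# The raw log order is not chronological; sort when order matters
print(parsed.sort_values())
```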

Then loop over the SERVICENAME groups:

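for service, data in df.groupby('SERVICENAME'):
    # pd.TimeGrouper was deprecated and later removed; pd.Grouper(freq='min') makes the same 1-minute bins
    by_minute = data.groupby(pd.Grouper(freq='min'))
    # size() counts every request in a slot (not only errors); mean() averages the durations
    service_result = pd.concat([by_minute.size(), by_minute[['REQTDURATION', 'RESPTDURATION', 'HOSTDURATION']].mean()], axis=1)
    service_result.columns = ['ERRORCOUNT', 'AVGREQTURATION', 'AVGRESPTDURATION', 'AVGHOSTDURATION']
    service_result.index = service_result.index.time
    print(service, service_result, sep='\n')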

This yields:

serviceA
          ERRORCOUNT  AVGREQTURATION  AVGRESPTDURATION  AVGHOSTDURATION
10:00:00           1             0.0               1.0           4612.0
10:01:00           2             0.5               0.5           4653.5
10:02:00           1             0.0               1.0          11612.0

serviceB
          ERRORCOUNT  AVGREQTURATION  AVGRESPTDURATION  AVGHOSTDURATION
10:00:00           1               0                 1            14994
10:01:00           0             NaN               NaN              NaN
10:02:00           1               0                 1           111112

serviceC
          ERRORCOUNT  AVGREQTURATION  AVGRESPTDURATION  AVGHOSTDURATION
10:02:00           1               0                 1              412

serviceD
          ERRORCOUNT  AVGREQTURATION  AVGRESPTDURATION  AVGHOSTDURATION
10:01:00           1               0                 1             2612
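The question asks for separate DataFrames to use in correlation analysis, not printed output. Here is a variant sketch that keeps each service's per-minute frame in a dict and then lines the series up for correlation. It is not the answer's code. It uses DataFrame.resample('min') per group instead of a Grouper, and the helper name per_service_frames and the column name REQUESTCOUNT are hypothetical. It also reports two counts. The answer's size() counts every request in a slot, while an error count presumably should include only rows with a non-null ERRORCODE.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame(
    {
        "ERRORCODE": [np.nan, np.nan, -1, np.nan, np.nan, -5, np.nan, np.nan],
        "SERVICENAME": ["serviceA", "serviceB", "serviceA", "serviceD",
                        "serviceA", "serviceA", "serviceB", "serviceC"],
        "REQTDURATION": [0, 0, 1, 0, 0, 0, 0, 0],
        "RESPTDURATION": [1, 1, 0, 1, 1, 1, 1, 1],
        "HOSTDURATION": [4612, 14994, 7695, 2612, 1612, 11612, 111112, 412],
    },
    index=pd.to_datetime(
        ["10:00:27:000", "10:00:27:822", "10:01:27:622", "10:01:27:323",
         "10:01:27:755", "10:02:27:666", "10:02:27:111", "10:02:27:333"],
        format="%H:%M:%S:%f",
    ),
)


def per_service_frames(df: pd.DataFrame) -> dict:
    """Hypothetical helper: {service name: one-row-per-minute summary DataFrame}."""
    frames = {}
    for service, data in df.sort_index().groupby("SERVICENAME"):
        slots = data.resample("min")  # minute bins; empty minutes are kept
        result = slots[["REQTDURATION", "RESPTDURATION", "HOSTDURATION"]].mean()
        result.insert(0, "REQUESTCOUNT", slots.size())  # all requests in the slot
        # count() skips NaN, so only rows that carry an ERRORCODE are counted
        result.insert(1, "ERRORCOUNT", slots["ERRORCODE"].count())
        frames[service] = result
    return frames


frames = per_service_frames(df)
print(frames["serviceA"])

# One column per service, aligned on the minute index (outer join on time)
wide = pd.concat(
    {name: frame["HOSTDURATION"] for name, frame in frames.items()}, axis=1
).sort_index()
# Pairwise Pearson correlation; NaN cells are dropped pair by pair
print(wide.corr())
```

With only three minutes of sample data most pairs share one or two points, so the correlations here are degenerate (1.0 or NaN). They only become meaningful on a longer log.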