Creating bins for a timestamp column

I am trying to create appropriate bins for a timestamp-interval column, using code like this:
df['Bin'] = pd.cut(df['interval_length'], bins=pd.to_timedelta(['00:00:00','00:10:00','00:20:00','00:30:00','00:40:00','00:50:00','00:60:00']))

The resulting df looks like this:

time_interval  |           bin
  00:17:00        (0 days 00:10:00, 0 days 00:20:00]
  01:42:00                NaN
  00:15:00        (0 days 00:10:00, 0 days 00:20:00]
  00:00:00                NaN
  00:06:00        (0 days 00:00:00, 0 days 00:10:00]

This is a bit off: I want the result to show only time values, not days, and I want the upper limit (the last bin) to be 60 minutes or inf (or more).

Desired output:

time_interval  |           bin
      00:17:00        (00:10:00,00:20:00]
      01:42:00        (00:60:00,inf]
      00:15:00        (00:10:00,00:20:00]
      00:00:00        (00:00:00,00:10:00]
      00:06:00        (00:00:00,00:10:00]

Thanks for taking a look!

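In pandas there is no inf for timedeltas, so use the maximum value, pd.Timedelta.max, instead. If you want the bins filled with timedeltas and the lowest value included as well, use the parameter include_lowest=True: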

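import pandas as pd

df = pd.DataFrame({'time_interval': pd.to_timedelta(
    ['00:17:00', '01:42:00', '00:15:00', '00:00:00', '00:06:00'])})

# bin edges in 10-minute steps ('00:60:00' is read as one hour)
b = pd.to_timedelta(['00:00:00','00:10:00','00:20:00',
                     '00:30:00','00:40:00',
                     '00:50:00','00:60:00'])
# timedeltas have no inf, so the largest Timedelta serves as the open upper edge
b = b.append(pd.Index([pd.Timedelta.max]))
# include_lowest=True puts 00:00:00 into the first bin instead of NaN
df['Bin'] = pd.cut(df['time_interval'],  include_lowest=True, bins=b)
print (df)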
  time_interval                                             Bin
0      00:17:00              (0 days 00:10:00, 0 days 00:20:00]
1      01:42:00  (0 days 01:00:00, 106751 days 23:47:16.854775]
2      00:15:00              (0 days 00:10:00, 0 days 00:20:00]
3      00:00:00     (-1 days +23:59:59.999999, 0 days 00:10:00]
4      00:06:00     (-1 days +23:59:59.999999, 0 days 00:10:00]
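The include_lowest=True argument is what rescues the 00:00:00 row: pd.cut builds right-closed intervals (a, b] by default, so a value sitting exactly on the first edge is left out. A minimal sketch, using a shortened edge list and the sample values from the question, compares both settings:

```python
import pandas as pd

# sample intervals taken from the question
s = pd.Series(pd.to_timedelta(['00:17:00', '01:42:00', '00:15:00', '00:00:00', '00:06:00']))

# shortened edge list (an assumption for brevity) with Timedelta.max as the open end
edges = pd.to_timedelta(['00:00:00', '00:10:00', '00:20:00']).append(pd.Index([pd.Timedelta.max]))

# default: bins are (a, b], so exactly 00:00:00 is outside the first bin
without = pd.cut(s, bins=edges)
# include_lowest=True makes the first bin also contain its left edge
with_lowest = pd.cut(s, bins=edges, include_lowest=True)

print(without.isna().tolist())      # [False, False, False, True, False]
print(with_lowest.isna().tolist())  # [False, False, False, False, False]
```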

If you want strings instead of timedeltas, use zip to create the labels, with 'inf' appended:
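# edge strings, reused below for the labels
vals = ['00:00:00','00:10:00','00:20:00',
        '00:30:00','00:40:00', '00:50:00','00:60:00']

# numeric edges: the timedeltas plus Timedelta.max as the open upper end
b = pd.to_timedelta(vals).append(pd.Index([pd.Timedelta.max]))

# append 'inf' so each consecutive pair of strings names one bin
vals.append('inf')
labels = ['{}-{}'.format(i, j) for i, j in zip(vals[:-1], vals[1:])]

df['Bin'] = pd.cut(df['time_interval'],  include_lowest=True, bins=b, labels=labels)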
print (df)
  time_interval                Bin
0      00:17:00  00:10:00-00:20:00
1      01:42:00       00:60:00-inf
2      00:15:00  00:10:00-00:20:00
3      00:00:00  00:00:00-00:10:00
4      00:06:00  00:00:00-00:10:00
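The same zip idea can be driven from the edges themselves instead of a hand-typed string list. A minimal sketch, assuming 10-minute steps from 0 to 60 minutes; hhmmss is a hypothetical helper, and unlike the strings above it renders the 60-minute edge as 01:00:00 rather than 00:60:00:

```python
import pandas as pd

def hhmmss(td):
    # hypothetical helper: format a Timedelta as HH:MM:SS without the "0 days" prefix
    total = int(td.total_seconds())
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f'{h:02d}:{m:02d}:{s:02d}'

# 0, 10, ..., 60 minutes, then Timedelta.max as the open upper edge
edges = pd.timedelta_range(start='0min', end='60min', freq='10min')
bins = edges.append(pd.Index([pd.Timedelta.max]))

# one name per edge plus 'inf'; consecutive pairs become the bin labels
names = [hhmmss(e) for e in edges] + ['inf']
labels = [f'{lo}-{hi}' for lo, hi in zip(names[:-1], names[1:])]

df = pd.DataFrame({'time_interval': pd.to_timedelta(
    ['00:17:00', '01:42:00', '00:15:00', '00:00:00', '00:06:00'])})
df['Bin'] = pd.cut(df['time_interval'], bins=bins, labels=labels, include_lowest=True)
print(df)
```

Deriving the labels from the edges keeps the two lists the same length by construction, which pd.cut requires (one label per interval).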

You can solve it with labels:

df['Bin'] = pd.cut(df['interval_length'], include_lowest=True, bins=pd.to_timedelta(['00:00:00','00:10:00','00:20:00','00:30:00','00:40:00','00:50:00','00:60:00', '24:00:00']), labels=['(00:00:00,00:10:00]', '(00:10:00,00:20:00]', '(00:20:00,00:30:00]', '(00:30:00,00:40:00]', '(00:40:00,00:50:00]', '(00:50:00,00:60:00]', '(00:60:00,inf]'])
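A label only names a bin; the edges decide membership. With a fixed last edge such as 24 hours, pd.cut returns NaN for anything beyond it, even if the label says inf. A minimal sketch, with made-up values (one interval longer than a day) and a reduced edge list, compares a 24-hour cap with Timedelta.max as the last edge:

```python
import pandas as pd

# made-up data: one short interval and one longer than a day
s = pd.Series(pd.to_timedelta(['5min', '26h']))

labels = ['(00:00:00,01:00:00]', '(01:00:00,inf]']

# last edge fixed at 24 hours
capped = pd.to_timedelta(['0min', '60min', '24h'])
# open-ended alternative: largest representable Timedelta as the last edge
open_ended = pd.to_timedelta(['0min', '60min']).append(pd.Index([pd.Timedelta.max]))

capped_bins = pd.cut(s, bins=capped, labels=labels, include_lowest=True)
open_bins = pd.cut(s, bins=open_ended, labels=labels, include_lowest=True)

print(capped_bins.isna().tolist())  # [False, True]: 26h lies beyond the 24h edge
print(open_bins.isna().tolist())    # [False, False]
```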