Pandas: extract hour from timedelta
Another question explains how to convert integers to hourly timesteps in Pandas. I need to do the opposite.
My dataframe df1:
A
0 02:00:00
1 01:00:00
2 02:00:00
3 03:00:00
My expected dataframe df1:
A B
0 02:00:00 2
1 01:00:00 1
2 02:00:00 2
3 03:00:00 3
What I am trying is:
df1['B'] = df1['A'].astype(int)
This fails with:
TypeError: cannot astype a timedelta from [timedelta64[ns]] to [int32]
What is the best way to do this?
EDIT
If I try df['B'] = df['A'].dt.hour, then I get:
AttributeError: 'TimedeltaProperties' object has no attribute 'hour'
Divide by np.timedelta64(1, 'h'):
df1['B'] = df1['A'] / np.timedelta64(1, 'h')
print(df1)
A B
0 02:00:00 2.0
1 01:00:00 1.0
2 02:00:00 2.0
3 03:00:00 3.0
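The division above yields floats, while the question asks for integers. A minimal sketch of one way to get there, assuming the column holds no missing values (NaT) and reusing the `.astype(int)` step that appears in the timing comparison further down the thread; the `hours_float` name is only illustrative:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df1 = pd.DataFrame(
    {'A': pd.to_timedelta(['02:00:00', '01:00:00', '02:00:00', '03:00:00'])}
)

# Dividing by a one-hour timedelta gives the duration in hours as floats.
hours_float = df1['A'] / np.timedelta64(1, 'h')

# astype(int) truncates toward zero, so 1.5 hours would become 1.
df1['B'] = hours_float.astype(int)
print(df1)
```

Because `astype(int)` truncates, a duration such as 01:30:00 would be stored as 1. If rounding is wanted instead, calling `.round()` before the cast is one option.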
You can use dt.components and access the hours column:
In[7]:
df['B'] = df['A'].dt.components['hours']
df
Out[7]:
A B
0 02:00:00 2
1 01:00:00 1
2 02:00:00 2
3 03:00:00 3
The timedelta components attribute returns each component as a column:
In[8]:
df['A'].dt.components
Out[8]:
days hours minutes seconds milliseconds microseconds nanoseconds
0 0 2 0 0 0 0 0
1 0 1 0 0 0 0 0
2 0 2 0 0 0 0 0
3 0 3 0 0 0 0 0
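Both solutions, dt.components and np.timedelta64, are useful. But dividing by np.timedelta64
(1) is much faster than dt.components (good to know, especially for large dataframes), and
(2) as @Sam Chats pointed out, also takes day differences into account: the result is the total number of hours, whereas dt.components['hours'] holds only the hours component (0 to 23).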
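To make point (2) concrete, here is a small sketch with made-up durations (the values and the names `s`, `total_hours`, `hours_component` are illustrative, not from the thread), one of which is longer than a day:

```python
import numpy as np
import pandas as pd

# Two hypothetical durations; the first spans more than one day.
s = pd.Series(pd.to_timedelta(['1 days 02:00:00', '03:30:00']))

# Division converts the whole duration to hours, days included.
total_hours = s / np.timedelta64(1, 'h')

# components['hours'] is only the hours field; days sit in their own column.
hours_component = s.dt.components['hours']

print(total_hours.tolist())      # [26.0, 3.5]
print(hours_component.tolist())  # [2, 3]
```

Which of the two is "right" depends on the intent: total elapsed hours, or the hours field of a days/hours/minutes breakdown.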
Timing comparison:
import pandas as pd
import numpy as np

# Build two columns of clock times and take their difference as a timedelta.
dct = {
    'date1': ['08:05:23', '18:07:20', '08:05:23'],
    'date2': ['09:15:24', '22:07:20', '08:54:01']
}
df = pd.DataFrame(dct)
df['date1'] = pd.to_datetime(df['date1'], format='%H:%M:%S')
df['date2'] = pd.to_datetime(df['date2'], format='%H:%M:%S')
df['delta'] = df['date2'] - df['date1']

# %timeit is an IPython magic; run these two lines in IPython or Jupyter.
%timeit df['np_h'] = (df['delta'] / np.timedelta64(1, 'h')).astype(int)
%timeit df['td_h'] = df['delta'].dt.components['hours']
Output:
1000 loops, best of 3: 484 µs per loop
1000 loops, best of 3: 1.43 ms per loop
Alternatively, divide by pd.Timedelta(1, 'h'):
df1['B'] = df1['A'] / pd.Timedelta(1, 'h')
The result is a float.
https://pandas.pydata.org/docs/reference/api/pandas.Timedelta.html
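As a quick check of the pd.Timedelta variant, the sketch below rebuilds the question's frame and compares the division against `dt.total_seconds() / 3600`. The `total_seconds` route is not part of this answer; it is added here only as a cross-check, and the variable names are illustrative:

```python
import pandas as pd

# Rebuild the example frame from the question.
df1 = pd.DataFrame(
    {'A': pd.to_timedelta(['02:00:00', '01:00:00', '02:00:00', '03:00:00'])}
)

# pd.Timedelta(1, 'h') and pd.Timedelta(hours=1) describe the same duration.
one_hour = pd.Timedelta(1, 'h')
same_unit = one_hour == pd.Timedelta(hours=1)

# Dividing the column by one hour gives float hours.
df1['B'] = df1['A'] / one_hour

# Cross-check: total seconds divided by 3600 gives the same values.
via_seconds = df1['A'].dt.total_seconds() / 3600

print(df1)
```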