Pandas: Cannot subtract date-time objects (timedelta, datetime)
The setup is as follows:
Python 3.9.2 | packaged by conda-forge | (default, Feb 21 2021, 05:00:30)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.22.0 -- An enhanced Interactive Python. Type '?' for help.
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
df = pd.DataFrame({
    'user_id': [1, 2, 3, 4, 5, 6],
    'created_at': [
        '2017-01-01 10:10:15',
        '2017-01-01 11:11:11',
        '2017-01-01 12:12:12',
        '2017-01-01 10:10:20',
        '2017-01-01 10:10:34',
        '2017-01-01 11:11:21'],
    'transaction_value': [10, 20, 10, 30, 40, 50]
})
# convert string to datetime obj
df['created_at'] = pd.to_datetime(df['created_at'])
# convert other columns to numeric
cols = df.columns.drop('created_at')
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')
# creating lag1 and lag2
df['lag1'] = (
    df.sort_values(by=['created_at'], ascending=True)['created_at']
    .shift(periods=1, axis=0).fillna(0)
)
df['lag2'] = (
    df.sort_values(by=['created_at'], ascending=True)['created_at']
    .shift(periods=-1, axis=0).fillna(0)
)
# 0's to NaN
df = df.replace(0, np.nan, inplace=False)
# convert to datetime
cols = [col for col in df if col.startswith('lag')]
df[cols] = df[cols].apply(pd.to_datetime, errors='coerce')
Out[62]:
   user_id           created_at  transaction_value                 lag1                 lag2
0        1  2017-01-01 10:10:15                 10                  NaT                  NaT
1        2  2017-01-01 11:11:11                 20  2017-01-01 10:10:34  2017-01-01 10:10:34
2        3  2017-01-01 12:12:12                 10  2017-01-01 11:11:21  2017-01-01 11:11:21
3        4  2017-01-01 10:10:20                 30  2017-01-01 10:10:15  2017-01-01 10:10:15
4        5  2017-01-01 10:10:34                 40  2017-01-01 10:10:20  2017-01-01 10:10:20
5        6  2017-01-01 11:11:21                 50  2017-01-01 11:11:11  2017-01-01 11:11:11
In [63]: df.dtypes
Out[63]:
user_id int64
created_at datetime64[ns]
transaction_value int64
lag1 datetime64[ns]
lag2 datetime64[ns]
dtype: object
I want the difference between all of the timestamp columns, with the result in seconds.
Here are the many things I have tried:
Attempt #1:
def x(a, b):
    return timedelta(a - b).total_seconds()
df.apply(lambda f: x(f['created_at'],f['lag1']), axis=1)
TypeError: unsupported type for timedelta days component: NaTType
OK, attempt #2:
pd.Timedelta(df['lag1'].difference(df['lag2']))
AttributeError: 'Series' object has no attribute 'difference'
OK... attempt #3:
pd.Timedelta(df['lag1'].subtract(df['lag2']).to_seconds())
AttributeError: 'Series' object has no attribute 'to_seconds'
Now I'm just throwing things at the wall to see what sticks, because none of this makes any sense to me:
df['lag1'].subtract(df['lag2']).to_timedelta64
AttributeError: 'Series' object has no attribute 'to_timedelta64'
t1 = df['lag1']
t2 = df['lag2']
pd.Timedelta(t2 - t1).seconds
ValueError: Value must be Timedelta, string, integer, float, timedelta or convertible, not Series
I shouldn't have to write a whole block of code just to get the difference between two datetimes.
- pandas = 1.2.3 (conda-forge)
- numpy = 1.20.2 (conda-forge)
- datetime = 4.3 (pypi_0)
The machine I'm on:
MacBook Air M1 2020, 16GB RAM (macOS Big Sur version 11.2.1)
Since both columns are pandas Timestamp objects, you can do this:
def x(a, b):
    return (a - b).total_seconds()
df.apply(lambda f: x(f['created_at'],f['lag1']), axis=1)
I don't know what you want to do in the NaT case (this function returns NaN), but you can easily change that.
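The same result can usually be had without a row-wise apply, because subtracting two datetime64 columns already yields a timedelta column. The following is a minimal sketch on a small hypothetical Series (the variable names and the three sample timestamps are illustrative, not the asker's full frame):

```python
import pandas as pd

# Three sample timestamps, already sorted (illustrative data only).
created_at = pd.to_datetime(pd.Series([
    '2017-01-01 10:10:15',
    '2017-01-01 10:10:20',
    '2017-01-01 10:10:34',
]))

# Previous timestamp for each row; the first row has none, so it becomes NaT.
lag1 = created_at.shift(1)

# datetime64 - datetime64 gives timedelta64; .dt.total_seconds() converts
# each element to float seconds, and NaT propagates as NaN.
diff_seconds = (created_at - lag1).dt.total_seconds()
print(diff_seconds.tolist())
```

The first element is NaN because there is no earlier timestamp to subtract; the remaining elements are the gaps in seconds.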
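I haven't touched the NaT values; feel free to fill them with 0 or another value if needed.
We can use pd.to_timedelta and the dt accessor, then apply the total_seconds method.
If the output must be an int type, append .astype(int) to the end of the code (after filling the NaN values, since NaN cannot be cast to int).
Code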
df['lag_diff'] = pd.to_timedelta(df.lag1 - df.lag2, unit='s').dt.total_seconds()
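For the integer conversion, here is a minimal sketch of two options while missing values are still present. The `seconds` Series stands in for a `lag_diff`-style column, and the fill value of 0 is an arbitrary assumption; pick whatever sentinel suits the data.

```python
import numpy as np
import pandas as pd

# Stand-in for a column of second differences that contains a gap.
seconds = pd.Series([np.nan, -3647.0, -19.0])

# Option A: replace NaN with a chosen value (0 is only an example), then cast.
as_int_filled = seconds.fillna(0).astype(int)

# Option B: keep the gap as <NA> by using pandas' nullable integer dtype.
as_int_nullable = seconds.astype('Int64')

print(as_int_filled.tolist())
print(as_int_nullable.tolist())
```

Option B preserves the distinction between "no previous row" and a real difference of zero seconds, at the cost of using an extension dtype.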
Input from the provided setup
   user_id           created_at  transaction_value                 lag1                 lag2
0        1  2017-01-01 10:10:15                 10                  NaT  2017-01-01 10:10:20
1        2  2017-01-01 11:11:11                 20  2017-01-01 10:10:34  2017-01-01 11:11:21
2        3  2017-01-01 12:12:12                 10  2017-01-01 11:11:21                  NaT
3        4  2017-01-01 10:10:20                 30  2017-01-01 10:10:15  2017-01-01 10:10:34
4        5  2017-01-01 10:10:34                 40  2017-01-01 10:10:20  2017-01-01 11:11:11
5        6  2017-01-01 11:11:21                 50  2017-01-01 11:11:11  2017-01-01 12:12:12
Output (subset of columns)
                  lag1                 lag2  lag_diff
0                  NaT  2017-01-01 10:10:20       NaN
1  2017-01-01 10:10:34  2017-01-01 11:11:21   -3647.0
2  2017-01-01 11:11:21                  NaT       NaN
3  2017-01-01 10:10:15  2017-01-01 10:10:34     -19.0
4  2017-01-01 10:10:20  2017-01-01 11:11:11   -3651.0
5  2017-01-01 11:11:11  2017-01-01 12:12:12   -3661.0
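For anyone who wants to reproduce these numbers from scratch, below is a condensed, self-contained sketch. It is a simplification rather than the asker's exact setup: it assumes the fillna(0), replace, and second to_datetime steps can be dropped, since shift already leaves NaT in the empty slots, and it subtracts the columns directly instead of passing the result through pd.to_timedelta.

```python
import pandas as pd

df = pd.DataFrame({
    'user_id': [1, 2, 3, 4, 5, 6],
    'created_at': pd.to_datetime([
        '2017-01-01 10:10:15',
        '2017-01-01 11:11:11',
        '2017-01-01 12:12:12',
        '2017-01-01 10:10:20',
        '2017-01-01 10:10:34',
        '2017-01-01 11:11:21',
    ]),
})

# Sort by time, shift, and let index alignment put each value back on its row.
ordered = df['created_at'].sort_values()
df['lag1'] = ordered.shift(1)    # previous timestamp in time order
df['lag2'] = ordered.shift(-1)   # next timestamp in time order

# Both columns are datetime64, so plain subtraction gives timedelta64.
df['lag_diff'] = (df['lag1'] - df['lag2']).dt.total_seconds()
print(df[['lag1', 'lag2', 'lag_diff']])
```

Rows 0 and 2 come out as NaN because one side of the subtraction is NaT: the earliest timestamp has no predecessor and the latest has no successor.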