Compare the current row's timestamp with the previous row's, with a condition, in a Python function
I have a df sample with a column named date_code, of dtype datetime64[ns]:
date_code
2022-03-28
2022-03-29
2022-03-30
2022-03-31
2022-04-01
2022-04-07
2022-04-07
2022-04-08
2022-04-12
2022-04-12
2022-04-14
2022-04-14
2022-04-15
2022-04-16
2022-04-16
2022-04-17
2022-04-18
2022-04-19
2022-04-20
2022-04-20
2022-04-21
2022-04-22
2022-04-25
2022-04-25
2022-04-26
I want to create a column based on a condition comparing the current row with the previous row. I tried to write a function like this:
def start_date(row):
    if (row['date_code'] - row['date_code'].shift(-1)).days > 1:
        val = row['date_code'].shift(-1)
    elif row['date_code'] == row['date_code'].shift(-1):
        val = row['date_code']
    else:
        val = np.nan()
    return val
But as soon as I apply it with
sample['date_zero_recorded'] = sample.apply(start_date, axis=1)
I get the error:
AttributeError: 'Timestamp' object has no attribute 'shift'
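How should I compare the current row with the previous row under a condition?
Edited: expected output
If the current row is 2 or more days after the previous row, take the previous row
If the current row equals the previous row, take the current row
Otherwise, return NaN (this includes the case where the current row is only 1 day after the previous one)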
date_code date_zero_recorded
2022-03-28 NaN
2022-03-29 NaN
2022-03-30 NaN
2022-03-31 NaN
2022-04-01 NaN
2022-04-07 2022-04-01
2022-04-07 2022-04-07
2022-04-08 NaN
2022-04-12 2022-04-08
2022-04-12 2022-04-12
2022-04-14 2022-04-12
2022-04-14 2022-04-14
2022-04-15 NaN
2022-04-16 NaN
2022-04-16 2022-04-16
2022-04-17 NaN
2022-04-18 NaN
2022-04-19 NaN
2022-04-20 NaN
2022-04-20 2022-04-20
2022-04-21 NaN
2022-04-22 NaN
2022-04-25 2022-04-22
2022-04-25 2022-04-25
2022-04-26 NaN
You should not use iterrows here; use vectorized code instead.
For example:
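# make sure the column is datetime (harmless if it already is)
sample['date_code'] = pd.to_datetime(sample['date_code'])
# take the previous row's date, but keep it only where the gap is not exactly 1 day
sample['date_zero_recorded'] = (
    sample['date_code'].shift()
    .where(sample['date_code'].diff().ne('1d'))
)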
Output:
date_code date_zero_recorded
0 2022-03-28 NaT
1 2022-03-29 NaT
2 2022-03-30 NaT
3 2022-03-31 NaT
4 2022-04-01 NaT
5 2022-04-07 2022-04-01
6 2022-04-07 2022-04-07
7 2022-04-08 NaT
8 2022-04-12 2022-04-08
9 2022-04-12 2022-04-12
10 2022-04-14 2022-04-12
11 2022-04-14 2022-04-14
12 2022-04-15 NaT
13 2022-04-16 NaT
14 2022-04-16 2022-04-16
15 2022-04-17 NaT
16 2022-04-18 NaT
17 2022-04-19 NaT
18 2022-04-20 NaT
19 2022-04-20 2022-04-20
20 2022-04-21 NaT
21 2022-04-22 NaT
22 2022-04-25 2022-04-22
23 2022-04-25 2022-04-25
24 2022-04-26 NaT
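For readers who want the two rules from the question spelled out, here is a minimal self-contained sketch. It uses only the first eight dates from the question; the names `prev` and `gap` are introduced for illustration, and the sketch assumes the column is sorted in ascending order. It also shows why the original function failed: with `apply(axis=1)` each `row['date_code']` is a single Timestamp, whereas `shift` and `diff` are Series methods, so they have to be called on the whole column.

```python
import pandas as pd

# First eight dates from the question (shortened for illustration).
sample = pd.DataFrame({
    "date_code": pd.to_datetime([
        "2022-03-28", "2022-03-29", "2022-03-30", "2022-03-31",
        "2022-04-01", "2022-04-07", "2022-04-07", "2022-04-08",
    ])
})

# shift() and diff() operate on the whole column, not on a single Timestamp.
prev = sample["date_code"].shift()   # previous row's date (NaT on the first row)
gap = sample["date_code"].diff()     # current minus previous, as a Timedelta

# Rule 1: gap of 2 or more days -> previous date.
# Rule 2: gap of 0 days -> current date, which equals the previous date.
# Anything else (first row, 1-day gap) is left as NaT.
keep = gap.ge(pd.Timedelta(days=2)) | gap.eq(pd.Timedelta(0))
sample["date_zero_recorded"] = prev.where(keep)

print(sample)
```

On sorted data this should agree with the shorter `.ne('1d')` condition above. The two differ only if a date ever goes backwards: the explicit version leaves such rows as NaT, while `.ne('1d')` would fill them with the previous date.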