Compare the current row's timestamp with the previous row's under a condition, in a Python function

I have a df `sample` with one column named date_code, dtype datetime64[ns]:

date_code
2022-03-28
2022-03-29
2022-03-30
2022-03-31
2022-04-01
2022-04-07
2022-04-07
2022-04-08
2022-04-12
2022-04-12
2022-04-14
2022-04-14
2022-04-15
2022-04-16
2022-04-16
2022-04-17
2022-04-18
2022-04-19
2022-04-20
2022-04-20
2022-04-21
2022-04-22
2022-04-25
2022-04-25
2022-04-26

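I want to create a column based on a condition that compares the current row with the previous row. I tried to write a function like this: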

def start_date(row):
    if (row['date_code'] - row['date_code'].shift(-1)).days >1:
        val = row['date_code'].shift(-1)
    elif row['date_code'] == row['date_code'].shift(-1):
        val = row['date_code']
    else:
        val = np.nan()
    return val

But as soon as I apply it with

sample['date_zero_recorded'] = sample.apply(start_date, axis=1)

I get the error:

AttributeError: 'Timestamp' object has no attribute 'shift'

How should I compare the current row with the previous row under a condition?

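Edited: expected output

If the current row is 2 or more days later than the previous row, take the previous row's date.

If the current row equals the previous row, take the current row's date.

Otherwise, return NaN (this includes the case where the current row is exactly 1 day later than the previous one).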

date_code   date_zero_recorded
2022-03-28  NaN
2022-03-29  NaN
2022-03-30  NaN
2022-03-31  NaN
2022-04-01  NaN
2022-04-07  2022-04-01
2022-04-07  2022-04-07
2022-04-08  NaN
2022-04-12  2022-04-08
2022-04-12  2022-04-12
2022-04-14  2022-04-12
2022-04-14  2022-04-14
2022-04-15  NaN
2022-04-16  NaN
2022-04-16  2022-04-16
2022-04-17  NaN
2022-04-18  NaN
2022-04-19  NaN
2022-04-20  NaN
2022-04-20  2022-04-20
2022-04-21  NaN
2022-04-22  NaN
2022-04-25  2022-04-22
2022-04-25  2022-04-25
2022-04-26  NaN

You shouldn't iterate row by row (with iterrows or apply(axis=1)); use vectorized code instead.
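As a side note on where the AttributeError comes from, here is a minimal sketch. The two-row frame `demo` and the name `row_types` are hypothetical and exist only for illustration. It shows that a row-wise function receives a scalar Timestamp for each cell, while `shift` and `diff` are methods of the whole column.

```python
import pandas as pd

# Hypothetical two-row frame, used only to illustrate the types involved.
demo = pd.DataFrame({"date_code": pd.to_datetime(["2022-04-01", "2022-04-07"])})

# Row-wise: inside apply(axis=1), row["date_code"] is a single Timestamp,
# and a Timestamp has no .shift method.
row_types = demo.apply(lambda row: type(row["date_code"]).__name__, axis=1)
print(row_types.tolist())

# Column-wise: the column itself is a Series, which does provide .shift and .diff.
print(type(demo["date_code"]).__name__)
print(hasattr(demo["date_code"], "shift"))
```

So any comparison with the previous row has to be written against the column, not inside the per-row function.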

For example:

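import pandas as pd

sample['date_code'] = pd.to_datetime(sample['date_code'])

# Keep the previous row's date wherever the gap to it is not exactly 1 day
# (a gap of 0 days gives the same date as the current row); NaT elsewhere.
sample['date_zero_recorded'] = (
 sample['date_code'].shift()
 .where(sample['date_code'].diff().ne(pd.Timedelta('1d')))
)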

Output:

    date_code date_zero_recorded
0  2022-03-28                NaT
1  2022-03-29                NaT
2  2022-03-30                NaT
3  2022-03-31                NaT
4  2022-04-01                NaT
5  2022-04-07         2022-04-01
6  2022-04-07         2022-04-07
7  2022-04-08                NaT
8  2022-04-12         2022-04-08
9  2022-04-12         2022-04-12
10 2022-04-14         2022-04-12
11 2022-04-14         2022-04-14
12 2022-04-15                NaT
13 2022-04-16                NaT
14 2022-04-16         2022-04-16
15 2022-04-17                NaT
16 2022-04-18                NaT
17 2022-04-19                NaT
18 2022-04-20                NaT
19 2022-04-20         2022-04-20
20 2022-04-21                NaT
21 2022-04-22                NaT
22 2022-04-25         2022-04-22
23 2022-04-25         2022-04-25
24 2022-04-26                NaT
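For anyone who wants to reproduce this end to end, the following is a self-contained sketch that rebuilds `sample` from the dates in the question and spells out the two conditions from the expected-output rules explicitly (gap of 2 or more days, or gap of 0 days). The helper names `prev`, `gap_days` and `keep` are my own, and this is one possible way to write the mask, not the only one.

```python
import pandas as pd

# Dates copied from the question.
dates = [
    "2022-03-28", "2022-03-29", "2022-03-30", "2022-03-31", "2022-04-01",
    "2022-04-07", "2022-04-07", "2022-04-08", "2022-04-12", "2022-04-12",
    "2022-04-14", "2022-04-14", "2022-04-15", "2022-04-16", "2022-04-16",
    "2022-04-17", "2022-04-18", "2022-04-19", "2022-04-20", "2022-04-20",
    "2022-04-21", "2022-04-22", "2022-04-25", "2022-04-25", "2022-04-26",
]
sample = pd.DataFrame({"date_code": pd.to_datetime(dates)})

# Previous row's date (NaT for the first row).
prev = sample["date_code"].shift()
# Whole days elapsed since the previous row (NaN for the first row).
gap_days = sample["date_code"].diff().dt.days

# Keep the previous date when the gap is 2+ days, or when it is 0 days
# (in which case the previous date equals the current one).
keep = gap_days.ge(2) | gap_days.eq(0)
sample["date_zero_recorded"] = prev.where(keep)

print(sample)
```

On the sorted data from the question this should fill the same rows as the `.ne('1d')` version. The two differ only if the column is not sorted: a negative gap passes the "not equal to 1 day" test but fails the explicit mask here, so it would end up as NaT.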