Change values of a timedelta column based on the previous row

Consider the following Python pandas DataFrame:

code       visit_time   flag other  counter
   0              NaT   True     X        3
   0  1 days 03:00:12  False     Y        1
   0              NaT  False     X        3
   0  0 days 05:00:00   True     X        2
   1              NaT  False     Z        3
   1              NaT   True     X        3
   1  1 days 03:00:12  False     Y        1
   2              NaT   True     X        3
   2  5 days 10:01:12   True     Y        0
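
For anyone who wants to reproduce the frame, here is a minimal sketch of how it could be built. The construction is an assumption, since the original data source is not shown; only the column names and values are taken from the table above.

```python
import pandas as pd

# Assumed construction: the values are copied from the table in the question.
df = pd.DataFrame({
    "code": [0, 0, 0, 0, 1, 1, 1, 2, 2],
    "visit_time": pd.to_timedelta([
        pd.NaT, "1 days 03:00:12", pd.NaT, "0 days 05:00:00",
        pd.NaT, pd.NaT, "1 days 03:00:12", pd.NaT, "5 days 10:01:12",
    ]),
    "flag": [True, False, False, True, False, True, False, True, True],
    "other": ["X", "Y", "X", "X", "Z", "X", "Y", "X", "Y"],
    "counter": [3, 1, 3, 2, 3, 3, 1, 3, 0],
})

print(df)
```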

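Only the columns code, visit_time and flag are needed to solve this problem.

Every row that has a visit_time value is preceded by a row whose visit_time is NaT. Knowing this, I want to make the following change to the DataFrame: for each row with a visit_time value, set flag to the flag value of the previous row.

Example: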

code       visit_time   flag other  counter
   0              NaT   True     X        3
   0  1 days 03:00:12   True     Y        1
   0              NaT  False     X        3
   0  0 days 05:00:00  False     X        2
   1              NaT  False     Z        3
   1              NaT   True     X        3
   1  1 days 03:00:12   True     Y        1
   2              NaT   True     X        3
   2  5 days 10:01:12   True     Y        0

Thanks in advance for your help.

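import pandas as pd

# Assumes df has the default RangeIndex (0, 1, 2, ...), so idx + 1 is the next row.
for idx, _ in df.iterrows():
    if idx < df.shape[0] - 1:  # stop comparing when getting to last row
        # NaT is a missing value: test it with pd.isna / pd.notna, not == 'NaT'.
        if pd.isna(df.at[idx, 'visit_time']) and pd.notna(df.at[idx + 1, 'visit_time']):
            df.at[idx + 1, 'flag'] = df.at[idx, 'flag']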

Before:

   code       visit_time   flag other  counter
0     0              NaT   True     X        3
1     0  1 days 03:00:12  False     Y        1
2     0              NaT  False     X        3
3     0  0 days 05:00:00   True     X        2
4     1              NaT  False     Z        3
5     1              NaT   True     X        3
6     1  1 days 03:00:12  False     Y        1
7     2              NaT   True     X        3
8     2  5 days 10:01:12   True     Y        0

After:

   code       visit_time   flag other  counter
0     0              NaT   True     X        3
1     0  1 days 03:00:12   True     Y        1
2     0              NaT  False     X        3
3     0  0 days 05:00:00  False     X        2
4     1              NaT  False     Z        3
5     1              NaT   True     X        3
6     1  1 days 03:00:12   True     Y        1
7     2              NaT   True     X        3
8     2  5 days 10:01:12   True     Y        0

You can simply use a shifted DataFrame, like this:

df_previous = df.copy()
df_previous.index+=1

which looks like:

   code       visit_time   flag other  counter
1     0              NaT   True     X        3
2     0  1 days 03:00:12  False     Y        1
3     0              NaT  False     X        3
4     0  0 days 05:00:00   True     X        2
5     1              NaT  False     Z        3
6     1              NaT   True     X        3
7     1  1 days 03:00:12  False     Y        1
8     2              NaT   True     X        3
9     2  5 days 10:01:12   True     Y        0
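
To see why moving the index by one gives access to the previous row, here is a small sketch on a stand-in frame. The data and the name aligned are hypothetical; the point is only that pandas matches rows by index label.

```python
import pandas as pd

# Small stand-in frame (hypothetical data, default RangeIndex 0..3).
df = pd.DataFrame({"flag": [True, False, False, True]})

df_previous = df.copy()
df_previous.index += 1  # row 0 is now labelled 1, row 1 is labelled 2, ...

# Looking up the original labels in the shifted copy returns the previous row;
# label 0 has no predecessor, so it comes back as a missing value.
aligned = df_previous["flag"].reindex(df.index)
print(aligned.tolist())
```

This is the same alignment that df["flag"].shift() produces, which is why a merge on the index pairs every row with its predecessor.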

Now you can merge it with the original DataFrame and assign the values with a simple vectorized comparison:

df = df.merge(df_previous[['visit_time', 'flag']], right_index=True, left_index=True, how='left', suffixes=["",'_previous'])
df.loc[df.visit_time.notna(), 'flag'] = df.loc[df.visit_time.notna(), 'flag_previous']

Now your DataFrame looks like:

   code       visit_time   flag other  counter  visit_time_previous  flag_previous
0     0              NaT   True     X        3                  nan            nan
1     0  1 days 03:00:12   True     Y        1                  NaT              1
2     0              NaT  False     X        3      1 days 03:00:12              0
3     0  0 days 05:00:00  False     X        2                  NaT              0
4     1              NaT  False     Z        3      0 days 05:00:00              1
5     1              NaT   True     X        3                  NaT              0
6     1  1 days 03:00:12   True     Y        1                  NaT              1
7     2              NaT   True     X        3      1 days 03:00:12              0
8     2  5 days 10:01:12   True     Y        0                  NaT              1

If you like, you can also drop the _previous columns:

# drop returns a new DataFrame, so assign the result back
df = df.drop(list(df.filter(regex='_previous')), axis=1)

and you get:

   code       visit_time   flag other  counter
0     0              NaT   True     X        3
1     0  1 days 03:00:12   True     Y        1
2     0              NaT  False     X        3
3     0  0 days 05:00:00  False     X        2
4     1              NaT  False     Z        3
5     1              NaT   True     X        3
6     1  1 days 03:00:12   True     Y        1
7     2              NaT   True     X        3
8     2  5 days 10:01:12   True     Y        0
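
Put together, the steps of this approach might look like the sketch below. It uses a hypothetical four-row sample rather than the full frame, and it adds one thing that is not in the answer: an explicit cast back to bool, on the assumption that the first row never has a visit_time (as in the question), so the selected flag_previous values contain no missing entries.

```python
import pandas as pd

# Hypothetical four-row sample taken from the top of the question's table.
df = pd.DataFrame({
    "visit_time": pd.to_timedelta(
        [pd.NaT, "1 days 03:00:12", pd.NaT, "0 days 05:00:00"]
    ),
    "flag": [True, False, False, True],
})

# Copy whose labels are moved down by one, so label i holds original row i - 1.
df_previous = df.copy()
df_previous.index += 1

merged = df.merge(
    df_previous[["visit_time", "flag"]],
    left_index=True, right_index=True,
    how="left", suffixes=("", "_previous"),
)

has_time = merged["visit_time"].notna()
# flag_previous is object dtype after the left join (row 0 has no match),
# so cast the selected rows back to bool before assigning (added assumption).
merged.loc[has_time, "flag"] = merged.loc[has_time, "flag_previous"].astype(bool)

result = merged.drop(columns=["visit_time_previous", "flag_previous"])
print(result["flag"].tolist())  # [True, True, False, False]
```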

You can use .mask to set the 'flag' values to the .shifted version of themselves wherever the 'visit_time' value is notnull.

out = df.assign(
    flag=df['flag'].mask(df['visit_time'].notnull(), df['flag'].shift())
)

print(out)
   code      visit_time   flag other  counter
0     0             NaT   True     X        3
1     0 1 days 03:00:12   True     Y        1
2     0             NaT  False     X        3
3     0 0 days 05:00:00  False     X        2
4     1             NaT  False     Z        3
5     1             NaT   True     X        3
6     1 1 days 03:00:12   True     Y        1
7     2             NaT   True     X        3
8     2 5 days 10:01:12   True     Y        0
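  • .mask(condition, other) replaces the values where condition is True with the values from other, which in this case is the value from the previous row.
  • .assign(…) is a way to update a column while returning a new DataFrame; it can be replaced with the column assignment df['flag'] = df['flag'].mask(…) to modify the DataFrame in place.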
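
As a small illustration of the first point, the sketch below compares .mask with its counterpart .where on made-up data. The series flag and has_time are hypothetical stand-ins for df['flag'] and df['visit_time'].notnull().

```python
import pandas as pd

flag = pd.Series([True, False, False, True])
has_time = pd.Series([False, True, False, True])  # hypothetical condition
previous = flag.shift()  # value from the row above, missing for the first row

# mask: replace the value where the condition is True.
masked = flag.mask(has_time, previous)
# where: keep the value where the condition is True, so the condition is negated.
kept = flag.where(~has_time, previous)

print(masked.tolist())  # [True, True, False, False]
print(kept.tolist())    # [True, True, False, False]
```

Both calls give the same result here; .mask simply reads more directly when the condition describes the rows to change.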

With the column name held in a string variable:

name = 'flag'  # example column name
df[name] = df[name].mask(df['visit_time'].notnull(), df[name].shift())
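
If the column name varies, the one-liner could be wrapped in a small function. The sketch below is a hypothetical helper, not part of the answer: the function name and its time_col and group_col parameters are made up for illustration. The optional group_col assumes that "previous row" should mean the previous row with the same code, which the question does not state explicitly.

```python
import pandas as pd


def copy_from_previous_row(df, name, time_col="visit_time", group_col=None):
    """Return a copy of df in which column `name` takes the previous row's
    value wherever `time_col` is not null (hypothetical helper)."""
    # With group_col set, the shift never crosses a group boundary.
    source = df.groupby(group_col)[name] if group_col is not None else df[name]
    previous = source.shift()
    return df.assign(**{name: df[name].mask(df[time_col].notnull(), previous)})


# Made-up sample with two groups.
df = pd.DataFrame({
    "code": [0, 0, 1, 1],
    "visit_time": pd.to_timedelta(
        [pd.NaT, "1 days 03:00:12", pd.NaT, "5 days 10:01:12"]
    ),
    "flag": [True, False, False, True],
})

name = "flag"
out = copy_from_previous_row(df, name, group_col="code")
print(out[name].tolist())  # [True, True, False, False]
```

Note that a row with a visit_time that is the first of its group (or of the frame) has no predecessor and would receive a missing value.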