Get and assign a value to rows in a group with an index greater than the one from idxmax()

The objective is to assign 1 to every row in a group whose index is higher than the one retrieved from idxmax().

import numpy as np
import pandas as pd
df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2, 3, 3, 3], 'val': [1, np.nan, 0, np.nan, 1, 0, 1, 0, 0]})

   id  val
0   1  1.0
1   1  NaN
2   1  0.0
3   2  NaN
4   2  1.0
5   2  0.0
6   3  1.0
7   3  0.0
8   3  0.0

We can use idxmax() to get the index of the maximum value in each group:

test = df.groupby('id')['val'].idxmax()

id
1    0
2    4
3    6

The objective is to transform the data as follows, so that every row in a group whose index is higher than the one returned by idxmax() is assigned 1:

   id  val
0   1  1.0
1   1  1.0
2   1  1.0
3   2  NaN
4   2  1.0
5   2  1.0
6   3  1.0
7   3  1.0
8   3  1.0

This doesn't necessarily have to be done with idxmax(). Any suggestions are welcome.

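If I understand correctly, you can use transform with np.where:

# index label of the maximum in each group, keyed by id
nd = df.groupby('id')['val'].idxmax()
# x.name is the group's id; look it up by label so ids need not be 1, 2, 3, ...
df['val'] = df.groupby('id')['val'].transform(lambda x: np.where(x.index > nd.loc[x.name], 1, x))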

df

Output:

    id  val
0   1   1.0
1   1   1.0
2   1   1.0
3   2   NaN
4   2   1.0
5   2   1.0
6   3   1.0
7   3   1.0
8   3   1.0
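
As a variation on the approach above, the idxmax() result can also be mapped back onto the rows and compared against the index in one vectorized step, without a lambda. This is only a minimal sketch: it assumes the index increases within each group, as in the example data, and the names first_max, after_max and result are illustrative, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2, 3, 3, 3],
                   'val': [1, np.nan, 0, np.nan, 1, 0, 1, 0, 0]})

# Index label of the maximum 'val' in each group (NaN is skipped by default).
first_max = df.groupby('id')['val'].idxmax()

# Broadcast each group's idxmax to its rows, then compare with the row index.
# Plain NumPy arrays are used so the comparison is purely positional.
after_max = df.index.to_numpy() > df['id'].map(first_max).to_numpy()

# Work on a copy (illustrative choice) and assign 1 to rows after the maximum.
result = df.copy()
result.loc[after_max, 'val'] = 1
print(result)
```

Because the lookup goes through map() on the id column, this does not depend on the ids being consecutive integers.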

Try:

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2, 3, 3, 3], 'val': [1, np.nan, 0, np.nan, 1, 0, 1, 0, 0]})

# cummax fills everything after the first True to True in each group
# mask replaces the 0s that were originally nan by nan
df.val = df.val.eq(1).groupby(df.id).cummax().astype(int).mask(lambda x: x.eq(0) & df.val.isna())
df
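
To make the chain above easier to follow, here is the same idea split into intermediate steps. This is a sketch under the assumption that the maximum in every group is 1 (the values are only 0, 1 or NaN, as in the example). The variable names are illustrative, and casting to int before cummax() is a choice made here for clarity rather than something the original one-liner does.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2, 3, 3, 3],
                   'val': [1, np.nan, 0, np.nan, 1, 0, 1, 0, 0]})

# Step 1: flag the rows that hold the maximum value (assumed to be 1).
is_one = df['val'].eq(1).astype(int)

# Step 2: the cumulative max per group turns every row from the first 1 onward into 1.
filled = is_one.groupby(df['id']).cummax()

# Step 3: rows before the first 1 that were originally NaN go back to NaN.
result = filled.mask(filled.eq(0) & df['val'].isna())
print(result)
```

Rows that were originally 0 and sit before the first 1 in their group would stay 0 here, since only the NaN rows are masked.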