Get and assign values to rows in a group whose index is greater than the one returned by idxmax()
The objective is to assign 1s to every row in a group whose index is higher than the one retrieved from idxmax().
import numpy as np
import pandas as pd
df = pd.DataFrame({'id':[1, 1, 1, 2, 2, 2, 3, 3, 3], 'val':[1, np.nan, 0, np.nan, 1, 0, 1, 0, 0]})
id val
0 1 1.0
1 1 NaN
2 1 0.0
3 2 NaN
4 2 1.0
5 2 0.0
6 3 1.0
7 3 0.0
8 3 0.0
We can use idxmax() to get the index label of the maximum value in each group:
test = df.groupby('id')['val'].idxmax()
id
1 0
2 4
3 6
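A small sketch of what `test` holds, using the same sample data: `idxmax()` returns one index label per `id` (the NaN in group 2 is skipped because `skipna=True` is the default), and `Series.map` can broadcast that label back onto every row of the group. The names `row_idxmax` and `after_max` are illustrative only, not part of the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2, 3, 3, 3],
                   'val': [1, np.nan, 0, np.nan, 1, 0, 1, 0, 0]})

# One index label per id; NaN values are skipped (skipna=True by default)
test = df.groupby('id')['val'].idxmax()

# Illustrative name: each row's group idxmax, looked up by the row's id
row_idxmax = df['id'].map(test)

# Illustrative name: True for rows whose index label is after their group's maximum
after_max = df.index.to_series() > row_idxmax

print(pd.DataFrame({'id': df['id'], 'row_idxmax': row_idxmax, 'after_max': after_max}))
```

The `after_max` mask marks exactly the rows the question wants set to 1.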
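The objective is to transform the data into the following (i.e. every value in a group whose index is higher than that group's idxmax() is assigned 1):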
id val
0 1 1.0
1 1 1.0
2 1 1.0
3 2 NaN
4 2 1.0
5 2 1.0
6 3 1.0
7 3 1.0
8 3 1.0
This doesn't necessarily need to be done with idxmax(); any suggestions are welcome.
If I understand correctly, you can use groupby().transform together with np.where:
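idx = df.groupby('id')['val'].idxmax()  # idxmax label for each id
# x.name is the group's id, so look up that group's idxmax by label (works for any id values)
df['val'] = df.groupby('id')['val'].transform(lambda x: np.where(x.index > idx[x.name], 1, x))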
df
Output:
id val
0 1 1.0
1 1 1.0
2 1 1.0
3 2 NaN
4 2 1.0
5 2 1.0
6 3 1.0
7 3 1.0
8 3 1.0
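A vectorized variant of the same idea, sketched under two assumptions: a pandas version in which `groupby(...).transform` accepts the string `'idxmax'`, and a DataFrame index that is in row order (as with the default `RangeIndex`). It avoids calling a Python lambda once per group; `group_idxmax` and `after_max` are hypothetical names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2, 3, 3, 3],
                   'val': [1, np.nan, 0, np.nan, 1, 0, 1, 0, 0]})

# Hypothetical name: each group's idxmax label broadcast onto every row of that group
group_idxmax = df.groupby('id')['val'].transform('idxmax')

# Hypothetical name: rows whose index label is strictly greater than their group's idxmax
after_max = df.index.to_series() > group_idxmax

# Set those rows to 1; all other rows (including the NaN at index 3) keep their value
df['val'] = df['val'].mask(after_max, 1)
print(df)
```

Because only rows strictly after the maximum are overwritten, the NaN before the maximum in group 2 survives without any extra masking step.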
Try:
df = pd.DataFrame({'id':[1, 1, 1, 2, 2, 2, 3, 3, 3], 'val':[1, np.nan, 0, np.nan, 1, 0, 1, 0, 0]})
# cummax fills everything after the first True to True in each group
# mask turns the 0s that were originally NaN back into NaN
df.val = df.val.eq(1).groupby(df.id).cummax().astype(int).mask(lambda x: x.eq(0) & df.val.isna())
df
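To see what the one-liner does, here is a step-by-step breakdown on the same data. One deliberate difference: it casts to int before `cummax` rather than after, which gives the same 0/1 result. Note that this approach keys on the first 1 in each group rather than on `idxmax()`, so it matches the idxmax-based answer only when the group maximum is 1, as in this sample. The intermediate names `is_one`, `seen_one` and `result` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2, 3, 3, 3],
                   'val': [1, np.nan, 0, np.nan, 1, 0, 1, 0, 0]})

# Step 1 (illustrative name): flag rows equal to 1 as 0/1 integers; NaN == 1 is False
is_one = df['val'].eq(1).astype(int)

# Step 2 (illustrative name): running maximum within each id,
# so once a 1 appears every later row in that group stays 1
seen_one = is_one.groupby(df['id']).cummax()

# Step 3 (illustrative name): rows still 0 that were NaN originally go back to NaN
result = seen_one.mask(seen_one.eq(0) & df['val'].isna())

print(pd.DataFrame({'val': df['val'], 'is_one': is_one,
                    'seen_one': seen_one, 'result': result}))
```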