How to mark first entry per group satisfying some criterion?

Question

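Suppose I have a data frame in which one column (column A in the snippet) contains values that occur multiple times and therefore form groups. I want to create a new column that is 1 for the first 'x' entry (in column C) of each group and 0 for all other entries. I managed the first part, but I haven't found a good way to include the condition on the 'x' values. Is there a good way to do this?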

import pandas as pd
df = pd.DataFrame(
    {
        "A": ["0", "0", "1", "2", "2", "2"],  # data to group by
        "B": ["a", "b", "c", "d", "e", "f"],  # some other irrelevant data to be preserved
        "C": ["y", "x", "y", "x", "y", "x"],  # only consider the 'x'
    }
)
target = pd.DataFrame(
    {
        "A": ["0", "0", "1", "2", "2", "2"],  
        "B": ["a", "b", "c", "d", "e", "f"], 
        "C": ["y", "x", "y", "x", "y", "x"],
        "D": [0, 1, 0, 1, 0, 0],  # 1 for the first row in each group of 'A' where 'C' == 'x'
    }
)
# following partial solution doesn't account for filtering by 'x' in 'C'
df['D'] = df.groupby('A')['C'].transform(lambda x: [1 if i == 0 else 0 for i in range(len(x))])

Answer 1

In your case, slice first, then drop_duplicates, then assign the result back.

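# keep the first 'x' row per group of 'A' and flag it; index alignment leaves NaN elsewhere
df['D'] = df.loc[df.C=='x'].drop_duplicates('A').assign(D=1)['D']
# fill the unflagged rows with 0 (reassigned rather than using a chained inplace fillna)
df['D'] = df['D'].fillna(0)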
df
Out[149]: 
   A  B  C    D
0  0  a  y  0.0
1  0  b  x  1.0
2  1  c  y  0.0
3  2  d  x  1.0
4  2  e  y  0.0
5  2  f  x  0.0
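
As a possible alternative to the slicing approach above, here is a minimal sketch that builds the flag from a boolean mask and a per-group running count, which keeps D as integers instead of floats. The names is_x and x_rank are illustrative and not part of the original answer; the frame is the one from the question.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["0", "0", "1", "2", "2", "2"],  # data to group by
        "B": ["a", "b", "c", "d", "e", "f"],  # other data to be preserved
        "C": ["y", "x", "y", "x", "y", "x"],  # only consider the 'x'
    }
)

# True where the row satisfies the criterion
is_x = df["C"].eq("x")

# Running count of 'x' rows within each group of 'A';
# the first 'x' row of a group is the one where the count reaches 1
x_rank = is_x.astype(int).groupby(df["A"]).cumsum()

# A row is flagged only if it is itself an 'x' and is the first one in its group
df["D"] = (is_x & x_rank.eq(1)).astype(int)
print(df)
```

Requiring is_x in the final mask matters because the running count stays at 1 for the non-'x' rows that follow the first 'x' in a group.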

python

dataframe

pandas

pandas-groupby