Python Pandas: replace values in one column based on conditions in multiple other columns

Given the DataFrame df:

Product_ID | Category_A | Category_B
1232         0            0
1343         Unknown      X
2543         NaN          0
2549         Y            Y
0349         X            X
8533         Y            X

I want to create a new column, Category_Final, with rules that produce the expected output below:

Product_ID | Category_A | Category_B | Category_Final
1232         0            0            Unknown
1343         Unknown      X            Unknown
2543         NaN          0            Unknown
2549         Y            Y            0
0349         X            X            0
8533         Y            X            X

I managed to get the logic for 0 and X, but I don't know how to include the Unknown logic.

df['Category_Final'] = np.where(df['Category_A'] != df['Category_B'], 'X', '0')

Thanks!

After your current line, try this:

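# Flag rows where Category_A is missing, 'Unknown', or zero.
# The zero may be stored as the integer 0 or the string '0', so check both.
mask = (df['Category_A'].isnull() |
        (df['Category_A'] == 'Unknown') |
        (df['Category_A'] == 0) |
        (df['Category_A'] == '0'))
df.loc[mask, 'Category_Final'] = 'Unknown'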
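
Putting the question's np.where line and this mask together, a minimal runnable sketch might look like the following. The sample frame is rebuilt from the question's table, and storing every value as a string (with a real NaN in the third row) is an assumption, since the question does not show the dtypes.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; string values are an assumption.
df = pd.DataFrame({
    "Product_ID": ["1232", "1343", "2543", "2549", "0349", "8533"],
    "Category_A": ["0", "Unknown", np.nan, "Y", "X", "Y"],
    "Category_B": ["0", "X", "0", "Y", "X", "X"],
})

# Step 1 (from the question): 'X' where the columns differ, '0' where they match.
df["Category_Final"] = np.where(df["Category_A"] != df["Category_B"], "X", "0")

# Step 2: overwrite rows whose Category_A is missing, 'Unknown', or '0'.
mask = df["Category_A"].isna() | df["Category_A"].isin(["Unknown", "0"])
df.loc[mask, "Category_Final"] = "Unknown"

print(df)
```

The order matters here: the mask is applied last so that the Unknown rows override whatever the equality check assigned.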

You can use a nested np.where:

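unknown = (df['Category_A'].isnull() |
           (df['Category_A'] == 'Unknown') |
           (df['Category_A'] == '0'))
# Outer np.where handles 'Unknown'; the inner one picks '0' (match) or 'X' (mismatch).
df['Category_Final'] = np.where(
    unknown, 'Unknown',
    np.where(df['Category_A'] == df['Category_B'], '0', 'X'))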

Output:

   Product_ID  Category_A  Category_B  Category_Final
0  1232        0           0           Unknown
1  1343        Unknown     X           Unknown
2  2543        NaN         0           Unknown
3  2549        Y           Y           0
4  349         X           X           0
5  8533        Y           X           X
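
# Start from a column of "0", switch to "X" where the categories differ,
# then overwrite with "Unknown" where Category_A is "0", "Unknown", or missing.
df['Category_Final'] = (
    pd.Series("0", index=df.index)
    .where(df['Category_A'] == df['Category_B'], "X")
    .where(~df['Category_A'].isin(["0", "Unknown", np.nan]), "Unknown")
)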
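
The same three-way rule can also be written with np.select, which some readers find easier to scan than nested np.where calls. This is only a sketch of an alternative, not something proposed in the answers above. It assumes the rules implied by the expected output (Unknown takes priority, then '0' on a match, otherwise 'X') and that the columns hold strings.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; string values are an assumption.
df = pd.DataFrame({
    "Product_ID": ["1232", "1343", "2543", "2549", "0349", "8533"],
    "Category_A": ["0", "Unknown", np.nan, "Y", "X", "Y"],
    "Category_B": ["0", "X", "0", "Y", "X", "X"],
})

a, b = df["Category_A"], df["Category_B"]

# Conditions are evaluated in order; the first one that is True wins.
conditions = [
    a.isna() | a.isin(["0", "Unknown"]),  # missing, '0', or 'Unknown'
    a == b,                               # categories match
]
choices = ["Unknown", "0"]

# Rows matching neither condition fall through to the default 'X'.
df["Category_Final"] = np.select(conditions, choices, default="X")

print(df)
```

Passing a string default keeps every choice the same type; adding another rule later means appending one condition and one choice rather than nesting another call.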