Python Pandas: replace values in one column based on conditions in multiple other columns
Given the DataFrame df:
Product_ID  Category_A  Category_B
1232        0           0
1343        Unknown     X
2543        NaN         0
2549        Y           Y
0349        X           X
8533        Y           X
I want to create a new column, Category_Final, with the following rules:
- If Category_A is 0, Unknown, or NaN, Category_Final should be "Unknown"
- If Category_A is the same as Category_B, Category_Final should be 0
- If Category_A differs from Category_B, Category_Final should be X
Expected output:
Product_ID  Category_A  Category_B  Category_Final
1232        0           0           Unknown
1343        Unknown     X           Unknown
2543        NaN         0           Unknown
2549        Y           Y           0
0349        X           X           0
8533        Y           X           X
I managed to get the logic for 0 and X, but I can't figure out how to include the Unknown logic:
df['Category_Final'] = np.where(df['Category_A'] != df['Category_B'], 'X', '0')
Thanks!
After your current line, try this:
# Category_A may hold the zero as an integer or as the string '0', so check both.
mask = (df.Category_A.isnull() |
        (df.Category_A == 'Unknown') |
        df.Category_A.isin([0, '0']))
df.loc[mask, 'Category_Final'] = 'Unknown'
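For reference, the two steps can be put together as a self-contained sketch. The sample frame below is reconstructed from the question, and it assumes that Product_ID and both category columns hold strings and that the missing entry is a real NaN; if your column stores the zero as an integer, adjust the list passed to isin accordingly.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (assumption: values are
# strings, and the missing entry is a real NaN rather than the text "Nan").
df = pd.DataFrame({
    "Product_ID": ["1232", "1343", "2543", "2549", "0349", "8533"],
    "Category_A": ["0", "Unknown", np.nan, "Y", "X", "Y"],
    "Category_B": ["0", "X", "0", "Y", "X", "X"],
})

# Step 1: the asker's existing line covers the "0" / "X" cases.
df["Category_Final"] = np.where(df["Category_A"] != df["Category_B"], "X", "0")

# Step 2: overwrite the rows where Category_A is missing, "Unknown", or "0".
mask = df["Category_A"].isna() | df["Category_A"].isin(["Unknown", "0"])
df.loc[mask, "Category_Final"] = "Unknown"

result = df["Category_Final"].tolist()
print(result)
```

The order matters here: the mask is applied last so that it takes priority over the equal/different comparison, which is what the first rule in the question asks for.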
You can use a nested np.where:
df['Category_Final'] = np.where(
    df['Category_A'].isnull()
    | (df['Category_A'] == 'Unknown')
    | (df['Category_A'] == '0'),
    'Unknown',
    np.where(df['Category_A'] == df['Category_B'], '0', 'X'),
)
Output:
Product_ID Category_A Category_B Category_Final
0 1232 0 0 Unknown
1 1343 Unknown X Unknown
2 2543 NaN 0 Unknown
3 2549 Y Y 0
4 349 X X 0
5 8533 Y X X
Another option is to chain Series.where:
df['Category_Final'] = (
    pd.Series("0", index=df.index)
    .where(df['Category_A'] == df['Category_B'], "X")
    .where(~df['Category_A'].isin(["0", "Unknown", np.nan]), "Unknown")
)
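When there are three or more outcomes, np.select may read more cleanly than nested np.where calls, because the conditions are listed in priority order and the first match wins. This is a sketch of that alternative, not something proposed in the answers above; it again assumes string-valued category columns and a real NaN, and the short names a and b are only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (assumption: string values,
# with a real NaN for the missing entry).
df = pd.DataFrame({
    "Product_ID": ["1232", "1343", "2543", "2549", "0349", "8533"],
    "Category_A": ["0", "Unknown", np.nan, "Y", "X", "Y"],
    "Category_B": ["0", "X", "0", "Y", "X", "X"],
})

a, b = df["Category_A"], df["Category_B"]

# Conditions are checked in order; the first one that is True picks the choice.
conditions = [
    a.isna() | a.isin(["0", "Unknown"]),  # rule 1: unknown category
    a == b,                               # rule 2: categories match
]
choices = ["Unknown", "0"]

# Rows matching neither condition fall through to the default "X".
df["Category_Final"] = np.select(conditions, choices, default="X")

final = df["Category_Final"].tolist()
print(final)
```

Adding a fourth rule later would only mean appending one entry to conditions and one to choices, rather than nesting another level.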