Fill NaN in pandas where another column's value is not in a list

I have the following DataFrame:

import pandas as pd
import numpy as np
df = pd.DataFrame({
    'Name': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
    'Include': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    'Category': ['Cat', 'Dog', 'Car', 'Dog', 'Bike', 'Dog', 'Cat', 'Bike'],
})

df

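I'm trying to fill the Include column with the string 'yes' wherever the value in the Category column does not match any item in the following list: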

exluded = ['Car','Bike']

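So in my expected output, Include is 'yes' for every row whose Category is not in the list, and stays NaN for the rest.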

Any ideas on how to achieve this? Thanks!

Try this:

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Name': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
    'Include': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    'Category': ['Cat', 'Dog', 'Car', 'Dog', 'Bike', 'Dog', 'Cat', 'Bike'],
})

exluded = ['Car','Bike']

# where() keeps Include where Category is in the list and fills 'yes' everywhere else
df.Include = df.Include.where(df.Category.isin(exluded), 'yes')
df
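The direction of where() is easy to get backwards, so here is a minimal sketch comparing it with its mirror image, mask(). The names in_list, via_where and via_mask are only illustrative and not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
    'Include': [np.nan] * 8,
    'Category': ['Cat', 'Dog', 'Car', 'Dog', 'Bike', 'Dog', 'Cat', 'Bike'],
})
exluded = ['Car', 'Bike']

# True for rows whose Category is in the list
in_list = df['Category'].isin(exluded)

# where(): keep the existing value where the condition is True, use 'yes' elsewhere
via_where = df['Include'].where(in_list, 'yes')

# mask(): replace with 'yes' where the condition is True, so the condition is negated
via_mask = df['Include'].mask(~in_list, 'yes')

print(via_where.equals(via_mask))
```

Both calls return a new Series and leave df untouched until you assign the result back to df['Include'].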

Use loc with a boolean mask:

df.loc[~df['Category'].isin(exluded), 'Include'] = 'yes'
print(df)

# Output
  Name Include Category
0    A     yes      Cat
1    B     yes      Dog
2    A     NaN      Car
3    B     yes      Dog
4    A     NaN     Bike
5    B     yes      Dog
6    A     yes      Cat
7    B     NaN     Bike
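One caveat, which depends on your pandas version: Include starts out as an all-NaN float64 column, and recent pandas releases warn about writing a string into a float column through loc (newer ones may reject it outright). A minimal sketch that casts the column to object first, assuming the same data as in the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
    'Include': [np.nan] * 8,
    'Category': ['Cat', 'Dog', 'Car', 'Dog', 'Bike', 'Dog', 'Cat', 'Bike'],
})
exluded = ['Car', 'Bike']

# Make the column able to hold strings before assigning into it
df['Include'] = df['Include'].astype(object)

# Same boolean-mask assignment as above
df.loc[~df['Category'].isin(exluded), 'Include'] = 'yes'
print(df)
```

The rows for Car and Bike keep their NaN, so df['Include'].isna() still identifies them.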

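An alternative with np.where:

# use None rather than np.nan: NumPy would otherwise coerce np.nan to the string 'nan'
df['Include'] = np.where(df['Category'].isin(exluded), None, 'yes')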
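With np.where it is worth confirming that the excluded rows really are missing rather than holding the text 'nan', since NumPy looks for one common dtype for both branches and may turn np.nan into a string when the other branch is a string. A small check, assuming None as the missing-value marker (in_list is just an illustrative name):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
    'Include': [np.nan] * 8,
    'Category': ['Cat', 'Dog', 'Car', 'Dog', 'Bike', 'Dog', 'Cat', 'Bike'],
})
exluded = ['Car', 'Bike']

in_list = df['Category'].isin(exluded)

# None in one branch keeps the values as Python objects instead of forcing a string dtype
df['Include'] = np.where(in_list, None, 'yes')

# Excluded rows should count as missing, and none should contain the text 'nan'
print(df['Include'].isna().tolist())
print((df['Include'] == 'nan').any())
```

If the second print shows True, the missing values were turned into strings and isna() or fillna() will not see them.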