Assigning values to new column based on multiple string conditions

I have:

|    ID   |   Possible_Size    |   Actual_Size     |
|:------: |:------------------:|:-----------------:|  
|   1234  |         BIG        |        BIG        |
|   5678  |       MEDIUM       |        BIG        |
|   9876  |        BIG         |       SMALL       |
|   1092  |       MEDIUM       |       MEDIUM      |

What I want to create:

|    ID   |   Possible_Size    |   Actual_Size     |       Big       |
|:------: |:------------------:|:-----------------:|:---------------:|  
|   1234  |         BIG        |        BIG        |  True Positive  |
|   5678  |       MEDIUM       |        BIG        |  False Negative |  
|   9876  |        BIG         |       SMALL       |  False Positive |   
|   1092  |       MEDIUM       |       MEDIUM      |                 |

What I tried:

    def sizes(row):
                        
        if row['Actual_Size'] in ['BIG'] and row['Possible_Size'] in ['BIG']:
            df['Big'] = 'True Positive'
        elif row['Actual_Size'] in ['BIG'] and row['Possible_Size'] in ['MEDIUM', 'SMALL']:
            df['Big'] = 'False Negative'
        elif row['Actual_Size'] in ['MEDIUM', 'SMALL'] and row['Possible_Size'] in ['BIG']:
            df['Big'] = 'False Positive'  
        else:
            df['Big'] = ''
                        
    df.apply(sizes, axis=1)

Currently I get a blank 'Big' column.

For a chain of if/elif conditions like this, you can use np.select:

    import numpy as np

    # Labels, in the same order as the conditions below
    choices = ['True Positive', 'False Negative', 'False Positive']
    # One boolean mask per label; the first matching condition wins
    conditions = [
        df['Actual_Size'].isin(['BIG']) & df['Possible_Size'].isin(['BIG']),
        df['Actual_Size'].isin(['BIG']) & df['Possible_Size'].isin(['MEDIUM', 'SMALL']),
        df['Actual_Size'].isin(['MEDIUM', 'SMALL']) & df['Possible_Size'].isin(['BIG']),
    ]
    # Rows matching no condition get the default ''
    df['Big'] = np.select(conditions, choices, default='')
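As a self-contained illustration of the np.select approach, here is a minimal sketch that rebuilds the sample frame from the question's expected-output table and labels it. The helper masks `actual_big` and `possible_big` are names introduced here, and using `~possible_big` in place of `isin(['MEDIUM', 'SMALL'])` assumes the columns only ever contain BIG, MEDIUM, or SMALL:

```python
import numpy as np
import pandas as pd

# Sample data copied from the expected-output table in the question.
df = pd.DataFrame({
    "ID": [1234, 5678, 9876, 1092],
    "Possible_Size": ["BIG", "MEDIUM", "BIG", "MEDIUM"],
    "Actual_Size": ["BIG", "BIG", "SMALL", "MEDIUM"],
})

# Boolean masks (hypothetical helper names, not from the original answer).
actual_big = df["Actual_Size"].eq("BIG")
possible_big = df["Possible_Size"].eq("BIG")

# Assumption: sizes are only BIG / MEDIUM / SMALL, so "not BIG" means
# MEDIUM or SMALL.
conditions = [
    actual_big & possible_big,    # predicted BIG, actually BIG
    actual_big & ~possible_big,   # predicted not BIG, actually BIG
    ~actual_big & possible_big,   # predicted BIG, actually not BIG
]
choices = ["True Positive", "False Negative", "False Positive"]

# Rows matching none of the conditions fall back to the empty string.
df["Big"] = np.select(conditions, choices, default="")
print(df)
```

If other size values can appear in the data, the explicit `isin([...])` lists are the safer choice, since they leave unexpected values unlabeled instead of treating them as "not BIG".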

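If you want to keep your original approach, the problem is that the function you apply row by row returns nothing. Instead, each call assigns a single value to the entire df['Big'] column, so the value from the last row ('' here) wins. Return the label from the function and assign the result of apply to the column: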

    def sizes(row):
        # Return the label for this row instead of assigning to df
        if row['Actual_Size'] in ['BIG'] and row['Possible_Size'] in ['BIG']:
            return 'True Positive'
        elif row['Actual_Size'] in ['BIG'] and row['Possible_Size'] in ['MEDIUM', 'SMALL']:
            return 'False Negative'
        elif row['Actual_Size'] in ['MEDIUM', 'SMALL'] and row['Possible_Size'] in ['BIG']:
            return 'False Positive'
        else:
            return ''

    # apply collects the per-row return values into a Series
    df['Big'] = df.apply(sizes, axis=1)

Both approaches produce the same output:

    df
         ID Possible_Size Actual_Size             Big
    0  1234           BIG         BIG   True Positive
    1  5678        MEDIUM         BIG  False Negative
    2  9876           BIG       SMALL  False Positive
    3  1092        MEDIUM      MEDIUM
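To see why the attempt in the question ends up with a blank column, the sketch below mimics it with an explicit loop: assigning a scalar to `df['Big']` overwrites every row of the column, so only the branch taken for the last row survives. The `label` helper is a hypothetical condensed version of `sizes`, and it assumes the only size values are BIG, MEDIUM, and SMALL:

```python
import pandas as pd

# Sample data copied from the expected-output table in the question.
df = pd.DataFrame({
    "ID": [1234, 5678, 9876, 1092],
    "Possible_Size": ["BIG", "MEDIUM", "BIG", "MEDIUM"],
    "Actual_Size": ["BIG", "BIG", "SMALL", "MEDIUM"],
})


def label(row):
    """Return the label for one row (hypothetical condensed form of sizes)."""
    actual_big = row["Actual_Size"] == "BIG"
    possible_big = row["Possible_Size"] == "BIG"
    if actual_big and possible_big:
        return "True Positive"
    if actual_big:
        return "False Negative"
    if possible_big:
        return "False Positive"
    return ""


# Mimic the original attempt: on every row, assign a scalar to the whole
# column. Iterating over a copy keeps the loop independent of the frame
# being modified.
for _, row in df.copy().iterrows():
    df["Big"] = label(row)  # broadcasts one value to all rows

# The last row (MEDIUM / MEDIUM) hits the fallback branch, so every row
# ends up with the empty string.
print(df["Big"].tolist())  # ['', '', '', '']
```

This is why returning the value per row, or building the whole column at once with np.select, fixes the problem.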