Assigning values to a new column based on multiple string conditions
I have:
| ID | Possible_Size | Actual_Size |
|:------: |:------------------:|:-----------------:|
| 1234 | BIG | BIG |
| 5678 | MEDIUM | BIG |
| 9876 | BIG | SMALL |
| 1092 | MEDIUM | MEDIUM |
What I want to create:
| ID | Possible_Size | Actual_Size | Big |
|:------: |:------------------:|:-----------------:|:---------------:|
| 1234 | BIG | BIG | True Positive |
| 5678 | MEDIUM | BIG | False Negative |
| 9876 | BIG | SMALL | False Positive |
| 1092 | MEDIUM | MEDIUM | |
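To make the example reproducible, here is a minimal sketch that builds the input frame. It assumes ID 9876 has `Possible_Size` equal to `BIG`, because that is what the desired `False Positive` label for that row requires.

```python
import pandas as pd

# Sample input from the question. ID 9876 uses Possible_Size "BIG" (assumption),
# matching the "False Positive" label in the desired output.
df = pd.DataFrame({
    "ID": [1234, 5678, 9876, 1092],
    "Possible_Size": ["BIG", "MEDIUM", "BIG", "MEDIUM"],
    "Actual_Size": ["BIG", "BIG", "SMALL", "MEDIUM"],
})
print(df)
```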
What I tried:
```python
def sizes(row):
    if row['Actual_Size'] in ['BIG'] and row['Possible_Size'] in ['BIG']:
        df['Big'] = 'True Positive'
    elif row['Actual_Size'] in ['BIG'] and row['Possible_Size'] in ['MEDIUM', 'SMALL']:
        df['Big'] = 'False Negative'
    elif row['Actual_Size'] in ['MEDIUM', 'SMALL'] and row['Possible_Size'] in ['BIG']:
        df['Big'] = 'False Positive'
    else:
        df['Big'] = ''

df.apply(sizes, axis=1)
```
Currently I get a blank 'Big' column.
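A minimal sketch that reproduces the blank column, using the same sample frame (with the assumption that ID 9876 has `Possible_Size` `BIG`). Each branch assigns a scalar to the entire `df['Big']` column, so every call overwrites what the previous row wrote, and only the last row's result survives.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1234, 5678, 9876, 1092],
    "Possible_Size": ["BIG", "MEDIUM", "BIG", "MEDIUM"],
    "Actual_Size": ["BIG", "BIG", "SMALL", "MEDIUM"],
})

def sizes(row):
    # Same logic as the question: each branch writes one scalar to the WHOLE column.
    if row['Actual_Size'] in ['BIG'] and row['Possible_Size'] in ['BIG']:
        df['Big'] = 'True Positive'
    elif row['Actual_Size'] in ['BIG'] and row['Possible_Size'] in ['MEDIUM', 'SMALL']:
        df['Big'] = 'False Negative'
    elif row['Actual_Size'] in ['MEDIUM', 'SMALL'] and row['Possible_Size'] in ['BIG']:
        df['Big'] = 'False Positive'
    else:
        df['Big'] = ''

# apply returns a Series of None here, which is discarded.
df.apply(sizes, axis=1)

# The last row processed (ID 1092, MEDIUM/MEDIUM) fell through to the else branch,
# so its '' is what every row ends up with.
print(df['Big'].tolist())
```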
For multiple if/elif conditions like these, you can use `np.select`:
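```python
import numpy as np

# One label per condition, in the same order as `conditions`.
choices = ['True Positive', 'False Negative', 'False Positive']
conditions = [
    # actual BIG, predicted BIG
    (df['Actual_Size'].isin(['BIG'])) & (df['Possible_Size'].isin(['BIG'])),
    # actual BIG, predicted MEDIUM or SMALL
    (df['Actual_Size'].isin(['BIG'])) & (df['Possible_Size'].isin(['MEDIUM', 'SMALL'])),
    # actual MEDIUM or SMALL, predicted BIG
    (df['Actual_Size'].isin(['MEDIUM', 'SMALL'])) & (df['Possible_Size'].isin(['BIG'])),
]
# Rows that match none of the conditions get the default ''.
df['Big'] = np.select(conditions, choices, default='')
```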
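To illustrate how `np.select` resolves the conditions, here is a small sketch on plain NumPy arrays (the arrays and labels are made up for illustration). For each element it takes the choice of the first condition that is `True`, and falls back to `default` when none match. The three conditions above are mutually exclusive, so their order does not change the result there.

```python
import numpy as np

# Two overlapping boolean conditions over the same three elements.
cond_a = np.array([True, True, False])
cond_b = np.array([True, False, False])

# Element 0 matches both, but cond_a comes first, so it gets "A".
# Element 2 matches neither, so it gets the default.
out = np.select([cond_a, cond_b], ["A", "B"], default="none")
print(out)  # ['A' 'A' 'none']
```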
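If you want to keep your original approach, the problem is that your function does not return anything for each row. Instead, every call assigns a single value to the entire `df['Big']` column, so each row overwrites the previous one and the last row (MEDIUM/MEDIUM, which falls through to `else`) leaves the column blank. Return the label instead and assign the result of `apply`: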
```python
def sizes(row):
    if row['Actual_Size'] in ['BIG'] and row['Possible_Size'] in ['BIG']:
        return 'True Positive'
    elif row['Actual_Size'] in ['BIG'] and row['Possible_Size'] in ['MEDIUM', 'SMALL']:
        return 'False Negative'
    elif row['Actual_Size'] in ['MEDIUM', 'SMALL'] and row['Possible_Size'] in ['BIG']:
        return 'False Positive'
    else:
        return ''

df['Big'] = df.apply(sizes, axis=1)
```
Both approaches produce the same output:

```
df
     ID Possible_Size Actual_Size             Big
0  1234           BIG         BIG   True Positive
1  5678        MEDIUM         BIG  False Negative
2  9876           BIG       SMALL  False Positive
3  1092        MEDIUM      MEDIUM
```
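A minimal sketch for checking that the two approaches agree on the sample data (again assuming ID 9876 has `Possible_Size` `BIG`). It compares plain Python lists rather than using `Series.equals`, because the `np.select` result and the `apply` result may end up with different dtypes depending on the pandas version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1234, 5678, 9876, 1092],
    "Possible_Size": ["BIG", "MEDIUM", "BIG", "MEDIUM"],
    "Actual_Size": ["BIG", "BIG", "SMALL", "MEDIUM"],
})

# Approach 1: vectorized np.select
choices = ['True Positive', 'False Negative', 'False Positive']
conditions = [
    df['Actual_Size'].isin(['BIG']) & df['Possible_Size'].isin(['BIG']),
    df['Actual_Size'].isin(['BIG']) & df['Possible_Size'].isin(['MEDIUM', 'SMALL']),
    df['Actual_Size'].isin(['MEDIUM', 'SMALL']) & df['Possible_Size'].isin(['BIG']),
]
select_result = pd.Series(np.select(conditions, choices, default=''), index=df.index)

# Approach 2: row-wise function that returns a label
def sizes(row):
    if row['Actual_Size'] in ['BIG'] and row['Possible_Size'] in ['BIG']:
        return 'True Positive'
    elif row['Actual_Size'] in ['BIG'] and row['Possible_Size'] in ['MEDIUM', 'SMALL']:
        return 'False Negative'
    elif row['Actual_Size'] in ['MEDIUM', 'SMALL'] and row['Possible_Size'] in ['BIG']:
        return 'False Positive'
    return ''

apply_result = df.apply(sizes, axis=1)

print(select_result.tolist() == apply_result.tolist())
print(apply_result.tolist())
```

For larger frames the `np.select` version is usually preferable, since it evaluates each condition once over whole columns instead of calling a Python function per row.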