Pandas: Use column value to select the value from a different column to populate a new column
I have this dataframe called quest:
0_score 1_score 2_score 3_score 4_score 5_score true_label
0 0.007512 0.264500 0.273147 0.218029 0.233726 0.003084 1
1 0.130695 0.289085 0.173402 0.144897 0.238129 0.023792 1
2 0.006896 0.130070 0.289822 0.210133 0.219567 0.143512 4
3 0.006819 0.178320 0.259109 0.041048 0.316587 0.198118 1
4 0.011121 0.058437 0.182823 0.317847 0.123521 0.306250 3
I want to create a new column based on the value in the true_label column. I can do this:
scores = ['0_score', '1_score', '2_score', '3_score', '4_score','5_score']
(quest.assign(true_label_score = lambda df_:df_[scores[1]]))
This gives me:
0_score 1_score 2_score 3_score 4_score 5_score true_label true_label_score
0 0.007512 0.264500 0.273147 0.218029 0.233726 0.003084 1 0.264500
1 0.130695 0.289085 0.173402 0.144897 0.238129 0.023792 1 0.289085
2 0.006896 0.130070 0.289822 0.210133 0.219567 0.143512 4 0.130070
3 0.006819 0.178320 0.259109 0.041048 0.316587 0.198118 1 0.178320
4 0.011121 0.058437 0.182823 0.317847 0.123521 0.306250 3 0.058437
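How can I replace [scores[1]] with something like scores[quest.true_label], so that for each row the value in the true_label column picks the matching column from the scores list and true_label_score takes its value from that column? For example, the row with index 2 should use the value from the 4_score column, and the row with index 4 should use the value from the 3_score column, as its true_label_score.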
You can use DataFrame.apply:
def label_score(row):
col_num = int(row['true_label'])
return row[f'{col_num}_score']
quest['true_label_score'] = quest.apply(label_score, axis=1)
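A note on the int() call in the apply solution: with axis=1, each row is passed to the function as a single Series, and because the row mixes float scores with the integer label, pandas upcasts the whole row to float64. The label then arrives as 1.0, so building the name with f'{label}_score' alone would give '1.0_score', which is not a column. A minimal check on the first two rows of the sample data (only two score columns kept for brevity):

```python
import pandas as pd

# First two rows of the sample data, trimmed to two score columns
quest = pd.DataFrame({
    '0_score': [0.007512, 0.130695],
    '1_score': [0.264500, 0.289085],
    'true_label': [1, 1],
})

# apply(axis=1) passes each row as one Series with a single (common) dtype
row_dtypes = quest.apply(lambda row: str(row.dtype), axis=1)

# Without int(), the upcast label keeps its float formatting
without_int = quest.apply(lambda row: f"{row['true_label']}_score", axis=1)
with_int = quest.apply(lambda row: f"{int(row['true_label'])}_score", axis=1)

print(row_dtypes.tolist())   # ['float64', 'float64']
print(without_int.tolist())  # ['1.0_score', '1.0_score'] -> no such column
print(with_int.tolist())     # ['1_score', '1_score']
```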
If you want a solution based on the scores list, you can do this:
scores = ['0_score', '1_score', '2_score', '3_score', '4_score','5_score']
def label_score(row, scores):
col_num = int(row['true_label'])
col_label = scores[col_num]
return row[col_label]
quest['true_label_score'] = quest.apply(label_score, scores=scores, axis=1)
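The scores-list idea can also be vectorized while still looking columns up by name, so it does not depend on column order. One option, sketched here under the assumption that every true_label is a valid position in scores, is to map each label to a column name through scores, turn the names into column positions with Index.get_indexer, and then index the underlying NumPy array with one (row, column) pair per row:

```python
import numpy as np
import pandas as pd

scores = ['0_score', '1_score', '2_score', '3_score', '4_score', '5_score']

# Sample data from the question
quest = pd.DataFrame(
    [
        [0.007512, 0.264500, 0.273147, 0.218029, 0.233726, 0.003084, 1],
        [0.130695, 0.289085, 0.173402, 0.144897, 0.238129, 0.023792, 1],
        [0.006896, 0.130070, 0.289822, 0.210133, 0.219567, 0.143512, 4],
        [0.006819, 0.178320, 0.259109, 0.041048, 0.316587, 0.198118, 1],
        [0.011121, 0.058437, 0.182823, 0.317847, 0.123521, 0.306250, 3],
    ],
    columns=scores + ['true_label'],
)

# Label -> column name via the scores list (assumes labels are 0..len(scores)-1)
wanted_cols = np.asarray(scores)[quest['true_label'].to_numpy()]

# Column name -> column position in quest; get_indexer returns -1 for missing names
col_pos = quest.columns.get_indexer(wanted_cols)
assert (col_pos >= 0).all(), "some label has no matching column"

# One (row, column) pair per row selects exactly one value per row
quest['true_label_score'] = quest.to_numpy()[np.arange(len(quest)), col_pos]
print(quest['true_label_score'].tolist())
```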
However, assuming the columns are in the correct order (i.e. 0_score is the first column, 1_score is the second, and so on), NumPy fancy indexing is faster, as @mozway suggested:
import numpy as np
quest['true_label_score'] = quest.to_numpy()[np.arange(len(quest)), quest['true_label']]
Output:
>>> quest
0_score 1_score 2_score 3_score 4_score 5_score true_label true_label_score
0 0.007512 0.264500 0.273147 0.218029 0.233726 0.003084 1 0.264500
1 0.130695 0.289085 0.173402 0.144897 0.238129 0.023792 1 0.289085
2 0.006896 0.130070 0.289822 0.210133 0.219567 0.143512 4 0.219567
3 0.006819 0.178320 0.259109 0.041048 0.316587 0.198118 1 0.178320
4 0.011121 0.058437 0.182823 0.317847 0.123521 0.306250 3 0.317847
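To keep the assign method chain from the question, the same fancy indexing can go inside assign. Selecting the score columns first (df_[scores]) means that position k in the array is always scores[k], so the true_label column, or anything else in the frame, cannot shift the positions. The toy frame below is hypothetical data with the label column deliberately placed first, to show what goes wrong if the whole frame is indexed instead; as before, it assumes every true_label is between 0 and len(scores) - 1:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: three score columns, with the label column placed first
scores = ['0_score', '1_score', '2_score']
quest = pd.DataFrame({
    'true_label': [2, 0, 1],
    '0_score': [0.1, 0.4, 0.7],
    '1_score': [0.2, 0.5, 0.8],
    '2_score': [0.3, 0.6, 0.9],
})

# Indexing the whole frame: here position k is NOT the column 'k_score'
naive = quest.to_numpy()[np.arange(len(quest)), quest['true_label'].to_numpy()]

# Method-chain version: restrict to the score columns before indexing
result = quest.assign(
    true_label_score=lambda df_: df_[scores].to_numpy()[
        np.arange(len(df_)), df_['true_label'].to_numpy()
    ]
)

print(naive.tolist())                       # [0.2, 0.0, 0.7] (wrong columns)
print(result['true_label_score'].tolist())  # [0.3, 0.4, 0.8]
```

Since assign returns a new DataFrame, quest itself is left unchanged, which fits the chained style of the original attempt.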