Create an array from a data frame based on conditions

I have a sample data frame:

import pandas as pd

df = pd.DataFrame({'degree_awarded': ['yes', 'no', 'yes', 'yes',
                                      'yes', 'yes', 'yes', 'no'],
                   'avg_score': [78, 87, 94, 55, 68, 76, 78, 8]
                   })

degree_awarded  avg_score
yes             78
no              87
yes             94
yes             55
...             ...

I want to split the 'degree_awarded' column into 'degree_awarded' and 'no_degree_awarded' arrays that hold the associated scores, for example:

degree_awarded: [78, 94, 55, etc.]
no_degree_awarded: [87, etc.]

But I don't know how to do this.

Any help would be appreciated. Thank you for your time.

# Scores for rows where a degree was awarded
listScoreAwarded = list(df[df['degree_awarded'] == 'yes']['avg_score'])

# Scores for rows where no degree was awarded
listScoreNotAwarded = list(df[df['degree_awarded'] == 'no']['avg_score'])

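Both of these lists should work: each expression filters the rows with a boolean mask and collects the matching 'avg_score' values.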
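
Since the question asks for arrays rather than lists, the same boolean mask can also feed `to_numpy()`. This is only a minimal sketch built on the sample data from the question; the names `awarded`, `scores_awarded`, and `scores_not_awarded` are illustrative and not from the original post.

```python
import pandas as pd

df = pd.DataFrame({
    'degree_awarded': ['yes', 'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no'],
    'avg_score': [78, 87, 94, 55, 68, 76, 78, 8],
})

# Boolean mask: True for rows where a degree was awarded
awarded = df['degree_awarded'] == 'yes'

# .loc selects rows and column in one step; ~ inverts the mask
scores_awarded = df.loc[awarded, 'avg_score'].to_numpy()
scores_not_awarded = df.loc[~awarded, 'avg_score'].to_numpy()
```

Note that `~awarded` matches every row that is not 'yes', so it would also pick up missing or unexpected values if the column had any. Compare against 'no' explicitly if that matters.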

You can assign the labels you want, then use groupby.agg(list).

As a Series:

(df
 .assign(group=df['degree_awarded'].map({'yes': 'degree_awarded',
                                         'no': 'no_degree_awarded'}))
 .groupby('group')['avg_score'].agg(list)
)

Output:

group
degree_awarded       [78, 94, 55, 68, 76, 78]
no_degree_awarded                     [87, 8]
Name: avg_score, dtype: object

As a dictionary:

(df
 .assign(group=df['degree_awarded'].map({'yes': 'degree_awarded',
                                         'no': 'no_degree_awarded'}))
 .groupby('group')['avg_score'].agg(list)
 .to_dict()
)

Output: {'degree_awarded': [78, 94, 55, 68, 76, 78], 'no_degree_awarded': [87, 8]}
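
If NumPy arrays are wanted instead of lists, one possible variation is to iterate over the groups and convert each one with `to_numpy()`. This is a sketch, not part of the original answer; the `labels` and `arrays` names are hypothetical, and it assumes the column only contains 'yes' and 'no'.

```python
import pandas as pd

df = pd.DataFrame({
    'degree_awarded': ['yes', 'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no'],
    'avg_score': [78, 87, 94, 55, 68, 76, 78, 8],
})

# Map the raw column values to the desired output names
labels = {'yes': 'degree_awarded', 'no': 'no_degree_awarded'}

# Iterating a grouped Series yields (key, sub-Series) pairs
arrays = {
    labels[key]: scores.to_numpy()
    for key, scores in df.groupby('degree_awarded')['avg_score']
}
```

Here `arrays['degree_awarded']` and `arrays['no_degree_awarded']` each hold a one-dimensional NumPy array of the matching scores.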