Generating column mode in data panel

I have an unbalanced panel dataset in which, in each period, a student may receive a scholarship of a given level/type:

head(df)

ID     student_period         scholarship
   
4567        1              scholarship_level_1
4567        2              scholarship_level_2
4567        3              scholarship_level_2
4567        4              scholarship_level_3
5478        4              scholarship_level_3
5478        5              scholarship_level_3
6758        7              scholarship_level_1
6758        8              scholarship_level_2
6758        9              scholarship_level_2

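Basically, I want to create a new variable that holds the statistical mode of the scholarship level for each student ID in this panel. Something like this: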

head(df1)

ID     student_period         scholarship            scholarship_mode
   
4567        1              scholarship_level_1      scholarship_level_2
4567        2              scholarship_level_2      scholarship_level_2 
4567        3              scholarship_level_2      scholarship_level_2
4567        4              scholarship_level_3      scholarship_level_2
5478        4              scholarship_level_3      scholarship_level_3
5478        5              scholarship_level_3      scholarship_level_3
6758        7              scholarship_level_1      scholarship_level_2
6758        8              scholarship_level_2      scholarship_level_2
6758        9              scholarship_level_2      scholarship_level_2


Any ideas?

You can use groupby + transform with value_counts:

df['scholarship_mode'] = (df.groupby('ID')['scholarship']
                          .transform(lambda x: x.value_counts().index[0]))
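A minimal, self-contained sketch of the value_counts approach, assuming the DataFrame is rebuilt by hand from the rows printed in the question (the question only shows head(df), so the construction itself is an assumption):

```python
import pandas as pd

# Sample panel rebuilt from the rows printed in the question (assumed construction)
df = pd.DataFrame({
    "ID": [4567, 4567, 4567, 4567, 5478, 5478, 6758, 6758, 6758],
    "student_period": [1, 2, 3, 4, 4, 5, 7, 8, 9],
    "scholarship": [
        "scholarship_level_1", "scholarship_level_2", "scholarship_level_2",
        "scholarship_level_3", "scholarship_level_3", "scholarship_level_3",
        "scholarship_level_1", "scholarship_level_2", "scholarship_level_2",
    ],
})

# value_counts() sorts counts in descending order by default, so index[0] is the
# most frequent value within each ID; transform broadcasts it back to every row
df["scholarship_mode"] = (df.groupby("ID")["scholarship"]
                            .transform(lambda x: x.value_counts().index[0]))

print(df)
```

Because transform returns a result aligned to the original index, the new column can be assigned directly without a merge.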

Or with mode:

df['scholarship_mode'] = (df.groupby('ID')['scholarship']
                          .transform(lambda x: x.mode().iloc[0]))

Output:

     ID  student_period          scholarship     scholarship_mode
0  4567               1  scholarship_level_1  scholarship_level_2
1  4567               2  scholarship_level_2  scholarship_level_2
2  4567               3  scholarship_level_2  scholarship_level_2
3  4567               4  scholarship_level_3  scholarship_level_2
4  5478               4  scholarship_level_3  scholarship_level_3
5  5478               5  scholarship_level_3  scholarship_level_3
6  6758               7  scholarship_level_1  scholarship_level_2
7  6758               8  scholarship_level_2  scholarship_level_2
8  6758               9  scholarship_level_2  scholarship_level_2
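In the mode version, .iloc[0] is needed because Series.mode() returns a Series rather than a scalar, even when there is only one mode. A small check on student 4567's values from the question:

```python
import pandas as pd

# Scholarship levels of student 4567 from the sample data
s = pd.Series(["scholarship_level_1", "scholarship_level_2",
               "scholarship_level_2", "scholarship_level_3"])

modes = s.mode()        # always a Series, even with a single mode
print(type(modes).__name__, modes.tolist())

top = modes.iloc[0]     # the scalar that transform broadcasts to each row
print(top)
```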

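NB. mode/value_counts can produce ties, in which case only one of the tied values is used (with mode().iloc[0], the smallest one, since mode returns its results sorted).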
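To illustrate the tie case, here is a sketch with a hypothetical student (ID 9999, not in the question's data) who has one period at each of two levels. Counting the modes per ID is one possible way to flag such students for review; the has_tie column is a made-up name for illustration:

```python
import pandas as pd

# Hypothetical student with a tie: one period at level_3, one at level_1
df = pd.DataFrame({
    "ID": [9999, 9999],
    "student_period": [1, 2],
    "scholarship": ["scholarship_level_3", "scholarship_level_1"],
})

g = df.groupby("ID")["scholarship"]

# mode() sorts its result, so iloc[0] picks the smallest tied value
df["scholarship_mode"] = g.transform(lambda x: x.mode().iloc[0])

# Hypothetical flag: True when more than one value shares the top count
df["has_tie"] = g.transform(lambda x: len(x.mode()) > 1)

print(df)
```

Flagging ties rather than silently resolving them leaves the choice of tie-break rule to whoever uses the data.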