Pandas select top 3 and append from another table

Question

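Hi, I want to take the top 3 values for each id and then attach the matching reason_comment to each of them. If there is a tie, I want to keep only the first one.

How can I do this in Python?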

Table 1:
    id    VarA    VarB    VarC    VarD    VarE
     1       5       4       3       2       1
     2       4       6      21       5       5
     3       3       8       6       9       0
     4       7       8      23      44       0

Table 2: 
    reason_code reason_comment
    VarA        A is high
    VarB        B is high
    VarC        C is high
    VarD        D is high
    VarE        E is high


Results:
id  reason 1    reason 2    reason 3
1   A is high   B is high   C is high
2   C is high   B is high   D is high
3   D is high   B is high   C is high
4   D is high   C is high   B is high

Answer 1

Ties are possible, so they have to be removed. You can do that by reshaping with DataFrame.melt, sorting with DataFrame.sort_values, and removing duplicates with DataFrame.drop_duplicates.
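
The snippets in this answer assume the two tables are already loaded as df and df2. A minimal setup that rebuilds them from the question's data might look like this (the values are copied from Table 1 and Table 2; how you actually load them is up to you):

```python
import pandas as pd

# Table 1 from the question: one row per id, one column per variable.
df = pd.DataFrame({
    'id':   [1, 2, 3, 4],
    'VarA': [5, 4, 3, 7],
    'VarB': [4, 6, 8, 8],
    'VarC': [3, 21, 6, 23],
    'VarD': [2, 5, 9, 44],
    'VarE': [1, 5, 0, 0],
})

# Table 2 from the question: lookup from reason_code to reason_comment.
df2 = pd.DataFrame({
    'reason_code': ['VarA', 'VarB', 'VarC', 'VarD', 'VarE'],
    'reason_comment': ['A is high', 'B is high', 'C is high',
                       'D is high', 'E is high'],
})
```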

Reshape the DataFrame:

df1 = (df.melt('id')
        .sort_values(['id','value'], ascending=[True, False])
        .drop_duplicates(['id','value']))
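
To see what this step does to a tie, here is a small sketch that runs the same chain on the row for id 2 only, where VarD and VarE are both 5. The names row and long_form are just illustrative:

```python
import pandas as pd

# Row for id 2 from Table 1: VarD and VarE are tied at 5.
row = pd.DataFrame({'id': [2], 'VarA': [4], 'VarB': [6],
                    'VarC': [21], 'VarD': [5], 'VarE': [5]})

long_form = (row.melt('id')                                   # wide -> long: id, variable, value
                .sort_values(['id', 'value'], ascending=[True, False])
                .drop_duplicates(['id', 'value']))            # keep one row per (id, value)

# Five variables went in, but only one of the two rows with value 5 is kept.
print(long_form)
```

After the chain, each value appears at most once per id, so counting rows within each id gives a clean ranking.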

Then filter the top 3 rows per id with a counter from GroupBy.cumcount. The counter can be reused later for the new column names in DataFrame.pivot:

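# rank within each id: 1 for the largest value, 2 for the next, ...
df1['g'] = df1.groupby('id').cumcount().add(1)

# keep ranks 1..3; .copy() avoids a SettingWithCopyWarning when assigning later
df1 = df1[df1['g'].le(3)].copy()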

Next, use Series.map to pull in the comments from the other DataFrame:

s = df2.set_index('reason_code')['reason_comment']
df1['variable'] = df1['variable'].map(s)

# keyword arguments are required here in pandas 2.0+
df1 = df1.pivot(index='id', columns='g', values='variable').add_prefix('reason')

print(df1)

g     reason1    reason2    reason3
id                                 
1   A is high  B is high  C is high
2   C is high  B is high  D is high
3   D is high  B is high  C is high
4   D is high  C is high  B is high

To convert id to a column and remove the g label, use:

df1 = df1.reset_index().rename_axis(None, axis=1)
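
Putting the steps together, here is a sketch of the whole pipeline as one function. The helper name top_reasons, the parameter n, and the extra order column are illustrative assumptions, not part of pandas or of the steps above. The order column only makes the tie-break explicit (the leftmost column wins) instead of depending on how the sort arranges tied rows:

```python
import pandas as pd

def top_reasons(df, df2, n=3):
    """Hypothetical helper: comments for the n largest values per id."""
    value_cols = [c for c in df.columns if c != 'id']
    # Position of each column in the original layout, used to break ties.
    order = {col: pos for pos, col in enumerate(value_cols)}

    melted = df.melt('id')
    melted['order'] = melted['variable'].map(order)
    melted = (melted.sort_values(['id', 'value', 'order'],
                                 ascending=[True, False, True])
                    .drop_duplicates(['id', 'value']))

    # Rank within each id and keep the first n ranks.
    melted['g'] = melted.groupby('id').cumcount().add(1)
    melted = melted[melted['g'].le(n)].copy()

    # Replace each variable name with its comment from the lookup table.
    lookup = df2.set_index('reason_code')['reason_comment']
    melted['variable'] = melted['variable'].map(lookup)

    return (melted.pivot(index='id', columns='g', values='variable')
                  .add_prefix('reason')
                  .reset_index()
                  .rename_axis(None, axis=1))

# Data copied from Table 1 and Table 2 in the question.
df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'VarA': [5, 4, 3, 7], 'VarB': [4, 6, 8, 8],
                   'VarC': [3, 21, 6, 23], 'VarD': [2, 5, 9, 44],
                   'VarE': [1, 5, 0, 0]})
df2 = pd.DataFrame({'reason_code': ['VarA', 'VarB', 'VarC', 'VarD', 'VarE'],
                    'reason_comment': ['A is high', 'B is high', 'C is high',
                                       'D is high', 'E is high']})

result = top_reasons(df, df2)
print(result)
```

Wrapping the steps in a function keeps the intermediate long-format frame out of the calling code and makes it easy to ask for a different number of reasons through n.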
