Pandas select top 3 and append from another table
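Hi, I want to take the top 3 numbers for each person (each id) and then append the matching reason_comment to them.
If there is a tie, I only want to take the first one.
How can I do this in Python?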
Table 1:
id  VarA  VarB  VarC  VarD  VarE
1   5     4     3     2     1
2   4     6     21    5     5
3   3     8     6     9     0
4   7     8     23    44    0
Table 2:
reason_code  reason_comment
VarA         A is high
VarB         B is high
VarC         C is high
VarD         D is high
VarE         E is high
Results:
id  reason 1   reason 2   reason 3
1   A is high  B is high  C is high
2   C is high  B is high  D is high
3   D is high  B is high  C is high
4   D is high  C is high  B is high
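There may be ties, so they have to be removed first. Reshape the DataFrame with DataFrame.melt, sort by DataFrame.sort_values and remove duplicates with DataFrame.drop_duplicates, which keeps only the first row for each repeated (id, value) pair: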
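import pandas as pd

# long format, highest value first within each id, then drop tied values (keeps the first)
df1 = (df.melt('id')
         .sort_values(['id','value'], ascending=[True, False])
         .drop_duplicates(['id','value']))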
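To see what this step does with a tie, here is a small sketch that rebuilds Table 1 as `df` (assuming the data is already in a DataFrame with these column names) and inspects id 2, where VarD and VarE are both 5. It relies on the multi-column sort keeping tied rows in their melted order (VarA to VarE), so the earlier column is the one that survives `drop_duplicates`.

```python
import pandas as pd

# Table 1 from the question, assumed to be loaded as a DataFrame named df
df = pd.DataFrame({
    'id':   [1, 2, 3, 4],
    'VarA': [5, 4, 3, 7],
    'VarB': [4, 6, 8, 8],
    'VarC': [3, 21, 6, 23],
    'VarD': [2, 5, 9, 44],
    'VarE': [1, 5, 0, 0],
})

# One row per (id, variable, value); melt stacks VarA..VarE in column order
long_df = df.melt('id')

# Highest value first within each id
ranked = long_df.sort_values(['id', 'value'], ascending=[True, False])

# Drop repeated values within an id; the default keep='first' keeps the earlier row
deduped = ranked.drop_duplicates(['id', 'value'])

# id 2 has VarD == VarE == 5, so only one of them remains
print(deduped[deduped['id'] == 2])
```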
Then filter the top 3 with GroupBy.cumcount; the counter can also be reused for the new column names in DataFrame.pivot:
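# 1-based position of each row within its id
df1['g'] = df1.groupby('id').cumcount().add(1)
# keep the top 3; .copy() avoids SettingWithCopyWarning in the next step
df1 = df1[df1['g'].le(3)].copy()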
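`GroupBy.cumcount` numbers the rows of each group 0, 1, 2, ... in their current order, which is why the frame has to be sorted before this step. A minimal sketch on a hand-written, already-sorted subset of the melted data (the frame `ranked` and its rows are illustrative only):

```python
import pandas as pd

# Hypothetical long-format frame, already sorted by id and descending value
ranked = pd.DataFrame({
    'id':       [1, 1, 1, 1, 2, 2],
    'variable': ['VarA', 'VarB', 'VarC', 'VarD', 'VarC', 'VarB'],
    'value':    [5, 4, 3, 2, 21, 6],
})

# cumcount restarts at 0 for every id; add(1) makes the rank start at 1
ranked['g'] = ranked.groupby('id').cumcount().add(1)

# Keep at most three rows per id (id 2 only has two rows here, so both stay)
top3 = ranked[ranked['g'].le(3)].copy()
print(top3)
```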
Then use Series.map to look up the comments in the other DataFrame:
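# Series indexed by reason_code, used as a lookup table
s = df2.set_index('reason_code')['reason_comment']
df1['variable'] = df1['variable'].map(s)
# one row per id, one column per rank; pandas 2.x requires keyword arguments here
df1 = df1.pivot(index='id', columns='g', values='variable').add_prefix('reason')
print(df1)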
g     reason1    reason2    reason3
id
1   A is high  B is high  C is high
2   C is high  B is high  D is high
3   D is high  B is high  C is high
4   D is high  C is high  B is high
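The lookup and reshape can be tried in isolation. This sketch assumes Table 2 is available as `df2`, as in the question, and uses a hypothetical, already-ranked `top3` frame. `Series.map` with a Series argument replaces each value by the entry at the matching index label (labels missing from `s` would become NaN), and `pivot` turns the rank `g` into columns.

```python
import pandas as pd

# Table 2 from the question, assumed to be available as df2
df2 = pd.DataFrame({
    'reason_code':    ['VarA', 'VarB', 'VarC', 'VarD', 'VarE'],
    'reason_comment': ['A is high', 'B is high', 'C is high', 'D is high', 'E is high'],
})

# Lookup Series: index = reason_code, values = reason_comment
s = df2.set_index('reason_code')['reason_comment']

# Hypothetical top-3 rows for two ids, rank g already assigned
top3 = pd.DataFrame({
    'id':       [1, 1, 1, 2, 2, 2],
    'variable': ['VarA', 'VarB', 'VarC', 'VarC', 'VarB', 'VarD'],
    'g':        [1, 2, 3, 1, 2, 3],
})

# Replace each reason code with its comment
top3['variable'] = top3['variable'].map(s)

# Ranks 1..3 become columns reason1..reason3
wide = top3.pivot(index='id', columns='g', values='variable').add_prefix('reason')
print(wide)
```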
To turn id back into a regular column and remove the g name from the column axis, use:
df1 = df1.reset_index().rename_axis(None, axis=1)
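Putting the steps together, a minimal end-to-end sketch. `top_reasons` is a hypothetical helper name, the input frames are rebuilt from Table 1 and Table 2 of the question, and `n` controls how many reasons are kept per id.

```python
import pandas as pd


def top_reasons(df: pd.DataFrame, df2: pd.DataFrame, n: int = 3) -> pd.DataFrame:
    """Hypothetical helper: the n highest-valued reasons per id, ties dropped."""
    # Long format, highest value first within each id; drop tied values (keep first)
    long_df = (df.melt('id')
                 .sort_values(['id', 'value'], ascending=[True, False])
                 .drop_duplicates(['id', 'value']))
    # 1-based rank within each id, reused later as the column label
    long_df['g'] = long_df.groupby('id').cumcount().add(1)
    top = long_df[long_df['g'].le(n)].copy()
    # Replace reason codes with their comments via a lookup Series
    s = df2.set_index('reason_code')['reason_comment']
    top['variable'] = top['variable'].map(s)
    wide = top.pivot(index='id', columns='g', values='variable').add_prefix('reason')
    # Turn id back into a column and drop the 'g' name from the column axis
    return wide.reset_index().rename_axis(None, axis=1)


# Input data from Table 1 and Table 2 of the question
df = pd.DataFrame({
    'id':   [1, 2, 3, 4],
    'VarA': [5, 4, 3, 7],
    'VarB': [4, 6, 8, 8],
    'VarC': [3, 21, 6, 23],
    'VarD': [2, 5, 9, 44],
    'VarE': [1, 5, 0, 0],
})
df2 = pd.DataFrame({
    'reason_code':    ['VarA', 'VarB', 'VarC', 'VarD', 'VarE'],
    'reason_comment': ['A is high', 'B is high', 'C is high', 'D is high', 'E is high'],
})

result = top_reasons(df, df2)
print(result)
```

Wrapping the steps in a function keeps the intermediate frames local, so the `.copy()` after filtering is the only defensive step needed before assigning the mapped comments.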