Most efficient way for finding 3 rows with maximum value in column?

Question

Suppose there is a DataFrame df:

Name  Balance
A     1000
B     5000
C     3000
D     6000
E     2000
F     5000

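I am looking for a way to get the three rows with the highest balances, something like:

df['Balance'].get_indices_max(n=3)  # hypothetical method; n is the number of results required

Using these indices to fetch the rows would give: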

D 6000
F 5000
B 5000

Update: additional notes on the accepted answer

Possible values for keep:

first : prioritize the first occurrence(s)

last : prioritize the last occurrence(s)

all : do not drop any duplicates, even if it means selecting more than n items.
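To make the difference between the three keep values concrete, here is a minimal sketch. It uses the question's data plus the extra tied row X from the accepted answer's second example. Setting Name as the index is my own choice for readability, not something the answer does.

```python
import pandas as pd

# Question's data plus an extra row X that also ties at 5000
df = pd.DataFrame({
    "Name": list("ABCDEFX"),
    "Balance": [1000, 5000, 3000, 6000, 2000, 5000, 5000],
})

# Index by Name so the selected rows are easy to read (illustrative choice)
balances = df.set_index("Name")["Balance"]

first = balances.nlargest(3, keep="first")  # D plus the two earliest 5000s (B, F)
last = balances.nlargest(3, keep="last")    # D plus the two latest 5000s (F, X)
every = balances.nlargest(3, keep="all")    # D plus all three tied 5000s

print(first, last, every, sep="\n\n")
```

With keep="all" the result has four rows even though n=3, because dropping any of the tied 5000s would be arbitrary.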

Answer 1

Answer

import pandas as pd

df = pd.DataFrame({"Name": list("ABCDEF"), "Balance": [1000, 5000, 3000, 6000, 2000, 5000]})
index = df["Balance"].nlargest(3).index  # index labels of the 3 largest balances
df.loc[index]

Output

  Name  Balance
3    D     6000
1    B     5000
5    F     5000

Note

Efficiency

The columns that are not specified are returned as well, but not used for ordering. This method is equivalent to df.sort_values(columns, ascending=False).head(n), but more performant.
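The quoted sentence describes DataFrame.nlargest, which returns whole rows directly. The sketch below compares it with the sort-based form on the question's data. Passing kind="mergesort" is my own addition so that tied rows keep a predictable order; the documentation quote does not mention it.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": list("ABCDEF"),
    "Balance": [1000, 5000, 3000, 6000, 2000, 5000],
})

# DataFrame.nlargest returns whole rows, so no separate .loc lookup is needed
top3 = df.nlargest(3, "Balance")

# Sort-based form from the quote; a stable sort is requested here (an assumption
# added for this sketch) so that tied rows are ordered predictably
top3_sorted = df.sort_values("Balance", ascending=False, kind="mergesort").head(3)

print(top3)
print(top3_sorted)
```

Both calls select the same three rows. The difference is that nlargest does not need to order the entire column first.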

nlargest(3, keep='all')

keep : {'first', 'last', 'all'}, default 'first'

When using keep='all', all duplicate items are maintained

Example

# pd is pandas (import pandas as pd)
df = pd.DataFrame({"Name": list("ABCDEFX"), "Balance": [1000, 5000, 3000, 6000, 2000, 5000, 5000]})
index = df["Balance"].nlargest(3, keep='all').index
df.loc[index]

  Name  Balance
3    D     6000
1    B     5000
5    F     5000
6    X     5000

Reference

DataFrame.nlargest

Answer 2

I usually do:

out = df.sort_values('Balance').iloc[-3:]  # last 3 rows after an ascending sort; iloc[3:] is only right for a 6-row frame
Out[476]: 
  Name  Balance
1    B     5000
5    F     5000
3    D     6000
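Since the question is about efficiency, a rough way to compare the two answers is to time them on a larger frame. This is only a sketch: the random data, the frame size, and the helper names top_nlargest and top_sort are all made up for illustration, and the timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger frame with random balances (not from the question)
rng = np.random.default_rng(0)
big = pd.DataFrame({"Balance": rng.integers(0, 1_000_000, size=200_000)})

def top_nlargest():
    # Approach from Answer 1, using DataFrame.nlargest
    return big.nlargest(3, "Balance")

def top_sort():
    # Approach from Answer 2: full ascending sort, then take the last 3 rows
    return big.sort_values("Balance").iloc[-3:]

t_nlargest = timeit.timeit(top_nlargest, number=10)
t_sort = timeit.timeit(top_sort, number=10)
print(f"nlargest: {t_nlargest:.4f}s, sort_values: {t_sort:.4f}s")
```

Both functions return the same three balance values. They differ only in how much of the column has to be ordered to find them.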

Tags: python, performance, processing-efficiency, dataframe, pandas
