Most efficient way for finding 3 rows with maximum value in column?
Suppose there is a DataFrame df:
Name Balance
A 1000
B 5000
C 3000
D 6000
E 2000
F 5000
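I am looking for a way to get the three rows with the highest balances.
df['Balance'].get_indices_max(n=3)  # hypothetical method; n is the number of results required
When these indices are used to fetch the rows, the output should be: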
D 6000
F 5000
B 5000
Update: additional notes on the accepted answer
Possible values of keep:
first : prioritize the first occurrence(s)
last : prioritize the last occurrence(s)
all : do not drop any duplicates, even if it means selecting more than n items.
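To see how first and last differ, here is a minimal sketch on the question's data, assuming pandas is installed. The two 5000 balances (B and F) tie for the remaining slots, and keep decides which of them is listed first.

```python
import pandas as pd

df = pd.DataFrame({"Name": list("ABCDEF"),
                   "Balance": [1000, 5000, 3000, 6000, 2000, 5000]})

# keep='first': among tied values, earlier rows are listed first (B before F)
first = df["Balance"].nlargest(3, keep="first")
# keep='last': among tied values, later rows are listed first (F before B)
last = df["Balance"].nlargest(3, keep="last")

print(df.loc[first.index, "Name"].tolist())  # ['D', 'B', 'F']
print(df.loc[last.index, "Name"].tolist())   # ['D', 'F', 'B']
```

With only two tied values and two free slots, both settings select the same rows; keep matters for which rows survive when there are more ties than slots.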
Answer
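import pandas as pd

df = pd.DataFrame({"Name": list("ABCDEF"), "Balance": [1000, 5000, 3000, 6000, 2000, 5000]})
index = df["Balance"].nlargest(3).index  # index labels of the 3 largest balances
df.loc[index]  # fetch the full rows for those labels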
Output
Name Balance
3 D 6000
1 B 5000
5 F 5000
Notes:
- Efficient: the columns that are not specified are returned as well, but not used for ordering. This method is equivalent to df.sort_values(columns, ascending=False).head(n), but more performant.
- keep: {'first', 'last', 'all'}, default 'first'. With nlargest(3, keep='all'), all duplicate items are kept, so the result can contain more than 3 rows.
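The efficiency note is worded for DataFrame.nlargest, which takes the column name directly. A minimal sketch, assuming pandas is installed, calls it on the question's data and compares it with the sort-based form it is said to be equivalent to. Tied rows may come out in a different order from the two forms, so the comparison checks the selected balances rather than the exact row order.

```python
import pandas as pd

df = pd.DataFrame({"Name": list("ABCDEF"),
                   "Balance": [1000, 5000, 3000, 6000, 2000, 5000]})

# DataFrame.nlargest: the 3 rows with the largest "Balance";
# "Name" is returned as well but is not used for ordering.
top = df.nlargest(3, "Balance")

# Sort-based equivalent: a full descending sort, then keep the first 3 rows.
top_sorted = df.sort_values("Balance", ascending=False).head(3)

print(top)
# Both approaches select the same balances
print(top["Balance"].tolist() == top_sorted["Balance"].tolist())  # True
```

nlargest avoids sorting the whole column, which is where the performance advantage over the full sort comes from on large frames.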
Example
import pandas as pd

df = pd.DataFrame({"Name": list("ABCDEFX"), "Balance": [1000, 5000, 3000, 6000, 2000, 5000, 5000]})
index = df["Balance"].nlargest(3, keep='all').index  # all rows tied at the cut-off are kept
df.loc[index]
Name Balance
3 D 6000
1 B 5000
5 F 5000
6 X 5000
I usually do this:
out = df.sort_values('Balance').iloc[-3:]  # ascending sort, so the last 3 rows hold the largest balances
Output
Name Balance
1 B 5000
5 F 5000
3 D 6000
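A slice with a fixed start such as iloc[3:] returns the last len(df) - 3 rows, which is three rows only when the frame has exactly six. A minimal sketch, reusing the seven-row frame from the keep='all' example and assuming pandas is installed, shows the difference; a negative slice such as iloc[-3:] (or tail(3)) always returns three rows, still in ascending order.

```python
import pandas as pd

df = pd.DataFrame({"Name": list("ABCDEFX"),
                   "Balance": [1000, 5000, 3000, 6000, 2000, 5000, 5000]})

ascending = df.sort_values("Balance")

# Fixed start: returns len(df) - 3 rows (4 here), not "the top 3"
fixed_start = ascending.iloc[3:]
# Negative slice: always the last 3 rows of the ascending sort
last_three = ascending.iloc[-3:]

print(len(fixed_start))                 # 4
print(len(last_three))                  # 3
print(last_three["Balance"].tolist())   # [5000, 5000, 6000]
```

Unlike nlargest(3, keep='all'), this slice silently drops one of the tied 5000 rows, and which one is dropped depends on the sort's tie order.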