pandas dataframe: extract data with the minimum of a column under specific criteria/conditions, fixing other columns

Question

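First of all, thank you very much for your help. I have a table that I import with pandas as df. For each unique foo & bar combination, I would like to get a new df containing the smallest zoo and the corresponding qux. I have simplified my dataframe here, but in reality I have hundreds of qux and foo values and dozens of bar values.

My input table: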

foo	bar	zoo	qux
aaa	HB1	9.75	lab1
aaa	HB1	4.87	lab2
aaa	HB1	3.05	lab3
aaa	TS3	8.51	lab1
aaa	TS3	2.58	lab2
aaa	TS3	2.48	lab3
bbb	HB1	9.03	lab1
bbb	HB1	6.11	lab2
bbb	HB1	7.66	lab3
bbb	TS3	3.57	lab1
bbb	TS3	4.25	lab2
bbb	TS3	1.63	lab3

My expected result:

foo	bar	zoo	qux
aaa	HB1	3.05	lab3
aaa	TS3	2.48	lab3
bbb	HB1	6.11	lab2
bbb	TS3	1.63	lab3

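I tried using groupby and pivot_table. I obtained the minimum zoo for each bar and each foo, but I did not get the corresponding qux, and the df was completely reshaped and no longer looked like my original format. I am a bit lost.

Thank you very much for your help.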

Answer 1

You can sort the values in the dataframe in descending order and use groupby.tail(1):

df.sort_values(by=['foo','bar','zoo','qux'],ascending=False).groupby(['foo','bar']).tail(1)

    foo  bar   zoo   qux
11  bbb  TS3  1.63  lab3
7   bbb  HB1  6.11  lab2
5   aaa  TS3  2.48  lab3
2   aaa  HB1  3.05  lab3
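
For anyone who wants to run this, here is a minimal self-contained sketch that rebuilds the question's table by hand. The names `smallest` and `alt` are illustrative only and do not come from the original answer. Since `groupby.tail` keeps the row order of the sorted frame, a trailing `sort_index()` is added to restore the original row order. An ascending sort followed by `head(1)` should give the same rows.

```python
import pandas as pd

# Rebuild the question's input table.
df = pd.DataFrame({
    "foo": ["aaa"] * 6 + ["bbb"] * 6,
    "bar": (["HB1"] * 3 + ["TS3"] * 3) * 2,
    "zoo": [9.75, 4.87, 3.05, 8.51, 2.58, 2.48,
            9.03, 6.11, 7.66, 3.57, 4.25, 1.63],
    "qux": ["lab1", "lab2", "lab3"] * 4,
})

# A descending sort puts the smallest zoo last within each (foo, bar) group,
# so tail(1) picks it; sort_index() restores the original row order.
smallest = (
    df.sort_values(by=["foo", "bar", "zoo", "qux"], ascending=False)
      .groupby(["foo", "bar"])
      .tail(1)
      .sort_index()
)

# Variant: ascending sort on zoo, then keep the first row of each group.
alt = df.sort_values("zoo").groupby(["foo", "bar"]).head(1).sort_index()

print(smallest)
```

Both variants return rows of the original frame with all four columns intact, so the layout matches the input table.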

Answer 2

Get the index labels of the minimum values via groupby, then index the original df to get the rows:

df.loc[df.groupby(['foo', 'bar']).zoo.idxmin()]
 
    foo  bar   zoo   qux
2   aaa  HB1  3.05  lab3
5   aaa  TS3  2.48  lab3
7   bbb  HB1  6.11  lab2
11  bbb  TS3  1.63  lab3
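
If a clean 0..n-1 index is wanted, as in the expected result, the same lookup can be followed by `reset_index(drop=True)`. The sketch below rebuilds the question's table by hand; `idx` and `result` are illustrative names, not part of the original answer. Note that `idxmin` returns the first occurrence when a group has tied minima, so one row per group comes back, and that this approach assumes the index labels of `df` are unique.

```python
import pandas as pd

# Rebuild the question's input table.
df = pd.DataFrame({
    "foo": ["aaa"] * 6 + ["bbb"] * 6,
    "bar": (["HB1"] * 3 + ["TS3"] * 3) * 2,
    "zoo": [9.75, 4.87, 3.05, 8.51, 2.58, 2.48,
            9.03, 6.11, 7.66, 3.57, 4.25, 1.63],
    "qux": ["lab1", "lab2", "lab3"] * 4,
})

# For each (foo, bar) group, the index label of the row with the smallest zoo.
idx = df.groupby(["foo", "bar"])["zoo"].idxmin()

# Select those rows from the original frame, then renumber them from 0.
result = df.loc[idx].reset_index(drop=True)

print(result)
```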
