Pandas groupby drop top 5% and bottom 5% of data

Question

I have a DataFrame as follows:

Month   Col2
A       4
A       5
A       6
A       7
A       8
B       14
B       15
B       16
B       17
B       18
B       19
B       20
B       21
B       22
B       23

I want to get the following:

Month   Col2
A       5
A       6
A       7
B       16
B       17
B       18
B       19
B       20
B       21

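In group A above, the top 1 and bottom 1 rows are removed, because they each account for 5% of group A's total (5).

In group B above, the top 2 and bottom 2 rows are removed, because they each account for 5% of group B's total (10).

I'm not sure how to achieve this.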

Answer 1

I think you may mean that you want to drop the top and bottom 10% of each group.

import pandas as pd

df = pd.DataFrame({'Month': {0: 'A', 1: 'A', 2: 'A', 3: 'A', 4: 'A', 5: 'B', 6: 'B', 7: 'B', 8: 'B', 9: 'B', 10: 'B', 11: 'B', 12: 'B', 13: 'B', 14: 'B'}, 'Col2': {0: 4, 1: 5, 2: 6, 3: 7, 4: 8, 5: 14, 6: 15, 7: 16, 8: 17, 9: 18, 10: 19, 11: 20, 12: 21, 13: 22, 14: 23}})
pct = .1
for i, g in df.groupby('Month'):
    # note: g.size counts cells (rows x columns), not rows; with two columns,
    # 10% of the cells is the same number as 20% of the rows
    count = g.size
    drop = int(pct*count)
    # not necessary but helpful to see what's happening, if desired
    print(f'dropping top and bottom {drop} rows of group {i} ({pct:0.0%} of {count} cells)')
    df.drop(g['Col2'].nlargest(drop).index, inplace=True)
    df.drop(g['Col2'].nsmallest(drop).index, inplace=True)

Yields

   Month  Col2
1      A     5
2      A     6
3      A     7
7      B    16
8      B    17
9      B    18
10     B    19
11     B    20
12     B    21
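
For comparison, here is a minimal sketch that counts rows with len(g) rather than cells with g.size. It assumes the intent is to trim 20% of the rows from each end of every group (1 of 5 in A, 2 of 10 in B), and it returns a new frame instead of dropping in place. The helper name trim_groups and its parameters are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Month": ["A"] * 5 + ["B"] * 10,
    "Col2": [4, 5, 6, 7, 8, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23],
})

def trim_groups(frame, key, value, frac):
    """Drop the largest and smallest `frac` of rows (by `value`) in each group."""
    to_drop = []
    for _, g in frame.groupby(key):
        n = int(frac * len(g))  # rows to drop at each end of this group
        if n > 0:
            to_drop.extend(g[value].nlargest(n).index)
            to_drop.extend(g[value].nsmallest(n).index)
    # drop returns a new frame; the input is left untouched
    return frame.drop(index=to_drop)

trimmed = trim_groups(df, "Month", "Col2", frac=0.2)
print(trimmed)
```

Collecting the labels first and dropping once keeps the original df intact, which can make it easier to try different fractions.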

Answer 2

With GroupBy.apply:

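def crop(gr):
    # assumes the rows of each group are already sorted by Col2
    gr_len = len(gr)
    amt = gr_len // 5
    # explicit end index: gr.iloc[amt: -amt] would return nothing when amt == 0
    return gr.iloc[amt: gr_len - amt]

df.groupby("Month", group_keys=False, sort=False).apply(crop)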

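where the crop function computes the amount to crop as 1/5 of the group's total length and slices that many rows off the head and the tail with iloc,

to get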

   Month  Col2
1      A     5
2      A     6
3      A     7
7      B    16
8      B    17
9      B    18
10     B    19
11     B    20
12     B    21

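(group_keys is False to drop the redundant grouper Month index from the result; sort is False to preserve the original order in which the grouper column's values appear.)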
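
If the rows are not already sorted within each group, positional slicing with iloc would trim the wrong rows. One possible alternative, sketched under the same 1/5 assumption, ranks the values within each group and keeps only the middle ranks. The names trim_by_rank and denom are hypothetical and not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({
    "Month": ["A"] * 5 + ["B"] * 10,
    "Col2": [4, 5, 6, 7, 8, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23],
})

def trim_by_rank(frame, key, value, denom=5):
    """Drop the lowest and highest 1/denom of rows (by `value`) in each group."""
    grouped = frame.groupby(key)[value]
    size = grouped.transform("size")     # number of rows in each row's group
    rank = grouped.rank(method="first")  # 1-based rank by value, ties broken by order
    amt = size // denom                  # rows to trim at each end
    return frame[(rank > amt) & (rank <= size - amt)]

# Shuffle first to show that the result does not depend on row order.
shuffled = df.sample(frac=1, random_state=0)
trimmed = trim_by_rank(shuffled, "Month", "Col2").sort_index()
print(trimmed)
```

Because the selection is a boolean mask over the original frame, the Month column and the original index labels are carried through unchanged.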

pandas

python-3.8