Pandas: groupby and then retrieving IQR

Question

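I'm new to Pandas and I'm trying to do the following:

I have two dataframes, comms and arts, that look like this (except that the real ones are longer and have other columns)

comms: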

ID    commScore           
10       5                
10       3                  
10      -1                 
11       0                
11       2              
12       9      
13      -2     
13      -1     
13       1      
13       4

arts:

ID    commNumber
10        3 
11        2    
12        1
13        4

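I need to group comms by ID and then save in arts (in the correct ID row, obviously) the interquartile range (IQR) of each ID's commScore distribution.

I've already tried using groupby, agg, and map, but since my grasp of pandas concepts is very limited, I couldn't get what I was looking for.

Does anyone have a solution?

Thanks

Andrea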

Answer 1

GroupBy objects have a quantile method. You can compute Q3 and Q1 and then subtract them. With some renaming and a join, it goes as follows:

grouper = comms.groupby("ID")
q1, q3 = grouper.quantile(0.25), grouper.quantile(0.75)
iqr = q3 - q1
iqr = iqr.rename(columns={"commScore": "IQR"})

arts = arts.set_index("ID").join(iqr)

to get

>>> arts

    commNumber  IQR
ID
10           3  3.0
11           2  1.0
12           1  0.0
13           4  3.0
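
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the two frames contain exactly the sample rows from the question and nothing else; the name `result` is only used here for illustration.

```python
import pandas as pd

# Sample data copied from the question (the real frames have more rows/columns).
comms = pd.DataFrame({
    "ID": [10, 10, 10, 11, 11, 12, 13, 13, 13, 13],
    "commScore": [5, 3, -1, 0, 2, 9, -2, -1, 1, 4],
})
arts = pd.DataFrame({"ID": [10, 11, 12, 13], "commNumber": [3, 2, 1, 4]})

# Per-ID first and third quartiles, then their difference.
grouper = comms.groupby("ID")
q1, q3 = grouper.quantile(0.25), grouper.quantile(0.75)
iqr = (q3 - q1).rename(columns={"commScore": "IQR"})

# Both sides are indexed by ID, so join lines the rows up by ID.
result = arts.set_index("ID").join(iqr)
print(result)
```

Note that the join aligns on the index, which is why ID is moved into the index of arts first.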

If there are multiple numeric columns, we select commScore explicitly:

grouper = comms.groupby("ID").commScore
q1, q3 = grouper.quantile(0.25), grouper.quantile(0.75)
iqr = q3 - q1
iqr.name = "IQR"  # `iqr` will be a series since we selected 1 column,
                  #  so renaming is a bit different

arts = arts.set_index("ID").join(iqr)

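The result is the same.

If you don't want to call quantile twice, you can pass the list [0.75, 0.25] and then subtract the two values with agg. So instead of the two lines above involving q1 and q3, we write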

import numpy as np  # needed for np.subtract.reduce
iqr = grouper.quantile([0.75, 0.25]).groupby("ID").agg(np.subtract.reduce)

The rest is the same.
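
The one-liner above relies on the two quantiles appearing in the order [Q3, Q1] within each ID. As a rough alternative sketch that does not depend on that order, the quantile level can be unstacked into columns and subtracted by label. The sample data is copied from the question; the names `quantiles`, `wide`, and `result` are made up for this illustration.

```python
import pandas as pd

# Sample data copied from the question.
comms = pd.DataFrame({
    "ID": [10, 10, 10, 11, 11, 12, 13, 13, 13, 13],
    "commScore": [5, 3, -1, 0, 2, 9, -2, -1, 1, 4],
})
arts = pd.DataFrame({"ID": [10, 11, 12, 13], "commNumber": [3, 2, 1, 4]})

# One quantile call: a Series indexed by (ID, quantile).
quantiles = comms.groupby("ID")["commScore"].quantile([0.75, 0.25])

# Move the quantile level into columns, then subtract by column label.
wide = quantiles.unstack()
iqr = (wide[0.75] - wide[0.25]).rename("IQR")

result = arts.set_index("ID").join(iqr)
print(result)
```

Subtracting by label makes the intent (Q3 minus Q1) explicit, at the cost of one extra intermediate frame.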

Answer 2

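We can group the dataframe by ID and aggregate the commScore column with the iqr function from scipy.stats to compute the interquartile range, then map the computed IQR values onto the ID column of the arts dataframe: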

from scipy.stats import iqr

arts['IQR'] = arts['ID'].map(comms.groupby('ID')['commScore'].agg(iqr))

   ID  commNumber  IQR
0  10           3    3
1  11           2    1
2  12           1    0
3  13           4    3
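
If SciPy is not installed, the same map pattern can be sketched with a small helper built on numpy.percentile. This swaps in a different function from the one used above: `iqr_np` is a hypothetical name invented for this example, and the data is the sample from the question.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
comms = pd.DataFrame({
    "ID": [10, 10, 10, 11, 11, 12, 13, 13, 13, 13],
    "commScore": [5, 3, -1, 0, 2, 9, -2, -1, 1, 4],
})
arts = pd.DataFrame({"ID": [10, 11, 12, 13], "commNumber": [3, 2, 1, 4]})


def iqr_np(values):
    """Hypothetical helper: Q3 - Q1 using numpy's default linear interpolation."""
    q3, q1 = np.percentile(values, [75, 25])
    return q3 - q1


# The aggregated Series is indexed by ID, so map looks up each arts ID in it.
arts["IQR"] = arts["ID"].map(comms.groupby("ID")["commScore"].agg(iqr_np))
print(arts)
```

Unlike the join approach, this keeps ID as an ordinary column of arts.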
