Groupby Apply Quantile Replacement

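I'm trying to use pandas groupby, apply, where and quantile to replace values below the 50% quantile of each 'date' group with NaN, but it returns lists in the cells. How can I get these results into a new column after the 'value' column?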

Here is my code (any other approach is welcome). It returns lists in the cells:

In[0]: df.groupby('date')['value'].apply(lambda x: np.where(x<x.quantile(0.5),np.nan,x))  
Out[0]:                            
date                            value     
2019-12-23  [nan, nan, 3.0, 4.0, 5.0]
2014-08-13  [nan, nan, 3.0, 4.0, 5.0]

If I assign the result to a new column, the new column is all NaN:

In[1]: df['new_value']= df.groupby('date')['value'].apply(lambda x: np.where(x<x.quantile(0.5),np.nan,x))
Out[1]: 
        date  value    new_value
0 2019-12-23      1.0       NaN
1 2019-12-23      2.0       NaN
2 2019-12-23      3.0       NaN
3 2019-12-23      4.0       NaN
4 2019-12-23      5.0       NaN
5 2014-08-13      1.0       NaN
6 2014-08-13      2.0       NaN
7 2014-08-13      3.0       NaN
8 2014-08-13      4.0       NaN
9 2014-08-13      5.0       NaN
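Why both symptoms appear can be sketched as follows. This assumes the data is exactly the ten rows shown above; the DataFrame construction is a reconstruction for illustration, not the asker's original code. Because the lambda returns a plain NumPy array, apply produces one array per group in a Series indexed by date, and assigning that Series to a column aligns on index labels, which never match the frame's 0..9 row index.

```python
import numpy as np
import pandas as pd

# Reconstructed sample frame (assumption: 5 values per date, as in the question).
df = pd.DataFrame({
    "date": pd.to_datetime(["2019-12-23"] * 5 + ["2014-08-13"] * 5),
    "value": [1.0, 2.0, 3.0, 4.0, 5.0] * 2,
})

# The lambda returns an ndarray per group, so apply collects one array
# per group into an object Series whose index is the group keys (dates).
applied = df.groupby("date")["value"].apply(
    lambda x: np.where(x < x.quantile(0.5), np.nan, x)
)
print(len(applied))           # one entry per date, not per row
print(type(applied.iloc[0]))  # <class 'numpy.ndarray'>

# Column assignment aligns on index labels. The date labels never match
# df's integer row index, so every row of the new column ends up NaN.
df["new_value"] = applied
print(df["new_value"].isna().all())
```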

I want to get this:

        date     value    new_value
0 2019-12-23      1.0        NaN
1 2019-12-23      2.0        NaN
2 2019-12-23      3.0        3.0
3 2019-12-23      4.0        4.0
4 2019-12-23      5.0        5.0
5 2014-08-13      1.0        NaN
6 2014-08-13      2.0        NaN
7 2014-08-13      3.0        3.0
8 2014-08-13      4.0        4.0
9 2014-08-13      5.0        5.0

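You can use transform instead of apply. Unlike apply, transform returns a result with the same index as the original rows, so it can be assigned directly as a new column: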
import numpy as np

# transform keeps df's row index, so the per-group arrays line up row by row
df["new_value"] = df.groupby("date")["value"].transform(
    lambda x: np.where(x < x.quantile(0.5), np.nan, x)
)


         date  value  new_value
0  2019-12-23    1.0        NaN
1  2019-12-23    2.0        NaN
2  2019-12-23    3.0        3.0
3  2019-12-23    4.0        4.0
4  2019-12-23    5.0        5.0
5  2014-08-13    1.0        NaN
6  2014-08-13    2.0        NaN
7  2014-08-13    3.0        3.0
8  2014-08-13    4.0        4.0
9  2014-08-13    5.0        5.0
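Since the question welcomes other approaches, here is a minimal sketch of a variant that avoids np.where inside the lambda: it uses transform only to broadcast each date's 50% quantile back onto its rows, then masks with Series.where. This is an alternative technique, not the answer's method, and the DataFrame below is a reconstruction of the sample data.

```python
import pandas as pd

# Reconstructed sample frame (assumption: same ten rows as the question).
df = pd.DataFrame({
    "date": pd.to_datetime(["2019-12-23"] * 5 + ["2014-08-13"] * 5),
    "value": [1.0, 2.0, 3.0, 4.0, 5.0] * 2,
})

# A scalar returned per group is broadcast to every row of that group,
# and the result keeps df's original index.
threshold = df.groupby("date")["value"].transform(lambda x: x.quantile(0.5))

# Series.where keeps values where the condition is True and puts NaN
# elsewhere, so values strictly below the group's 50% quantile become NaN.
df["new_value"] = df["value"].where(df["value"] >= threshold)
print(df)
```

Keeping the threshold as its own column-shaped Series makes it easy to inspect or reuse (for example with a different comparison) without rewriting the groupby step.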