Groupby Apply Quantile Replacement
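I am trying to use pandas groupby, apply, where and quantile to replace values below the 50% quantile with NaN within each 'date' group, but it seems to return lists in the cells. How can I get these results in a new column after the 'value' column?
Here is my code (any other approach is welcome). It returns lists in the cells: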
In[0]: df.groupby('date')['value'].apply(lambda x: np.where(x<x.quantile(0.5),np.nan,x))
Out[0]:
date value
2019-12-23 [nan, nan, 3.0, 4.0, 5.0]
2014-08-13 [nan, nan, 3.0, 4.0, 5.0]
If I create a new column, it returns NaN in the new column:
In[1]: df['new_value']= df.groupby('date')['value'].apply(lambda x: np.where(x<x.quantile(0.5),np.nan,x))
Out[1]:
date value new_value
0 2019-12-23 1.0 NaN
1 2019-12-23 2.0 NaN
2 2019-12-23 3.0 NaN
3 2019-12-23 4.0 NaN
4 2019-12-23 5.0 NaN
5 2014-08-13 1.0 NaN
6 2014-08-13 2.0 NaN
7 2014-08-13 3.0 NaN
8 2014-08-13 4.0 NaN
9 2014-08-13 5.0 NaN
I want this:
date value new_value
0 2019-12-23 1.0 NaN
1 2019-12-23 2.0 NaN
2 2019-12-23 3.0 3.0
3 2019-12-23 4.0 4.0
4 2019-12-23 5.0 5.0
5 2014-08-13 1.0 NaN
6 2014-08-13 2.0 NaN
7 2014-08-13 3.0 3.0
8 2014-08-13 4.0 4.0
9 2014-08-13 5.0 5.0
You can use transform instead of apply:
df["new_value"] = df.groupby("date")["value"].transform(
    lambda x: np.where(x < x.quantile(0.5), np.nan, x)
)
date value new_value
0 2019-12-23 1.0 NaN
1 2019-12-23 2.0 NaN
2 2019-12-23 3.0 3.0
3 2019-12-23 4.0 4.0
4 2019-12-23 5.0 5.0
5 2014-08-13 1.0 NaN
6 2014-08-13 2.0 NaN
7 2014-08-13 3.0 3.0
8 2014-08-13 4.0 4.0
9 2014-08-13 5.0 5.0
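For anyone who wants to reproduce this end to end, here is a minimal sketch. The DataFrame is rebuilt from the values printed in the question, so its construction is an assumption about how the original data was created. It also shows a likely reason the apply version assigned NaN: the apply result is indexed by date (one array per group), so it has no labels in common with the frame's 0..9 row index. The last two lines are an alternative spelling, not the method above: they broadcast the per-group quantile with transform and then mask with Series.where.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question (values copied from the printed output;
# how the original frame was created is an assumption).
df = pd.DataFrame({
    "date": ["2019-12-23"] * 5 + ["2014-08-13"] * 5,
    "value": [1.0, 2.0, 3.0, 4.0, 5.0] * 2,
})

grouped = df.groupby("date")["value"]

# apply returns one array per group, indexed by the group key ('date').
# Assigning it to a column aligns on index labels, and the date labels
# do not match the row labels 0..9, hence the all-NaN column.
per_group = grouped.apply(lambda x: np.where(x < x.quantile(0.5), np.nan, x))

# transform returns a result aligned with the original rows.
# Here it broadcasts each group's 50% quantile back to every row.
threshold = grouped.transform(lambda x: x.quantile(0.5))

# Keep values at or above the group threshold; everything else becomes NaN.
df["new_value"] = df["value"].where(df["value"] >= threshold)

print(df)
```

Splitting the threshold out into its own Series can make the intermediate step easier to inspect, at the cost of one extra line.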