Replace outliers with median except NaN
I want to replace the outliers in my DataFrame with the median, but only the outliers, not the NaN values.
The original data:
January February
0 -5.0 -7.0
1 -6.0 -6.0
2 -5.0 -5.0
3 -3.0 -6.0
4 -6.0 -8.0
5 -11.0 -9.0
6 -6.0 5.0
7 -8.0 -11.0
8 -11.0 -12.0
9 -8.0 -9.0
10 -8.0 -6.0
11 -8.0 -5.0
12 -8.0 -4.0
13 -10.0 1.0
14 -10.0 3.0
15 -9.0 -9.0
16 -6.0 -6.0
17 -6.0 -6.0
18 -4.0 -4.0
19 -8.0 2.0
20 -9.0 3.0
21 -14.0 1.0
22 -15.0 -3.0
23 -17.0 -4.0
24 -19.0 -6.0
25 -60.0 -8.0
26 -8.0 -8.0
27 -9.0 -11.0
28 -5.0 NaN
29 -6.0 NaN
30 -7.0 NaN
I want to replace the outlier -60 with the median:
df = df[df.apply(lambda x: np.abs(x - x.mean()) / x.std() < 4).all(axis=1)]
It works, but it also drops every row that contains NaN. How can I avoid that?
Output:
January February
0 -5.0 -7.0
1 -6.0 -6.0
2 -5.0 -5.0
3 -3.0 -6.0
4 -6.0 -8.0
5 -11.0 -9.0
6 -6.0 5.0
7 -8.0 -11.0
8 -11.0 -12.0
9 -8.0 -9.0
10 -8.0 -6.0
11 -8.0 -5.0
12 -8.0 -4.0
13 -10.0 1.0
14 -10.0 3.0
15 -9.0 -9.0
16 -6.0 -6.0
17 -6.0 -6.0
18 -4.0 -4.0
19 -8.0 2.0
20 -9.0 3.0
21 -14.0 1.0
22 -15.0 -3.0
23 -17.0 -4.0
24 -19.0 -6.0
25 -10.0 -8.0
26 -8.0 -8.0
27 -9.0 -11.0
As you can see, 3 rows were removed, which is not ideal. Any ideas? Thanks!
You can use .isna() in your logic:
df = df[df.apply(lambda x: (np.abs(x - x.mean()) / x.std() < 4) | x.isna()).all(axis=1)]
print(df)
Prints (notice that index 25 (-60.0) is missing):
January February
0 -5.0 -7.0
1 -6.0 -6.0
2 -5.0 -5.0
3 -3.0 -6.0
4 -6.0 -8.0
5 -11.0 -9.0
6 -6.0 5.0
7 -8.0 -11.0
8 -11.0 -12.0
9 -8.0 -9.0
10 -8.0 -6.0
11 -8.0 -5.0
12 -8.0 -4.0
13 -10.0 1.0
14 -10.0 3.0
15 -9.0 -9.0
16 -6.0 -6.0
17 -6.0 -6.0
18 -4.0 -4.0
19 -8.0 2.0
20 -9.0 3.0
21 -14.0 1.0
22 -15.0 -3.0
23 -17.0 -4.0
24 -19.0 -6.0
26 -8.0 -8.0
27 -9.0 -11.0
28 -5.0 NaN
29 -6.0 NaN
30 -7.0 NaN
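A minimal sketch of why the original filter drops the NaN rows: any comparison with NaN evaluates to False, so `.all(axis=1)` rejects those rows unless the NaN positions are explicitly allowed with `.isna()`. The three-value Series below is a hypothetical illustration, not the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column with one missing value.
s = pd.Series([-8.0, -11.0, np.nan])

# Z-score per element; mean() and std() skip NaN, but the NaN element stays NaN.
z = (s - s.mean()).abs() / s.std()

# NaN < 4 is False, so the strict test marks the NaN row for removal.
keep_strict = z < 4

# OR-ing with isna() keeps the NaN row.
keep_with_nan = (z < 4) | s.isna()

print(keep_strict.tolist())    # [True, True, False]
print(keep_with_nan.tolist())  # [True, True, True]
```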
Use numpy.where(...):
df[["January", "February"]]=\
np.where(
df.sub(df.mean(axis=0)).abs()\
.div(df.std(axis=0))>=4,
df.median(axis=0), df
)
Output:
January February
0 -5.0 -7.0
1 -6.0 -6.0
2 -5.0 -5.0
3 -3.0 -6.0
4 -6.0 -8.0
5 -11.0 -9.0
6 -6.0 5.0
7 -8.0 -11.0
8 -11.0 -12.0
9 -8.0 -9.0
10 -8.0 -6.0
11 -8.0 -5.0
12 -8.0 -4.0
13 -10.0 1.0
14 -10.0 3.0
15 -9.0 -9.0
16 -6.0 -6.0
17 -6.0 -6.0
18 -4.0 -4.0
19 -8.0 2.0
20 -9.0 3.0
21 -14.0 1.0
22 -15.0 -3.0
23 -17.0 -4.0
24 -19.0 -6.0
25 -8.0 -8.0
26 -8.0 -8.0
27 -9.0 -11.0
28 -5.0 NaN
29 -6.0 NaN
30 -7.0 NaN
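For comparison, here is a pandas-only sketch of the same replacement using `Series.mask` column by column. It assumes the same rule as above (absolute z-score of 4 or more counts as an outlier); the helper name `replace_outliers` and its `threshold` parameter are made up for this illustration.

```python
import numpy as np
import pandas as pd

# The data from the question.
january = [-5, -6, -5, -3, -6, -11, -6, -8, -11, -8, -8, -8, -8, -10, -10, -9,
           -6, -6, -4, -8, -9, -14, -15, -17, -19, -60, -8, -9, -5, -6, -7]
february = [-7, -6, -5, -6, -8, -9, 5, -11, -12, -9, -6, -5, -4, 1, 3, -9,
            -6, -6, -4, 2, 3, 1, -3, -4, -6, -8, -8, -11, np.nan, np.nan, np.nan]
df = pd.DataFrame({"January": january, "February": february}, dtype=float)


def replace_outliers(s, threshold=4):
    """Replace values whose absolute z-score is >= threshold with the column median."""
    z = (s - s.mean()).abs() / s.std()
    # For NaN entries, z is NaN and the comparison is False, so they are left untouched.
    return s.mask(z >= threshold, s.median())


cleaned = df.apply(replace_outliers)

print(cleaned.loc[25, "January"])       # -8.0
print(cleaned["February"].isna().sum()) # 3
```

No rows are dropped here: the frame keeps all 31 rows, index 25 in January becomes the median (-8.0), and the three NaN values in February remain NaN.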