How to drop an entire record if more than 90% of its features have missing values in pandas

Question

I have a pandas DataFrame named df with 500 columns and 2 million records.

I was able to drop the columns that have more than 90% missing values.
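The column-dropping step is not shown in the question. A minimal sketch, assuming the same 90% cutoff, takes the per-column NaN fraction with isna().mean() (axis=0 is the default) and keeps only the columns below it. The small frame here is made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative frame: column "c" is 100% missing, "a" and "b" are 25% missing
df = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, 4.0],
    "b": [np.nan, 2.0, 3.0, 4.0],
    "c": [np.nan, np.nan, np.nan, np.nan],
})

# Fraction of missing values per column (mean over axis=0 by default)
col_missing = df.isna().mean()

# Keep only the columns with less than 90% missing values
df_cols = df.loc[:, col_missing < 0.9]
print(df_cols.columns.tolist())  # ['a', 'b']
```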

However, how can I drop an entire record (row) in pandas if 90% or more of its columns have missing values?

I have seen a similar post for R, but I am currently coding in Python.

Answer 1

You can use isna + mean on axis=1 to get the fraction of NaN values in each row, then use loc to select the rows where that fraction is less than 0.9 (i.e. 90%):

out = df.loc[df.isna().mean(axis=1)<0.9]
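To see where the cutoff falls, here is a small sketch on a made-up 3 x 10 frame (not the asker's data). A row with exactly 9 of 10 values missing has a NaN fraction of 0.9, so the strict < 0.9 comparison drops it, matching "90% or more" in the question.

```python
import numpy as np
import pandas as pd

# Illustrative 3 x 10 frame
data = np.ones((3, 10))
data[0, :9] = np.nan   # row 0: 9 of 10 missing (90%) -> dropped
data[1, :8] = np.nan   # row 1: 8 of 10 missing (80%) -> kept
# row 2: nothing missing -> kept
df = pd.DataFrame(data, columns=[f"f{i}" for i in range(10)])

row_missing = df.isna().mean(axis=1)   # per-row NaN fraction
out = df.loc[row_missing < 0.9]
print(out.index.tolist())  # [1, 2]
```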

Answer 2

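You can use df.dropna() and set the thresh parameter (the minimum number of non-NA values a row needs in order to be kept) to the value corresponding to 10% of the columns. With 500 columns, that is 50. Note that thresh=50 still keeps a row with exactly 50 non-NA values (exactly 90% missing); use thresh=51 if those rows should be dropped as well.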

df.dropna(axis=0, thresh=50, inplace=True)
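The hard-coded value only fits a 500-column frame. A minimal sketch that derives thresh from the actual column count, assuming rows with 90% or more missing values should be dropped (so a row must keep strictly more than 10% non-NA values). drop_sparse_rows is a hypothetical helper name, and the test frame is made up for illustration.

```python
import numpy as np
import pandas as pd

def drop_sparse_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop rows where 90% or more of the columns are NaN."""
    n_cols = df.shape[1]
    # A row survives only if its non-NA count is strictly above 10% of the columns.
    # Integer arithmetic avoids floating-point rounding: floor(n/10) + 1
    thresh = n_cols // 10 + 1
    return df.dropna(axis=0, thresh=thresh)

data = np.ones((3, 10))
data[0, :9] = np.nan   # 1 non-NA value  -> dropped
data[1, :8] = np.nan   # 2 non-NA values -> kept
df = pd.DataFrame(data)
print(drop_sparse_rows(df).index.tolist())  # [1, 2]
```

For the 500-column frame in the question this gives thresh=51. The helper returns a new DataFrame instead of using inplace=True, so the result agrees with the loc-based answer above.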
