Replace outliers in a mixed dataframe with pandas

Question

I have a mixed dataframe containing str, int, and float columns. Some of the float columns contain outliers, and I tried to replace them with NaN using:

df.mask(df.sub(df.mean()).div(df.std()).abs().gt(2))

I also tried it with numpy:

v = df.values
mask = np.abs((v - v.mean(0)) / v.std(0)) > 2
pd.DataFrame(np.where(mask, np.nan, v), df.index, df.columns)

But I get TypeError: unsupported operand type(s) for -: 'str' and 'float' and TypeError: must be str, not float

I also tried applying this only to the columns that have outliers, but it did not modify anything.

This is what df looks like:

    dateRep     cases   deaths  countriesAndTerritories     countryterritoryCode    popData2018 
0   03/05/2020  134.0   4.0     Afghanistan     AFG     37172386.0
1   02/05/2020  164.0   4.0     Afghanistan     AFG     37172386.0
2   01/05/2020  222.0   NaN     Afghanistan     AFG     37172386.0
3   30/04/2020  122.0   0.0     Afghanistan     AFG     37172386.0
4   29/04/2020  124.0   3.0     Afghanistan     AFG     37172386.0

Answer 1

You could try something like this (this changes the "cases" column):

df.loc[abs(df.cases - df.cases.mean())/df.cases.std() > 1, "cases"] = None

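Note, however, that I used a Z-score threshold of 1 for the "cases" column here, because the largest Z-score in the sample is about 1.64 (the row at index 2). You are trying to replace values whose Z-score exceeds 2, and none of these rows has a Z-score above 2, which would explain why your attempt did not modify anything.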
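To apply the same idea to every float column at once while leaving the str columns alone (which is where the TypeError comes from), something along these lines might work. This is only a sketch: the helper name mask_outliers and its threshold parameter are made up for illustration, and the dataframe is rebuilt from the five sample rows in the question.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "dateRep": ["03/05/2020", "02/05/2020", "01/05/2020", "30/04/2020", "29/04/2020"],
    "cases": [134.0, 164.0, 222.0, 122.0, 124.0],
    "deaths": [4.0, 4.0, np.nan, 0.0, 3.0],
    "countriesAndTerritories": ["Afghanistan"] * 5,
    "countryterritoryCode": ["AFG"] * 5,
    "popData2018": [37172386.0] * 5,
})


def mask_outliers(frame, threshold=2.0):
    """Hypothetical helper: return a copy in which float values whose
    absolute Z-score exceeds `threshold` are replaced with NaN."""
    out = frame.copy()
    # Restrict the arithmetic to float columns so str columns never take part.
    float_cols = out.select_dtypes(include="float").columns
    z = (out[float_cols] - out[float_cols].mean()) / out[float_cols].std()
    # A constant column has std 0, so its Z-scores are NaN and never exceed the threshold.
    out[float_cols] = out[float_cols].mask(z.abs() > threshold)
    return out


strict = mask_outliers(df)                 # threshold 2: no value in "cases" qualifies
loose = mask_outliers(df, threshold=1.0)   # threshold 1: "cases" at index 2 becomes NaN
print(loose["cases"])
```

With the sample rows, a threshold of 2 leaves "cases" unchanged, while a threshold of 1 replaces only the 222.0 at index 2. Note that pandas' std() uses the sample standard deviation (ddof=1) by default, whereas numpy's std() uses ddof=0, so the two approaches in the question can give slightly different Z-scores.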

Hope this helps!


python

numpy

outliers

dataframe

pandas