Python fillna using mean of row values for selected columns

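In the table below, I need to fill NaN values only in the Week columns. Each NaN should be replaced with the mean of all Week values in the same row.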

+----+---------+------+-------+-------+-------+-------+
| ID | Feature | Paid | Week1 | Week2 | Week3 | Week4 |
+----+---------+------+-------+-------+-------+-------+
| 1  | 1       | 1    | 12    | NaN   | NaN   | NaN   |   
+----+---------+------+-------+-------+-------+-------+
| 2  | 0       | 1    | 34    | 23    | NaN   | NaN   |
+----+---------+------+-------+-------+-------+-------+
| 3  | 1       | 0    | 24    | 13    | 14    | NaN   |
+----+---------+------+-------+-------+-------+-------+

Code:

df.fillna(df[['Week1', 'Week2', 'Week3', 'Week4']].mean(axis=1), axis=1, inplace=True)

This raises an error: NotImplementedError: Currently only can fill with dict/Series column by column
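A possible way to see why the call fails, sketched under the assumption that the table above is loaded into a DataFrame named `df` (the names `week_cols`, `row_means` and `unchanged` are illustrative only): when `fillna` receives a Series, pandas matches the Series index against the column labels. Dropping `axis=1` avoids the exception, but the row means are indexed by row label, so nothing lines up and no NaN is filled.

```python
import numpy as np
import pandas as pd

# Rebuild the table from the question.
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Feature": [1, 0, 1],
    "Paid": [1, 1, 0],
    "Week1": [12, 34, 24],
    "Week2": [np.nan, 23, 13],
    "Week3": [np.nan, np.nan, 14],
    "Week4": [np.nan, np.nan, np.nan],
})

week_cols = ["Week1", "Week2", "Week3", "Week4"]
row_means = df[week_cols].mean(axis=1)  # indexed by row label: 0, 1, 2

# Without axis=1 the Series index (0, 1, 2) is compared with the column
# names ("ID", "Week1", ...). No label matches, so nothing is filled.
unchanged = df.fillna(row_means)

nan_before = int(df.isna().sum().sum())
nan_after = int(unchanged.isna().sum().sum())
print(nan_before, nan_after)
```

This is why the answers below either pass one Series per column or transpose the data first.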

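You can select the columns whose names contain 'Week' with filter(), compute the row mean once and store it in a variable (for good performance), and finally fill the NaNs with fillna():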

cols=df.filter(regex='Week').columns
m=df[cols].mean(axis=1).round()
df=df.fillna({x:m for x in cols})

Output:

    ID  Feature Paid    Week1       Week2   Week3   Week4
0   1       1       1       12      12.0    12.0    12.0
1   2       0       1       34      23.0    28.0    28.0
2   3       1       0       24      13.0    14.0    17.0
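For reuse, the same idea could be wrapped in a small helper. This is only a sketch: the function name `fill_with_row_mean` and its parameters are hypothetical, and the sample data is retyped from the question. Rounding is made optional because `.round()` turns the row mean 28.5 into 28.0.

```python
import numpy as np
import pandas as pd

def fill_with_row_mean(frame, name_part="Week", round_mean=False):
    """Fill NaNs in columns whose name contains `name_part` with the row mean.

    Hypothetical helper, not part of pandas.
    """
    cols = frame.filter(like=name_part).columns
    row_mean = frame[cols].mean(axis=1)  # NaNs are skipped by default
    if round_mean:
        row_mean = row_mean.round()
    # One Series per selected column: each column is filled by aligning
    # the Series with the row index.
    return frame.fillna({col: row_mean for col in cols})

df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Feature": [1, 0, 1],
    "Paid": [1, 1, 0],
    "Week1": [12, 34, 24],
    "Week2": [np.nan, 23, 13],
    "Week3": [np.nan, np.nan, 14],
    "Week4": [np.nan, np.nan, np.nan],
})

filled = fill_with_row_mean(df)
rounded = fill_with_row_mean(df, round_mean=True)
print(filled)
```

Columns outside the selection (ID, Feature, Paid) are left untouched because they have no entry in the dictionary.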

Try the following (note that this fills each NaN with the mean of its column, not of its row):

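import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, np.nan, 1, 2], "B": [1, 2, 3, 4, 5], "C": [np.nan, 3, 4, np.nan, 5]})
cols = ["A", "C"]
# Fill NaNs in the selected columns with each column's mean
df[cols] = df[cols].fillna(df[cols].mean())

Output:
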
     A  B    C
0  1.0  1  4.0
1  2.0  2  3.0
2  1.5  3  4.0
3  1.0  4  4.0
4  2.0  5  5.0

Create a dictionary that maps each Week column name to the row-wise mean (axis=1) of the Week columns, then use that dictionary to fill the NaNs:

c = df.filter(like='Week').columns
df.fillna(dict.fromkeys(c, df[c].mean(axis=1)))

   ID  Feature  Paid  Week1  Week2  Week3  Week4
0   1        1     1     12   12.0   12.0   12.0
1   2        0     1     34   23.0   28.5   28.5
2   3        1     0     24   13.0   14.0   17.0

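You can use SimpleImputer from scikit-learn with strategy='mean' for this; strategy='most_frequent' is also available, and it is easy to use (doc). Note that SimpleImputer works column by column, so the examples below fill with column statistics rather than row means.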

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
df = pd.DataFrame({"A": [1, 2, np.nan, 1, 2], "B": [1, 2, 3, 4, 5], "C": [np.nan, 3, 4, np.nan, 5]})

imp = SimpleImputer(missing_values=np.nan, strategy='mean')
imp.fit(df)
print(imp.transform(df))

Output:

[[1.  1.  4. ]
 [2.  2.  3. ]
 [1.5 3.  4. ]
 [1.  4.  4. ]
 [2.  5.  5. ]]

For example, with strategy='most_frequent' you get:

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
df = pd.DataFrame({"A": [1, 2, np.nan, 1, 2], "B": [1, 2, 3, 4, 5], "C": [np.nan, 3, 4, np.nan, 5]})

imp = SimpleImputer(missing_values=np.nan, strategy='most_frequent')
imp.fit(df)
print(imp.transform(df))

Output:

[[1. 1. 3.]
 [2. 2. 3.]
 [1. 3. 4.]
 [1. 4. 3.]
 [2. 5. 5.]]

A simple way to get around the error is to transpose before and after filling the NaNs:

df.T.fillna(df.filter(like='Week').mean(axis=1)).T.astype(int)

Output:

   ID  Feature  Paid  Week1  Week2  Week3  Week4
0   1        1     1     12     12     12     12
1   2        0     1     34     23     28     28
2   3        1     0     24     13     14     17
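The one-liner above transposes the whole frame and casts everything to int, which truncates the row mean 28.5 to 28. A variant that transposes only the Week columns might look like the sketch below; the variable names `week_cols` and `weeks` are illustrative, and the sample data is retyped from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Feature": [1, 0, 1],
    "Paid": [1, 1, 0],
    "Week1": [12, 34, 24],
    "Week2": [np.nan, 23, 13],
    "Week3": [np.nan, np.nan, 14],
    "Week4": [np.nan, np.nan, np.nan],
})

week_cols = df.filter(like="Week").columns
weeks = df[week_cols]

# After transposing, each original row becomes a column, so the Series of
# row means (indexed by row label) lines up with the transposed columns.
df[week_cols] = weeks.T.fillna(weeks.mean(axis=1)).T
print(df)
```

The ID, Feature and Paid columns never pass through the transpose, so their integer dtype is not affected.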

Similarly, fill the NaNs with 0 and then add the row mean wherever a value was missing:

cols=['Week1','Week2','Week3','Week4']
df[cols] = df[cols].fillna(0) + df[cols].isna().mul(df[cols].mean(axis=1),axis=0)

Output:

   ID  Feature  Paid  Week1  Week2  Week3  Week4
0   1        1     1   12.0   12.0   12.0   12.0
1   2        0     1   34.0   23.0   28.5   28.5
2   3        1     0   24.0   13.0   14.0   17.0
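The same masking idea can also be written with `DataFrame.where`, which keeps values where a condition holds and takes a replacement elsewhere. This is offered as an alternative sketch rather than as the approach of the answer above; the cast to float and the variable names are assumptions made for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Feature": [1, 0, 1],
    "Paid": [1, 1, 0],
    "Week1": [12, 34, 24],
    "Week2": [np.nan, 23, 13],
    "Week3": [np.nan, np.nan, 14],
    "Week4": [np.nan, np.nan, np.nan],
})

week_cols = ["Week1", "Week2", "Week3", "Week4"]
# Cast to float so all Week columns share one dtype before masking.
weeks = df[week_cols].astype(float)

# Keep existing values; where a value is missing, take the row mean.
# axis=0 aligns the Series of row means with the row index.
df[week_cols] = weeks.where(weeks.notna(), weeks.mean(axis=1), axis=0)
print(df)
```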