Python fillna using mean of row values for selected columns
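In the table below, I need to fill NaN values in the Week columns only. Each NaN should be replaced with the mean of all the Week values in the same row.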
+----+---------+------+-------+-------+-------+-------+
| ID | Feature | Paid | Week1 | Week2 | Week3 | Week4 |
+----+---------+------+-------+-------+-------+-------+
| 1 | 1 | 1 | 12 | NaN | NaN | NaN |
+----+---------+------+-------+-------+-------+-------+
| 2 | 0 | 1 | 34 | 23 | NaN | NaN |
+----+---------+------+-------+-------+-------+-------+
| 3 | 1 | 0 | 24 | 13 | 14 | NaN |
+----+---------+------+-------+-------+-------+-------+
Code:
df.fillna(df[['Week1', 'Week2', 'Week3', 'Week4']].mean(axis=1), axis=1, inplace=True)
This raises an error: NotImplementedError: Currently only can fill with dict/Series column by column
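The error message is easier to read once you see how fillna treats a Series on a DataFrame: the Series index is matched against column labels. Here is a minimal sketch, assuming the sample data from the table above; the fill value -1 is purely illustrative and not part of the question.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Feature": [1, 0, 1],
    "Paid": [1, 1, 0],
    "Week1": [12, 34, 24],
    "Week2": [np.nan, 23, 13],
    "Week3": [np.nan, np.nan, 14],
    "Week4": [np.nan, np.nan, np.nan],
})

# A Series passed to DataFrame.fillna is keyed by column label:
# only Week2 is filled here, with the illustrative value -1.
per_column = df.fillna(pd.Series({"Week2": -1.0}))

# Row means are keyed by row label (0, 1, 2), not by column name,
# so they cannot serve as a per-column fill value directly.
row_means = df[["Week1", "Week2", "Week3", "Week4"]].mean(axis=1)
print(per_column)
print(row_means)
```

This is why the answers below either build a dict keyed by column name or reshape the data before filling.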
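You can select the columns whose names contain 'Week' with filter(), compute the row means once and store them in a variable (for good performance), and finally fill the NaNs with fillna():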
cols=df.filter(regex='Week').columns
m=df[cols].mean(axis=1).round()
df=df.fillna({x:m for x in cols})
Output:
ID Feature Paid Week1 Week2 Week3 Week4
0 1 1 1 12 12.0 12.0 12.0
1 2 0 1 34 23.0 28.0 28.0
2 3 1 0 24 13.0 14.0 17.0
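For anyone who wants to run this end to end, here is a self-contained sketch of the same approach. The DataFrame construction is my reconstruction of the question's table, and `filled` is just an illustrative name. Note that round() rounds halves to the nearest even value, which is why the mean 28.5 shows up as 28.0.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Feature": [1, 0, 1],
    "Paid": [1, 1, 0],
    "Week1": [12, 34, 24],
    "Week2": [np.nan, 23, 13],
    "Week3": [np.nan, np.nan, 14],
    "Week4": [np.nan, np.nan, np.nan],
})

# Pick the Week columns by name and compute each row's mean once.
cols = df.filter(regex="Week").columns
m = df[cols].mean(axis=1).round()

# Map every Week column to the same Series of row means; within each
# column, fillna aligns that Series on the row index.
filled = df.fillna({x: m for x in cols})
print(filled)
```

Drop the round() call if you want the exact means instead.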
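Try the following (note that this fills each NaN with the mean of its column, not the mean of its row):
import numpy as np
import pandas as pd
df = pd.DataFrame({"A": [1, 2, np.nan, 1, 2], "B": [1, 2, 3, 4, 5], "C": [np.nan, 3, 4, np.nan, 5]})
cols = ["A", "C"]
df[cols] = df[cols].fillna(df[cols].mean())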
A B C
0 1.0 1 4.0
1 2.0 2 3.0
2 1.5 3 4.0
3 1.0 4 4.0
4 2.0 5 5.0
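To make the difference between column means and row means concrete, here is a small sketch on the same toy data. The row-wise variant with apply is my own addition, not something from the answer above, and it will be slower than the vectorized options on large frames.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, np.nan, 1, 2],
    "B": [1, 2, 3, 4, 5],
    "C": [np.nan, 3, 4, np.nan, 5],
})
cols = ["A", "C"]

# Column-wise: each NaN takes the mean of its own column.
by_column = df[cols].fillna(df[cols].mean())

# Row-wise: each NaN takes the mean of the selected values in its own row.
by_row = df[cols].apply(lambda row: row.fillna(row.mean()), axis=1)

print(by_column)
print(by_row)
```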
Create a dictionary that maps each Week column name to the row-wise (axis=1) mean of the Week columns, then use that dictionary to fill the NaN values:
c = df.filter(like='Week').columns
df.fillna(dict.fromkeys(c, df[c].mean(1)))
ID Feature Paid Week1 Week2 Week3 Week4
0 1 1 1 12 12.0 12.0 12.0
1 2 0 1 34 23.0 28.5 28.5
2 3 1 0 24 13.0 14.0 17.0
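If you need this in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: `fill_with_row_mean` and its `like` parameter are hypothetical names I made up, and the DataFrame is reconstructed from the question's table.

```python
import numpy as np
import pandas as pd


def fill_with_row_mean(frame, like):
    """Fill NaNs in columns whose name contains `like` with the row mean of those columns."""
    cols = frame.filter(like=like).columns
    row_mean = frame[cols].mean(axis=1)
    # dict.fromkeys reuses the same Series of row means for every selected column.
    return frame.fillna(dict.fromkeys(cols, row_mean))


# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Feature": [1, 0, 1],
    "Paid": [1, 1, 0],
    "Week1": [12, 34, 24],
    "Week2": [np.nan, 23, 13],
    "Week3": [np.nan, np.nan, 14],
    "Week4": [np.nan, np.nan, np.nan],
})

result = fill_with_row_mean(df, "Week")
print(result)
```

The helper returns a new DataFrame and leaves the input untouched, so you can compare before and after.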
You can use SimpleImputer from scikit-learn with strategy='mean'. There is also strategy='most_frequent', and it is easy to use (see the scikit-learn documentation). Keep in mind that SimpleImputer imputes each column separately.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
df = pd.DataFrame({"A": [1, 2, np.nan, 1, 2], "B": [1, 2, 3, 4, 5], "C": [np.nan, 3, 4, np.nan, 5]})
imp = SimpleImputer(missing_values=np.nan, strategy='mean')
imp.fit(df)
print(imp.transform(df))
Output:
[[1. 1. 4. ]
[2. 2. 3. ]
[1.5 3. 4. ]
[1. 4. 4. ]
[2. 5. 5. ]]
For example, with strategy='most_frequent' you get:
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
df = pd.DataFrame({"A": [1, 2, np.nan, 1, 2], "B": [1, 2, 3, 4, 5], "C": [np.nan, 3, 4, np.nan, 5]})
imp = SimpleImputer(missing_values=np.nan, strategy='most_frequent')
imp.fit(df)
print(imp.transform(df))
Output:
[[1. 1. 3.]
[2. 2. 3.]
[1. 3. 4.]
[1. 4. 3.]
[2. 5. 5.]]
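Since SimpleImputer works per column, getting the row means the question asks for takes one extra step. Here is a possible sketch, assuming the question's table and assuming every row has at least one non-missing Week value; the transpose trick and the `week_values` name are my own additions.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Feature": [1, 0, 1],
    "Paid": [1, 1, 0],
    "Week1": [12, 34, 24],
    "Week2": [np.nan, 23, 13],
    "Week3": [np.nan, np.nan, 14],
    "Week4": [np.nan, np.nan, np.nan],
})
cols = ["Week1", "Week2", "Week3", "Week4"]

# Transpose the Week block so each original row becomes a column,
# impute the column means, then transpose back.
imp = SimpleImputer(missing_values=np.nan, strategy="mean")
week_values = df[cols].to_numpy(dtype=float)
df[cols] = imp.fit_transform(week_values.T).T
print(df)
```

If a row had no Week values at all, the imputer would drop that all-NaN column after the transpose and the shapes would no longer match, so check for that case first.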
A simple way to get around the error is to transpose before and after filling the NaNs:
df.T.fillna(df.filter(like='Week').mean(axis=1)).T.astype(int)
Output:
ID Feature Paid Week1 Week2 Week3 Week4
0 1 1 1 12 12 12 12
1 2 0 1 34 23 28 28
2 3 1 0 24 13 14 17
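The one-liner above transposes the whole frame and casts everything to int, which truncates 28.5 to 28. If you would rather keep the exact means and leave the other columns alone, a variant that transposes only the Week block might look like this. It is a sketch on data reconstructed from the question's table, and `filled` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Feature": [1, 0, 1],
    "Paid": [1, 1, 0],
    "Week1": [12, 34, 24],
    "Week2": [np.nan, 23, 13],
    "Week3": [np.nan, np.nan, 14],
    "Week4": [np.nan, np.nan, np.nan],
})

cols = df.filter(like="Week").columns
filled = df.copy()

# After the transpose, the original rows are columns, so the Series of
# row means (indexed by row label) lines up with the column labels.
filled[cols] = df[cols].T.fillna(df[cols].mean(axis=1)).T
print(filled)
```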
Something like this:
cols=['Week1','Week2','Week3','Week4']
df[cols] = df[cols].fillna(0) + df[cols].isna().mul(df[cols].mean(axis=1),axis=0)
Output:
ID Feature Paid Week1 Week2 Week3 Week4
0 1 1 1 12.0 12.0 12.0 12.0
1 2 0 1 34.0 23.0 28.5 28.5
2 3 1 0 24.0 13.0 14.0 17.0
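The arithmetic above works because isna() gives a True/False mask that is multiplied by the row mean. A closely related way to write the same thing, offered as an alternative sketch and not as part of the answer above, is DataFrame.where with axis=0 so that the row means are aligned on the row index:

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Feature": [1, 0, 1],
    "Paid": [1, 1, 0],
    "Week1": [12, 34, 24],
    "Week2": [np.nan, 23, 13],
    "Week3": [np.nan, np.nan, 14],
    "Week4": [np.nan, np.nan, np.nan],
})
cols = ["Week1", "Week2", "Week3", "Week4"]
row_mean = df[cols].mean(axis=1)

# Keep values that are present; elsewhere take the row mean,
# broadcast down the rows (axis=0).
df[cols] = df[cols].where(df[cols].notna(), row_mean, axis=0)
print(df)
```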