Loop function for missing data
I want to replace the NaN values using the np.random.normal(mu, s, n) function and a list comprehension, but I can't get it to work.
df_column_values = ["NaN","1","NaN","2","NaN","3","94","4","168","5","NaN"]
n, mu, sigma = 700, 155, 118
array = np.random.normal(mu, sigma, n)
for i in array:
    if i > 0 and i < 400:
        data['Insulin'].replace(0, (i), inplace=True)
This runs, but every NaN ends up replaced by the same value.
How can I improve this code?
The original data comes from Kaggle.
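It looks like you want to replace the missing values with normally distributed random values in the range (0, 400). For that you need a truncated normal distribution.
You should then create a vector of random variates whose length matches that of the data you may need to replace.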
import numpy as np
import pandas as pd
import scipy.stats as stats

data = pd.DataFrame({'Insulin': ["NaN","1","NaN","2","NaN","3",
                                 "94","4","168","5","NaN"]})

lower, upper = 0, 400
mu, sigma = 155, 118
# truncnorm takes its bounds in units of sigma, measured from mu
X = stats.truncnorm(
    (lower - mu) / sigma,
    (upper - mu) / sigma,
    loc=mu, scale=sigma)

# Replace entries stored as the string "NaN"
data['Insulin'] = np.where(
    data['Insulin'] == "NaN",
    X.rvs(len(data)),
    data['Insulin'])

# Replace entries that are real NaN values
data['Insulin'] = np.where(
    data['Insulin'].isna(),
    X.rvs(len(data)),
    data['Insulin'])

print(data)
       Insulin
0    59.069239
1            1
2   113.143013
3            2
4    63.488282
5            3
6           94
7            4
8          168
9            5
10  109.272469
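As the output shows, the untouched entries are still strings while the replaced ones are floats. A variant that returns a purely numeric column and draws exactly one value per missing entry could look like the sketch below. The helper name `fill_missing_truncnorm` and the `seed` parameter are made up for illustration, and the sketch assumes missing entries are either real NaN or the string "NaN".

```python
import numpy as np
import pandas as pd
import scipy.stats as stats


def fill_missing_truncnorm(series, mu, sigma, lower, upper, seed=None):
    """Hypothetical helper: return a float copy of `series` in which each
    missing entry is replaced by its own truncated-normal draw."""
    # Convert to floats; the string "NaN" and anything unparseable become NaN.
    values = pd.to_numeric(series, errors="coerce")
    missing = values.isna()

    # truncnorm expects the bounds in units of sigma, measured from mu.
    a, b = (lower - mu) / sigma, (upper - mu) / sigma
    draws = stats.truncnorm.rvs(
        a, b, loc=mu, scale=sigma,
        size=int(missing.sum()),  # one independent draw per missing entry
        random_state=seed)

    filled = values.copy()
    filled[missing] = draws
    return filled


data = pd.DataFrame({"Insulin": ["NaN", "1", "NaN", "2", "NaN", "3",
                                 "94", "4", "168", "5", "NaN"]})
data["Insulin"] = fill_missing_truncnorm(
    data["Insulin"], mu=155, sigma=118, lower=0, upper=400, seed=0)
print(data)
```

Because the draws are assigned through a boolean mask, each missing row gets a different value. The loop in the question produces identical values because `replace` substitutes every matching entry on the first iteration that passes the range check, so later iterations find nothing left to replace. If the dataset encodes missing insulin readings as 0 rather than NaN, one option is to call `series.replace(0, np.nan)` before passing the column in.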