Missing value imputation based on regression in pandas

Question

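I want to impute missing data using multivariate imputation. In the dataset attached below, column A has some missing values, and columns A and B have a correlation coefficient of 0.70. So I want to use a regression-style relationship that models the relation between column A and column B and imputes the missing values in Python.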

N.B.: I can already do this with the mean, median, or mode, but I want to fill the missing values using the relationship with another column.

How should I approach this problem? Please suggest a solution.

import pandas as pd
# sklearn.preprocessing.Imputer was removed from scikit-learn;
# SimpleImputer is its replacement for mean/median/mode imputation.
from sklearn.impute import SimpleImputer
import numpy as np

# Build the data as a dict of lists.
data = {'Date': ['9/19/14', '9/20/14', '9/21/14', '9/21/14', '9/19/14', '9/20/14', '9/21/14', '9/21/14', '9/19/14', '9/20/14', '9/21/14', '9/21/14', '9/21/14'],
        'A': [77.13, 39.58, 33.70, np.nan, np.nan, 39.66, 64.625, 80.04, np.nan, np.nan, 19.43, 54.375, 38.41],
        'B': [19.5, 21.61, 22.25, 25.05, 24.20, 23.55, 5.70, 2.675, 2.05, 4.06, -0.80, 0.45, -0.90],
        'C': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'c', 'c', 'c', 'c']}

# Create the DataFrame and parse the date strings.
df = pd.DataFrame(data)
df["Date"] = pd.to_datetime(df["Date"])

# Print the result.
print(df)

Answer 1

Use:

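# Rows where A is observed are used for training; rows where A is missing get imputed.
dfreg = df[df['A'].notna()]
dfimp = df[df['A'].isna()]

from sklearn.neural_network import MLPRegressor
# Fit a regressor that predicts A from B (scikit-learn expects a 2-D feature array).
regr = MLPRegressor(random_state=1, max_iter=200).fit(dfreg['B'].values.reshape(-1, 1), dfreg['A'])
# R^2 of the fit on the training rows.
regr.score(dfreg['B'].values.reshape(-1, 1), dfreg['A'])

# Predicted A values for the rows where A is missing.
regr.predict(dfimp['B'].values.reshape(-1, 1))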

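Note that in the data provided, the correlation between columns A and B is very low (less than .05). To replace the empty cells with the imputed values: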

s = df[df['A'].isna()]['A'].index
# Assign the model's predictions (not its R^2 score) to the missing cells.
df.loc[s, 'A'] = regr.predict(dfimp['B'].values.reshape(-1, 1))
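
Since the question asks for a regression relationship between A and B, the same idea can also be sketched with an ordinary least-squares fit. This swaps in scikit-learn's LinearRegression for the MLPRegressor used above; it is a minimal sketch, not the answer's method, and the names corr, missing, model, and df_imputed are introduced here purely for illustration. It also prints the correlation of A and B so the strength of the relationship can be checked before relying on the imputed values.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same A and B values as in the question.
df = pd.DataFrame({
    'A': [77.13, 39.58, 33.70, np.nan, np.nan, 39.66, 64.625, 80.04,
          np.nan, np.nan, 19.43, 54.375, 38.41],
    'B': [19.5, 21.61, 22.25, 25.05, 24.20, 23.55, 5.70, 2.675,
          2.05, 4.06, -0.80, 0.45, -0.90],
})

# Pearson correlation of A and B; pandas ignores the rows where A is NaN.
corr = df['A'].corr(df['B'])

# Boolean mask of the rows that need imputing.
missing = df['A'].isna()

# Fit A ~ B on the rows where A is observed.
model = LinearRegression().fit(df.loc[~missing, ['B']], df.loc[~missing, 'A'])

# Predict A where it is missing and write the values into a copy.
df_imputed = df.copy()
df_imputed.loc[missing, 'A'] = model.predict(df.loc[missing, ['B']])

print(f"corr(A, B) = {corr:.3f}")
print(df_imputed)
```

With a correlation this weak, the fitted slope is close to zero, so the imputed values will sit near the mean of the observed A values; in that situation regression imputation adds little over plain mean imputation.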

